Last year I gave a presentation to the Portland R User’s Group on the Area Deprivation Index (ADI), a method that uses census data to assign every neighborhood in the U.S. (every census block group, to be precise) a percentile based on its socioeconomic position. Most of the ensuing discussion was about how some of the data seemed strange: neighborhoods in Portland and elsewhere landed in the lowest percentile, seemingly arbitrarily. I believe this is mostly because those neighborhoods had missing census data. If you treat missing values as zeros, you get what my audience saw. If you impute the missing values instead, you get a quite different picture.
In Manhattan, the differences between the two approaches are among the most severe in the U.S. This is because many block groups have few, if any, homeowners, so values for inputs like median home value or median mortgage are often missing.
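To make the zero-versus-imputation effect concrete, here is a minimal sketch using a single made-up indicator. The five block groups, their home values, and the helper names (`fill_zero`, `fill_median`, `pct_below`) are all hypothetical, and filling gaps with the median of the observed values is just a simple stand-in imputation, not necessarily the method behind the maps in this post. A missing median home value filled with 0 ranks a block group below a $400,000 neighborhood, while imputation moves it back toward the middle.

```python
from statistics import median

# Hypothetical median home values (USD) for five block groups. None marks a
# value the census did not report, e.g. because there are too few homeowners.
home_values = {
    "bg_A": 950_000,
    "bg_B": None,
    "bg_C": 1_200_000,
    "bg_D": None,
    "bg_E": 400_000,
}


def fill_zero(values):
    """Treat every missing value as 0."""
    return {k: (0 if v is None else v) for k, v in values.items()}


def fill_median(values):
    """Replace missing values with the median of the observed values (simple stand-in imputation)."""
    observed = [v for v in values.values() if v is not None]
    m = median(observed)
    return {k: (m if v is None else v) for k, v in values.items()}


def pct_below(values):
    """Percent of block groups with a strictly lower value (0 = lowest, i.e. most deprived)."""
    n = len(values)
    return {
        k: 100 * sum(other < v for other in values.values()) / n
        for k, v in values.items()
    }


if __name__ == "__main__":
    # Zero-filled: bg_B and bg_D sit at 0.0, below the $400k block group bg_E.
    print(pct_below(fill_zero(home_values)))
    # Median-imputed: bg_B and bg_D move up to 20.0 and bg_E is now the lowest.
    print(pct_below(fill_median(home_values)))
```

A real index combines many inputs, so the shift for any one block group depends on how heavily the missing inputs are weighted. The direction of the bias is the same, though: every missing value read as zero pushes a block group toward the deprived end.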
Here’s a side-by-side comparison. The Yost Index uses 7 of the 17 inputs that the ADI uses, so the two are not identical, but they should be highly similar. I chose the Yost Index because it is both more parsimonious and more transparent. Note how, compared with the Yost Index, the ADI consistently implies a lower socioeconomic position in one of the most affluent parts of the entire country.
A link to my presentation is here.
Note: I used the leaflet package to make these maps. I’ve found that whenever I generate a map, my CPU usage jumps to 100% and I have to quit RStudio before I can do anything else. Others have occasionally reported this problem with RStudio in general, but not with leaflet specifically. It makes exploring leaflet’s various features difficult!