
Over my career, the most common type of map I’ve had to make is a simple choropleth map of the United States. There are a number of packages that can do this in R, but I haven’t found any of them quite satisfactory. That’s mainly because the map I want to make is a cartoon map, not a geographically accurate one. I want to resize and move Alaska and Hawaii below Arizona, and move Puerto Rico much closer to Florida so that my map fits neatly on the page. I want to enlarge Washington DC – and Delaware and Maryland and Rhode Island and Connecticut for that matter – so that I can see their values without having to squint or use a call-out box. I don’t often have data for the territories (Virgin Islands, Guam, Samoa), but I want to be able to include these, too, when the situation merits it. I want the boundaries to be highly generalized, much more so than any data layer out there. I don’t need to show Martha’s Vineyard or Catalina or any other small islands. I don’t need to capture the twists and turns of the Mississippi River as it forms part of the borders of ten states, or the intricacies of the Chesapeake Bay or Puget Sound shorelines, as these just add visual clutter.
I suppose this is less a problem with R packages than a limitation of the geographic files that are available. To show something like the latest state unemployment rates, researchers will use a layer so detailed it’s suitable for zooming in on individual houses.
In short, I need something like the image at the top of the post, photographed from page 179 of Mark Monmonier’s 1993 book Mapping It Out. He used the word “caricature” rather than “cartoon”. Since I was unaware of a digital copy of this map, the first step was to redraw it myself. I traced it using QGIS and saved it as a shapefile, making a few minor modifications – I fit Washington, DC into its proper location (albeit greatly enlarged), added Puerto Rico, and added a bit of detail to a few states that I thought were excessively generalized, including Alaska and Hawaii. I like how it came out. It looks less like something produced on a low-end Windows laptop and more like something made by a human hand, carved out of a linoleum block, perhaps.
It took me a while to figure out how to attach data to this map; I tried and rejected about a half-dozen mapmaking packages. In the end, ggplot2 did most of the work.
The base map layer, should you want to use it, is here. I named it for its original creator. Zipped, it’s a meager 9 kilobytes, thousands of times smaller than the typical files used to make maps. I know most Internet connections these days are fast enough that such things don’t matter, but this is still a massive difference.
Next, I’ll go over the code, also available as a GitHub Gist here. The first section uses tidycensus to get the data necessary to calculate the poverty rate, then gets the state abbreviations, hard-coding DC and PR.
library(tidycensus)
library(tidyverse)
library(rgdal)

# Store the Census API key in .Renviron so later sessions can reuse it
census_api_key("your census api key here", overwrite = TRUE, install = TRUE)

# Denominator and numerator for the poverty rate
uspop <- get_acs(geography = "state", variables = c(pop = "B06012_001"), year = 2017)
uspov <- get_acs(geography = "state", variables = c(pov = "B06012_002"), year = 2017)
uspovrate <- uspov$estimate / uspop$estimate
us <- cbind(uspop, uspovrate)

# state.abb covers only the 50 states, so DC (row 9) and PR (row 52) are set by hand
us$stabbr <- state.abb[match(us$NAME, state.name)]
us$stabbr[9] <- "DC"
us$stabbr[52] <- "PR"
us <- select(us, -variable, -estimate, -moe)
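The abbreviation lookup can be tried without a Census API key. Here is a minimal sketch, assuming a made-up four-row stand-in for the NAME column (the acs_names vector is hypothetical, not part of the original script). It fills in DC and PR by name rather than by row position, which is one way to avoid depending on the order in which the rows come back.

```r
# Hypothetical stand-in for the NAME column returned by get_acs()
acs_names <- c("Alabama", "District of Columbia", "Puerto Rico", "Wyoming")

# state.name and state.abb are built-in base R vectors covering the 50 states
stabbr <- state.abb[match(acs_names, state.name)]

# DC and Puerto Rico are not in state.name, so they come back as NA;
# filling them in by name does not depend on row order
stabbr[acs_names == "District of Columbia"] <- "DC"
stabbr[acs_names == "Puerto Rico"] <- "PR"

print(data.frame(NAME = acs_names, stabbr = stabbr))
```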
Next we categorize the data into quartiles. The line after that uses the readOGR() function from the rgdal package to read in the shapefile. I was mucking around with some non-shapefile formats that I thought might be friendlier to work with, but in the end I stuck with a shapefile.
us$quartile <- with(us, cut(uspovrate,
                            breaks = quantile(uspovrate, probs = seq(0, 1, by = 0.25), na.rm = TRUE),
                            include.lowest = TRUE))
shape <- readOGR(dsn = "C:/fpb/old/qgis", layer = "monmonier")
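To see what the quartile step does on its own, here is a small sketch using invented rates (the rate vector is hypothetical, not the ACS data). It uses the same cut() and quantile() combination as above, including one missing value to mimic a place with no estimate.

```r
# Hypothetical poverty rates for eight places, plus one missing value
rate <- c(0.08, 0.10, 0.11, 0.13, 0.14, 0.16, 0.19, 0.21, NA)

# Quartile break points: minimum, 25th, 50th, 75th percentiles, maximum
breaks <- quantile(rate, probs = seq(0, 1, by = 0.25), na.rm = TRUE)

# include.lowest = TRUE keeps the minimum value in the first bin
# instead of turning it into NA
quartile <- cut(rate, breaks = breaks, include.lowest = TRUE)

# Count how many places fall in each bin, with the missing value shown separately
print(table(quartile, useNA = "ifany"))
```

Without include.lowest = TRUE, the place with the lowest rate would fall outside the first interval and end up as NA, which on the map would look like missing data.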
The next four lines do the following:
- Join the shapefile to the poverty data using the state abbreviation field. The result is a SpatialPolygonsDataFrame. In principle, this could be mapped directly at this point using certain packages, but I was not able to get any of them to work.
- Set the id field to a character vector, which is needed two lines later.
- Convert the SpatialPolygonsDataFrame to a conventional data frame. The structure of this data frame is clever. Each point in my map (there are 468) gets variables identifying the polygon it belongs to and whether that polygon has multiple parts or holes. Perhaps this would be inefficient if you were mapping a layer that had highly detailed polygons comprising hundreds of thousands or millions of points – but that is exactly what I am trying to avoid here.
- Join the poverty rate data back to the new data frame on the common id field.
map1 <- merge(shape, us, by = "stabbr")
map1$id <- as.character(0:57)
map2 <- fortify(map1)
map2b <- left_join(map2, map1@data, by = "id")
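The long, one-row-per-point layout and the join back on id can be sketched with a toy example. Everything here is invented for illustration: the pts and attrs data frames and the "AA"/"BB" abbreviations are hypothetical, and match() is a base R stand-in for the left_join() call above, used so the sketch runs without extra packages.

```r
# Hypothetical fortified map: one row per vertex, tagged with its polygon id
pts <- data.frame(
  long  = c(0, 1, 1, 0, 2, 3, 3),
  lat   = c(0, 0, 1, 1, 0, 0, 1),
  order = 1:7,
  id    = c("0", "0", "0", "0", "1", "1", "1"),
  stringsAsFactors = FALSE
)

# Hypothetical attribute table: one row per polygon
attrs <- data.frame(
  id     = c("0", "1"),
  stabbr = c("AA", "BB"),
  stringsAsFactors = FALSE
)

# Look up each vertex's polygon attributes by id; the vertex rows stay in
# their original drawing order, as they would with left_join()
joined <- pts
joined$stabbr <- attrs$stabbr[match(pts$id, attrs$id)]

print(joined)
```

Both id columns are character here. That is presumably why the original code sets id with as.character() before the join, since joining a character key to a numeric one would not match.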
All that’s left is a call to ggplot. I fill the map based on the quartile variable, using the “Greens” palette in the scale_fill_brewer() function. Puerto Rico was not included in the census tables I drew from, so its value is set to gray. The theme() function removes axes and grids and tick marks (because ggplot thinks this is a graph, not a map, it includes these items by default). I finally add a medium-gray background to give the image some contrast.
ggplot(map2b, aes(map_id = id)) +
  geom_map(aes(fill = quartile), map = map2b) +
  expand_limits(x = map2b$long, y = map2b$lat) +
  scale_fill_brewer(palette = "Greens", na.value = "gray65") +
  guides(fill = guide_legend(title = "Poverty Rate, 2013-2017")) +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        panel.background = element_rect(fill = "gray50"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
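For anyone who wants to try the plotting call without the shapefile, here is a rough sketch on a made-up three-square "map" (toy_map and toy_values are hypothetical). It also swaps the long theme() call for ggplot2's built-in theme_void(), which drops axes, ticks and grid lines in one step; that is an alternative I am suggesting, not what the code above does.

```r
library(ggplot2)

# Hypothetical three-polygon "map" in the long format geom_map() expects
toy_map <- data.frame(
  long = c(0, 1, 1, 0,  2, 3, 3, 2,  4, 5, 5, 4),
  lat  = c(0, 0, 1, 1,  0, 0, 1, 1,  0, 0, 1, 1),
  id   = rep(c("0", "1", "2"), each = 4)
)

# Hypothetical values to fill by, one row per polygon; the NA is drawn in gray
toy_values <- data.frame(
  id       = c("0", "1", "2"),
  quartile = factor(c("low", "high", NA), levels = c("low", "high"))
)

p <- ggplot(toy_values, aes(map_id = id)) +
  geom_map(aes(fill = quartile), map = toy_map) +
  expand_limits(x = toy_map$long, y = toy_map$lat) +
  scale_fill_brewer(palette = "Greens", na.value = "gray65") +
  theme_void() +  # removes axes, ticks, and grid lines
  theme(panel.background = element_rect(fill = "gray50", colour = NA))

print(p)
```

Note that the data passed to ggplot() only needs one row per polygon here, since geom_map() takes the outlines from its map argument and matches them on id.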
The finished result!