Aggregating Observations by State + County FIPS Codes


I’m attempting to identify which counties in the US are the most representative of the country as a whole before I continue on with my research using the 5% 2019 ACS. I realize that ACS microdata do not always go down to the county level. If I’m reading the documentation correctly, it seems that PUMAs are the closest geographic equivalent: depending on population, a PUMA can consist of a fraction of a county, an entire county, or multiple counties. I’m curious whether there are distinct issues with using the State and County FIPS code variables together to generate a county indicator. Are there any specific types of counties that are missing from ACS microdata, or are the omitted counties effectively random? In other words, should I be concerned about comparing the counties I can identify to the population averages generated from a particular microdata file?



A PUMA must have at least 100,000 residents, and only 18% of counties had that many people in 2010, so the counties we can identify from PUMAs are only the larger ones, typically core counties in medium-to-large metro areas. In other words, the omitted counties are not random: they are the smaller, less populous ones. From the set of identified counties, I think you could get a representative sample of counties in large metro areas, but you can’t get a representative sample of all counties, nor could you identify a set that effectively represents the whole rural-to-urban range of the country’s population.

For the specific purpose of identifying a subset of counties that represents the country as a whole well, you could instead use ACS summary tables, available through IPUMS NHGIS, which provide data for all U.S. counties.
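
If you do still want to work with the counties that are identifiable in the microdata, here is a minimal Python sketch of how the state and county FIPS codes might be combined into a single county indicator, and how you could check what weighted share of the sample falls in identified counties. It rests on assumptions you should verify against the variable documentation for your extract: that the state and county codes arrive as integers alongside a person weight, and that a county code of 0 marks a county that is not identified. The function names and the sample rows are made up for illustration.

```python
from collections import defaultdict


def county_fips(statefip, countyfip):
    """Build a 5-digit state+county FIPS string, or None if the county is not identified.

    Assumption: a county code of 0 means "not identifiable" in the extract.
    """
    if countyfip == 0:
        return None
    # Zero-pad so that state 6, county 37 becomes "06037".
    return f"{statefip:02d}{countyfip:03d}"


def weighted_share_identified(records):
    """Summarize (statefip, countyfip, weight) rows.

    Returns the weighted share of persons in identified counties and
    the weighted population total for each identified county.
    """
    total = 0.0
    identified = 0.0
    by_county = defaultdict(float)
    for statefip, countyfip, weight in records:
        total += weight
        key = county_fips(statefip, countyfip)
        if key is not None:
            identified += weight
            by_county[key] += weight
    share = identified / total if total else 0.0
    return share, dict(by_county)


# Hypothetical rows for illustration only (not real ACS records).
sample = [
    (6, 37, 100.0),
    (6, 37, 50.0),
    (6, 0, 25.0),    # county not identified
    (48, 201, 75.0),
]
share, by_county = weighted_share_identified(sample)
```

Comparing `share`, and the characteristics of the people behind it, against the full-sample figures gives a rough sense of how far the identified counties depart from the national picture.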

Thank you, Jonathan!

This is very helpful.