I’ve been trying to work with some of the ACS data for counties in Illinois. I am specifically interested in PUMA 1600, although I’ve seen similar issues and am concerned about other PUMAs in the state. Based on the ACS 5-year data for 2008-2012 that I downloaded, there are approximately 29,000 people in the 5 counties that make up PUMA 1602 (Code 17 01602). Based on these counties’ individual populations summed together, there should be approximately 148,000. It was my understanding that PUMAs had to be made up of 100,000 people or more to keep the data from being identifying, so I am a bit confused as to what these PUMAs of <100,000 really represent. Any help would be greatly appreciated in understanding:
Why this might be the case?
What might be a way to work around it?
Or if I need to use a different data set what recommendations there might be for an alternative set?
We need to be able to break them down into PUMAs for the demographic information we are trying to retrieve. Because of this, the 5-year file was our best bet. Any help would be greatly appreciated. Thank you.
I believe the source of your problem is the fact that the 2012 5-year ACS uses two sets of PUMA codes and boundaries (as explained in the Note Regarding Multi-year Samples at the bottom of the PUMA variable description). The PUMA code 01602 in Illinois only exists for 2012 sample respondents, who make up only about one-fifth of the full 5-year sample. Multiplying 29,000 by 5 gives roughly 145,000, which is much closer to the 148,000 figure you were expecting. To consistently identify geographic areas in the 2012 multi-year samples, the new and old PUMA boundaries either have to be identical, or you would need to combine PUMAs within the new and old sets so that the combined boundaries are identical. In some areas, especially densely populated cities, this is relatively easy and not much geographic specificity is sacrificed. In other, less densely populated areas, a lot of geographic specificity is lost.
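To make the arithmetic concrete, here is a minimal Python sketch of the check described above. It assumes an extract in which each person record carries a survey-year field, a PUMA code, and a person weight; the field names MULTYEAR and PERWT, the records themselves, and the placeholder code "00000" are illustrative assumptions, not values taken from the actual file.

```python
# Made-up person records standing in for a 2008-2012 5-year extract.
# MULTYEAR (actual survey year) and PERWT (person weight) are assumed field names.
records = [
    {"MULTYEAR": 2012, "PUMA": "01602", "PERWT": 14000},
    {"MULTYEAR": 2012, "PUMA": "01602", "PERWT": 15000},
    # Respondents from earlier years carry an old-boundary code (placeholder here).
    {"MULTYEAR": 2011, "PUMA": "00000", "PERWT": 20000},
]


def weighted_population(rows, puma, year):
    """Sum person weights for one PUMA code among respondents from one survey year."""
    return sum(
        r["PERWT"] for r in rows if r["PUMA"] == puma and r["MULTYEAR"] == year
    )


def scale_single_year(total, n_years=5):
    """Rough full-population estimate when only one of n_years carries the code."""
    return total * n_years


partial = weighted_population(records, "01602", 2012)
approx_full = scale_single_year(partial)
print(partial, approx_full)
```

The scaled figure is only a rough check on why the total looks too small. It relies on a single year of respondents, so it is noisier than an estimate built from the full 5-year sample.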
An alternative would be to use the 2011 5-year ACS file. The 2011 multi-year files use the same PUMA codes and boundaries throughout the sample, thus avoiding any issues of matching boundaries across years.