Hello IPUMS Team,
I’m finding additional records for counties and states when I look at households across four PUMAs that correspond to Bucks County.
I used the Census list of PUMA Names as a crosswalk and retrieved ACS PUMS 2015-2019 data for PUMAs 03001, 03002, 03003, and 03004, which combined should make up Bucks County (county FIPS 42017); screenshot here:
This is the code I used to subset, and it appears to keep just those four PUMAs, but doing so includes records from other states:
*create a subset just for Bucks County;
if puma = 03001 or puma = 03002 or puma = 03003 or puma = 03004 ;
Also, in this screenshot from a SAS proc freq, you’ll see that for COUNTYFIP 17, the weighted count is 238,828 households, or only 26% of what’s in the four PUMAs, which total 917K households. That count is more consistent with what I found looking at Bucks County with ACS Table B25003, screenshot also below.
I’m confused about why the four PUMAs have an estimate of 917K households and include other counties and states.
Looking at a map, it doesn’t appear that the four PUMAs should include other FIPS codes. Below are two screenshots and a link to a map I quickly put together:
So I have two specific questions:
What could be causing the additional records from other counties and states to be included in the four PUMAs that make up Bucks County?
How do I retrieve records only for Bucks County? I thought subsetting just for the four PUMAs would work. It seems simple enough to filter by COUNTYFIP = 17, but I guess that depends on whether you find a more serious issue with the data or can help me identify what I’ve overlooked.
Thank you for your time!
The variable PUMA is state-dependent; the codes must be read in combination with one of the state variables (STATEFIP or STATEICP), which I believe is the source of the issue you have encountered. Note that you should use PERNUM==1 or RELATE==1 in order to restrict your analysis to households. I summed up the number of households in Bucks County, Pennsylvania according to county and PUMA and found that they match:
Number of households by county: 238,828
COUNTYFIP==17 & STATEFIP==42 & RELATE==1
Number of households by PUMA: 238,828
PUMA %in% c(3001, 3002, 3003, 3004) & STATEFIP==42 & RELATE==1
When I sum up the number of households according to PUMA without including STATEFIP, the number of households matches what you have found: 917,464
PUMA %in% c(3001, 3002, 3003, 3004) & RELATE==1
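To make the comparison above concrete, here is a minimal base R sketch of the three filters applied to a small made-up data frame. The data frame name `acs`, the rows, and the weights are all hypothetical; only the variable names (STATEFIP, COUNTYFIP, PUMA, RELATE, HHWT) and the filter conditions come from the discussion above, and I am assuming HHWT is the weight being summed.

```r
# Toy person-level extract; every value here is made up for illustration.
acs <- data.frame(
  STATEFIP  = c(42, 42, 42, 34, 6, 42),
  COUNTYFIP = c(17, 17, 17, 0, 0, 101),
  PUMA      = c(3001, 3001, 3004, 3001, 3002, 3201),
  RELATE    = c(1, 2, 1, 1, 1, 1),
  HHWT      = c(100, 100, 150, 80, 120, 90)
)

bucks_pumas <- c(3001, 3002, 3003, 3004)

# PUMA alone: also picks up same-numbered PUMAs in other states.
puma_only <- subset(acs, PUMA %in% bucks_pumas & RELATE == 1)

# PUMA combined with STATEFIP: restricted to Pennsylvania (FIPS 42).
puma_state <- subset(acs, PUMA %in% bucks_pumas & STATEFIP == 42 & RELATE == 1)

# County-based definition, for comparison.
by_county <- subset(acs, COUNTYFIP == 17 & STATEFIP == 42 & RELATE == 1)

# Weighted household counts under each definition.
c(puma_only  = sum(puma_only$HHWT),
  puma_state = sum(puma_state$HHWT),
  by_county  = sum(by_county$HHWT))
```

In this toy data the PUMA-only filter overcounts because it includes the two out-of-state rows, while the PUMA-plus-state and county-based filters agree with each other.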
I hope this helps!
Thank you, Grace, I appreciate the quick response. I didn’t realize that PUMA codes were state-dependent.
And what is the difference between PERNUM and RELATE? Do you recommend one over the other in analyzing household level variables?
Restricting your analysis using RELATE==1 excludes group quarters (GQ values 3 and 4), whereas restricting based on PERNUM==1 will provide an estimate of the number of households and group quarters combined. I used RELATE in my code to keep it simple, with the intent of excluding group quarters to match the universe of the ACS table you pointed to, “Occupied Housing Units”. To get the same universe using PERNUM, you will also need to exclude group quarters by keeping only GQ values 1, 2, and 5 when subsetting your data. My apologies for leaving out this important distinction in my original recommendation. We generally advise users to use PERNUM, instead of RELATE, to restrict analyses to the household level.
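As a rough illustration of the difference, the base R sketch below applies the three restrictions to a small made-up data frame. The name `acs`, the rows, and the weights are hypothetical, and I am assuming group quarters residents appear with PERNUM 1 and a RELATE value other than 1, as described above.

```r
# Toy person-level extract; every value here is made up for illustration.
# SERIAL 1 and 2 are households; SERIAL 3 and 4 are group quarters units.
acs <- data.frame(
  SERIAL = c(1, 1, 2, 3, 4),
  PERNUM = c(1, 2, 1, 1, 1),
  RELATE = c(1, 2, 1, 13, 12),
  GQ     = c(1, 1, 2, 3, 4),
  HHWT   = c(100, 100, 150, 60, 40)
)

# RELATE == 1: householders only, so group quarters drop out.
by_relate <- subset(acs, RELATE == 1)

# PERNUM == 1: first person in every unit, group quarters included.
by_pernum <- subset(acs, PERNUM == 1)

# PERNUM == 1 plus the GQ restriction: same universe as RELATE == 1.
by_pernum_gq <- subset(acs, PERNUM == 1 & GQ %in% c(1, 2, 5))

# Weighted counts under each restriction.
c(by_relate    = sum(by_relate$HHWT),
  by_pernum    = sum(by_pernum$HHWT),
  by_pernum_gq = sum(by_pernum_gq$HHWT))
```

In this toy data, PERNUM==1 alone gives a larger total because it counts the two group quarters units, and adding the GQ condition brings it back in line with the RELATE==1 total.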
Thank you for explaining the difference, that was very helpful. I appreciate all your assistance!