I pulled data from IPUMS to help inform a project my team is working on. The questions my colleague wants to answer are about presence of children in households by race of householder by units in structure. This is not a pre-packaged Census/ACS table, so I am turning to the PUMS data.
However, I’m a little unsure of my analysis, as this is the first time I’ve really used the PUMS data to prepare a custom table. I don’t have anyone at my workplace to turn to for double-checking my work, so I hope this forum might help answer my question.
To summarize, I was able to extract the 26,000+ person records (12,000+ housing records) of the five primary PUMAs of the city I’m looking at (Portland, OR). Then I did some recoding (e.g., units in structure reduced to “multi-family, single-family, other”; and whether the household has children). Then I kept the householder record and put that in a pivot table. I then summed the HHWT (household weight) field. My results seem right, but I’m not sure if this was the correct way to do this.
One person I asked, who admitted they don’t really use the HHWT field and so weren’t sure, said I should not have kept only one record per household. But this doesn’t seem to agree with the advice in the HHWT documentation.
An additional question I have is whether the HHWT field should sum to the total number of households within the PUMAs selected. If so, it seems my HHWT field is under-counting the number of households/units. The sum of my HHWT field (one record per household) is 224,876 (using 2014 5-yr ACS), but using the data in FactFinder the total housing units comes to 224,876, and the occupied unit count is 251,512. Why is there a discrepancy?
Below is the table I came up with. Would anyone be able to check my numbers or general approach? Thank you for reading.
You are correct to keep only one person per household when using the HHWT variable. The documentation suggests selecting one person (e.g., PERNUM = 1) to represent the entire household.
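The approach described above (keep one record per household, then sum HHWT within each cell of the table) can be sketched roughly as follows. This is a minimal illustration, not the poster's actual workflow: the records are made-up values, and the field names `race`, `structure`, and `has_children` are hypothetical stand-ins for the recoded variables. SERIAL, PERNUM, and HHWT are the IPUMS variable names.

```python
from collections import defaultdict

# Hypothetical person-level extract; all values are invented for illustration.
# SERIAL identifies the household, PERNUM the person within it.
records = [
    {"SERIAL": 1, "PERNUM": 1, "HHWT": 100, "race": "A", "structure": "single-family", "has_children": True},
    {"SERIAL": 1, "PERNUM": 2, "HHWT": 100, "race": "A", "structure": "single-family", "has_children": True},
    {"SERIAL": 2, "PERNUM": 1, "HHWT": 80, "race": "B", "structure": "multi-family", "has_children": False},
    {"SERIAL": 3, "PERNUM": 1, "HHWT": 120, "race": "A", "structure": "single-family", "has_children": True},
]


def weighted_household_counts(rows):
    """Sum HHWT over one record per household, grouped by table cell."""
    totals = defaultdict(int)
    for row in rows:
        # Keep only the first person in each household so that the
        # household weight is counted once, not once per member.
        if row["PERNUM"] != 1:
            continue
        cell = (row["race"], row["structure"], row["has_children"])
        totals[cell] += row["HHWT"]
    return dict(totals)


counts = weighted_household_counts(records)
for cell, estimate in sorted(counts.items()):
    print(cell, estimate)
```

Summing HHWT over every person record instead would count each household once per member, which is why the one-record-per-household step matters.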
You are also correct that, using the HHWT variable, the sum should equal the number of households in each PUMA. I am not sure what is driving the discrepancy in the total number of households between our data and the published Census data. I did check the Census Bureau’s PUMS Estimates for User Verification, and those estimates match the estimates from IPUMS when using the HHWT variable. It may be worth notifying the US Census Bureau, as the discrepancy seems to be with their published tables.
@Jeff Bloem: Thank you for your response. I’m now concerned with the HHWT field not adding up to what’s in FactFinder.
First, I filtered the data more carefully to remove group quarters so I could more closely match occupied housing units in FactFinder. The IPUMS figure ended up matching the FactFinder figure quite closely, only about 800 households shy of the published figure. Is this normal? What else could account for the discrepancy?
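For readers following along, the group-quarters filter described here might look something like the sketch below. It assumes the IPUMS GQ variable with codes 1, 2, and 5 denoting households and the other codes denoting group quarters or other cases; that coding should be checked against the IPUMS codebook for the extract in question. The records and the published total are made-up numbers, and `compare_to_published` is a hypothetical helper.

```python
# Assumed household codes for the IPUMS GQ variable (verify in the codebook).
HOUSEHOLD_GQ_CODES = {1, 2, 5}

# Hypothetical records; values are invented for illustration.
records = [
    {"SERIAL": 1, "PERNUM": 1, "GQ": 1, "HHWT": 100},
    {"SERIAL": 1, "PERNUM": 2, "GQ": 1, "HHWT": 100},
    {"SERIAL": 2, "PERNUM": 1, "GQ": 4, "HHWT": 30},  # group quarters, excluded
    {"SERIAL": 3, "PERNUM": 1, "GQ": 1, "HHWT": 95},
]


def weighted_households(rows):
    """Sum HHWT over one record per household, excluding group quarters."""
    return sum(
        row["HHWT"]
        for row in rows
        if row["PERNUM"] == 1 and row["GQ"] in HOUSEHOLD_GQ_CODES
    )


def compare_to_published(estimate, published):
    """Return the absolute and relative gap between an estimate and a published count."""
    gap = published - estimate
    return gap, gap / published


estimate = weighted_households(records)
gap, share = compare_to_published(estimate, published=200)  # 200 is a made-up total
print(f"estimate={estimate}, gap={gap}, relative gap={share:.1%}")
```

Expressing the gap as a share of the published total makes it easier to judge whether a shortfall of a given size is large relative to the count being compared.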
Second, I loaded the Census Bureau’s ACS 2010-14 housing unit PUMS file into a Postgres database to help verify the data, but that does not sum up properly either. The IPUMS results are closer to the published PUMS tables, but neither matches FactFinder.
Could you comment on the table below and advise me on what to do? Initially, you suggested I contact the Census Bureau because their published tables might be wrong (and those are the basis for the IPUMS files). However, the IPUMS files seem close enough to the FactFinder published results; now, rather, it’s the PUMS housing unit file that seems wildly off.