I’m having trouble matching IPUMS and PUMS results for household records and am hoping you can help me with it.

There seems to be some inconsistency between IPUMS household (H-type) variables and PUMS data. For example, using the ACS 2006-2008 3-year sample, when adding up HHWT for CA (selecting PERNUM = 1 to represent each household), I got a total of 13,039,770, which differs from the PUMS result of 13,295,476 (the PUMS figure matches the number in the PUMS estimation file provided by the Census Bureau).
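A minimal sketch of the tabulation described above, assuming the IPUMS extract has been loaded as a list of person records, each a dict with `PERNUM` and `HHWT` keys. The function name and the toy records are hypothetical illustrations, not real ACS data.

```python
def sum_hhwt_first_person(records):
    """Sum HHWT over records with PERNUM == 1, i.e. once per IPUMS household unit."""
    return sum(r["HHWT"] for r in records if r["PERNUM"] == 1)


# Hypothetical toy extract: household SERIAL 1 has two persons, SERIAL 2 has one.
toy = [
    {"SERIAL": 1, "PERNUM": 1, "HHWT": 100},
    {"SERIAL": 1, "PERNUM": 2, "HHWT": 100},  # same household, skipped
    {"SERIAL": 2, "PERNUM": 1, "HHWT": 80},
]
print(sum_hhwt_first_person(toy))  # 180
```

Selecting PERNUM = 1 avoids counting a household's weight once per member, since HHWT is repeated on every person record in a rectangular extract.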

Person-level data do match.

I checked the individual weights and found the following: IPUMS doesn’t have missing values for household weights. Could you let me know if I missed anything here? Can I use the “IPUMS Online Data Analysis System” to validate customized results generated using PUMS?




There are two differences between your summation of HHWT and the summation of WGTP from the Census Bureau PUMS file. First, as you noticed, the Census Bureau allows blank values for household weights while IPUMS-USA does not. The PUMS file gives blank household weights to Group Quarters records, whereas IPUMS-USA assigns the person weight of each Group Quarters individual to the HHWT field. If you limit your analysis to households only (in the Online Data Analysis System you would do this by typing “gq(1,2,5)” in the “Selection Filter(s):” field), you will get a weighted total of 12,177,846 households in California.

The remaining difference between the IPUMS-USA HHWT total and the PUMS WGTP total is the weighted total of vacant households in California. IPUMS-USA produces rectangular datasets by default, and the Online Data Analysis System uses only rectangular datasets. Because a rectangular file attaches household information to person records, vacant households, which have no persons, are excluded. Researchers who need to include vacant households can request hierarchical data. I checked both the PUMS and IPUMS-USA files, and the weighted total of vacant households in California is 1,117,630, which accounts for the rest of the gap (12,177,846 + 1,117,630 = 13,295,476).
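A hedged sketch of the household-only tabulation, assuming person records with `PERNUM`, `GQ`, and `HHWT` fields. The `HOUSEHOLD_GQ_CODES` set mirrors the `gq(1,2,5)` selection filter; the function name and toy records are hypothetical. The final lines only check the arithmetic of the California figures quoted in this reply.

```python
# GQ codes treated as households, mirroring the "gq(1,2,5)" selection filter.
HOUSEHOLD_GQ_CODES = {1, 2, 5}


def household_weight_total(records):
    """Sum HHWT once per unit (PERNUM == 1), excluding group-quarters records."""
    return sum(
        r["HHWT"]
        for r in records
        if r["PERNUM"] == 1 and r["GQ"] in HOUSEHOLD_GQ_CODES
    )


# Hypothetical toy extract: two households plus one group-quarters person (GQ = 3).
toy = [
    {"PERNUM": 1, "GQ": 1, "HHWT": 100},
    {"PERNUM": 2, "GQ": 1, "HHWT": 100},  # second member of the same household
    {"PERNUM": 1, "GQ": 2, "HHWT": 80},
    {"PERNUM": 1, "GQ": 3, "HHWT": 40},   # group quarters: excluded by the filter
]
print(household_weight_total(toy))  # 180

# California, ACS 2006-2008 3-year, figures as quoted in this reply.
occupied = 12_177_846   # IPUMS HHWT total restricted to gq(1,2,5)
vacant = 1_117_630      # weighted vacant units, absent from rectangular extracts
pums_wgtp = 13_295_476  # PUMS WGTP total
ipums_all = 13_039_770  # unfiltered IPUMS HHWT total (PERNUM = 1)
assert occupied + vacant == pums_wgtp
# Gap implied by the quoted figures: group-quarters weight in the unfiltered total.
print(ipums_all - occupied)  # 861924
```

Without the GQ filter the toy total would be 220, which is the same kind of overcount, in miniature, as the unfiltered California figure.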

I hope this helps.