Unweighted decennial microdata seems to overestimate number of households (by a lot)

nmarantz · September 24, 2024, 1:26pm

I’m using IPUMS microdata from the 1960 census, and I’m clearly overestimating the number of households. I ran this R code:

data |> 
  # restrict to one record per household
  distinct(SAMPLE, SERIAL, .keep_all = TRUE) |> 
  # add up weights
  summarise(sum(HHWT))

The result is 115,434,597 – more than twice as many households as in the US in 1960. What am I doing wrong?

Ivan_Strahof · September 24, 2024, 10:45pm

HHWT is constructed independently for each IPUMS USA sample. For 1960, IPUMS offers two samples for analysis: a 5% density sample and a 1% density sample. Many other decennial census years also have more than one sample. SAMPLE = 196001 identifies records in the 1960 1% sample, and SAMPLE = 196002 identifies records in the 1960 5% sample.
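To check which samples ended up in an extract, you can count the distinct households under each SAMPLE code. This is only a minimal sketch: the `data` object below is a small made-up stand-in so the snippet runs on its own, and `households_per_sample` and `n_households` are names chosen for illustration. With a real extract you would skip the toy data and use your own data frame.

```r
library(dplyr)

# Toy stand-in for an IPUMS extract (made-up values, not real census data).
# Each row is a person record; SERIAL identifies a household within a sample.
data <- tibble(
  SAMPLE = c(196001, 196001, 196001, 196002, 196002),
  SERIAL = c(1, 1, 2, 1, 2),
  HHWT   = c(100, 100, 100, 20, 20)
)

# Keep one row per household, then count households within each sample
households_per_sample <- data |>
  distinct(SAMPLE, SERIAL) |>
  count(SAMPLE, name = "n_households")

households_per_sample
```

If more than one SAMPLE value shows up, the extract pools several samples and each should be analyzed separately.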

In a sample that includes only household records (or only a single person record per household), the sum of HHWT across all records should equal the number of households in the US population in that year. That means the sum of HHWT in the 1960 5% sample and the sum of HHWT in the 1960 1% sample will each approximate the total number of households in 1960. More detail on each sample can be found on our samples page and in the User Guide’s section on sample designs.

Using the code you provided, I obtained an estimate of 57,699,837 households from the 1960 1% sample and 57,734,760 households from the 1960 5% sample. The sum of these two estimates (115,434,597) matches the result you’re reporting, so your code is adding the two samples together. I hope this clarifies what’s happening with the data.
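One way to avoid pooling the samples is to sum HHWT within each SAMPLE, or to restrict the data to a single sample first. The sketch below uses made-up toy data in place of a real extract (the weights of 100 and 20 are illustrative only), and `hh_by_sample` and `hh_1pct` are names chosen for this example.

```r
library(dplyr)

# Toy stand-in for an extract containing two samples (made-up values).
# The "1%" sample has 2 households; the "5%" sample has 10 households.
data <- tibble(
  SAMPLE = c(rep(196001, 3), rep(196002, 10)),
  SERIAL = c(1, 1, 2, 1:10),
  HHWT   = c(rep(100, 3), rep(20, 10))
)

# Option 1: one household total per sample
hh_by_sample <- data |>
  # restrict to one record per household
  distinct(SAMPLE, SERIAL, .keep_all = TRUE) |>
  # add up weights separately within each sample
  group_by(SAMPLE) |>
  summarise(households = sum(HHWT), .groups = "drop")

# Option 2: keep only the 1% sample before summing
hh_1pct <- data |>
  filter(SAMPLE == 196001) |>
  distinct(SERIAL, .keep_all = TRUE) |>
  summarise(households = sum(HHWT))

hh_by_sample
hh_1pct
```

In the toy data each sample sums to the same household total on its own, while summing across both without grouping would double it, which is the pattern in the original result.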
