Identifying individual households - sample size discrepancy in 2000s?

rennane · September 2, 2014, 2:00pm

I have been trying to count all the households in my IPUMS CPS extract, but for some reason I haven’t been able to get the number to match the total number of households listed on the website here.

Following the survey documentation, I have been identifying households by the combination of serial and year. However, when I count the number of households by this variable combination (bysort serial year: gen tag = _n == 1), I get approximately 20,000 fewer households in the 2000s than what is listed on the website. I get the same (lower) number when I simply count records with pernum == 1.

As an example, in the 2013 extract I get a count of 74,821 households using both of the methods listed above, while the number listed at the link above for 2013 is 98,095. Any suggestions about what I am missing here?

Thanks!

grover · September 2, 2014, 3:00pm

The difference between the number of households you are seeing in your rectangular extract and the total number of households is the “non-interview” or vacant households. Because the default rectangular extract structure places all household information on the person record, households with no associated person records (empty households) are effectively dropped from the extract. In the 2013 March IPUMS-CPS file there are 23,274 empty households, which accounts for the gap between your count of 74,821 and the 98,095 listed on the website. Since these households are empty, the data contain little information about them. However, if you want an extract that includes all households, you can choose to create a hierarchical extract from the extract request menu.
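
To make the reasoning concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `people` rows are made-up toy records standing in for a rectangular extract (one row per person), and the helper names `households_by_year`, `first_persons_by_year`, and `total_households` are hypothetical, not part of any IPUMS tool. It shows that counting distinct (year, serial) pairs and counting pernum == 1 records agree when every household in the file has a person 1, and that adding the empty households back reproduces the published 2013 total.

```python
# Toy person-level rows standing in for a rectangular extract.
# The values are invented for illustration; only the variable names
# (year, serial, pernum) come from the thread.
people = [
    {"year": 2013, "serial": 1, "pernum": 1},
    {"year": 2013, "serial": 1, "pernum": 2},
    {"year": 2013, "serial": 2, "pernum": 1},
    {"year": 2012, "serial": 1, "pernum": 1},
]


def households_by_year(rows):
    """Count distinct (year, serial) pairs within each year."""
    seen = set()
    counts = {}
    for row in rows:
        key = (row["year"], row["serial"])
        if key not in seen:
            seen.add(key)
            counts[row["year"]] = counts.get(row["year"], 0) + 1
    return counts


def first_persons_by_year(rows):
    """Count records with pernum == 1 within each year."""
    counts = {}
    for row in rows:
        if row["pernum"] == 1:
            counts[row["year"]] = counts.get(row["year"], 0) + 1
    return counts


def total_households(occupied, empty):
    """Households with person records plus empty households."""
    return occupied + empty


# Both counting methods agree on the toy data. Neither can see a
# household that has no person record, because such a household has
# no row at all in a rectangular extract.
print(households_by_year(people))
print(first_persons_by_year(people))

# Reconciling the 2013 figures quoted in this thread:
# 74,821 households counted in the extract + 23,274 empty households.
print(total_households(74821, 23274))
```

The last line is just the arithmetic from the figures above: the households visible in the rectangular extract plus the empty ones give the total listed on the website.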

I hope this helps.
