I want to get data on U.S. households (the joint distribution of household income and household members' race, age, and gender). I thought IPUMS would be a good fit, but I got a bit confused by the tutorial. It says IPUMS data are microdata at the individual level. Does that mean that even if some members of a household are included in IPUMS, the others may not be? If so, are there any recommendations on what data I could look into?
Thank you!
Thanks for your question. These data are based on a sample of households; all individuals who regularly live in a sampled household are enumerated with that household, so you get every member, not just some of them. You can request data at the person level or the household level. Because many people do person-level analyses, the default data structure for IPUMS extracts is rectangularized on the person record, meaning household variables are appended to each person record (and will be the same for all people in the same household).
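To make the rectangular structure concrete, here is a minimal sketch in Python. It assumes the extract contains the variables SERIAL (household identifier within a sample), PERNUM, HHWT, PERWT, HHINCOME, and AGE, and that it covers a single sample; the rows themselves are made-up values for illustration, not real records.

```python
from collections import defaultdict

# Made-up rows in the rectangular layout: one row per person, with the
# household variables (SERIAL, HHWT, HHINCOME) repeated on every row.
persons = [
    {"SERIAL": 1, "PERNUM": 1, "HHWT": 40, "PERWT": 40, "HHINCOME": 52000, "AGE": 34},
    {"SERIAL": 1, "PERNUM": 2, "HHWT": 40, "PERWT": 45, "HHINCOME": 52000, "AGE": 31},
    {"SERIAL": 2, "PERNUM": 1, "HHWT": 20, "PERWT": 20, "HHINCOME": 87000, "AGE": 58},
]

def group_by_household(rows):
    """Collect person rows into households keyed by SERIAL."""
    households = defaultdict(list)
    for row in rows:
        households[row["SERIAL"]].append(row)
    return dict(households)

households = group_by_household(persons)
for serial, members in households.items():
    # Household income is identical on every member row, so read it once.
    income = members[0]["HHINCOME"]
    ages = [m["AGE"] for m in members]
    print(serial, income, ages)
```

Grouping on SERIAL recovers each household together with all of its members, which is what you need for a joint distribution of household income and member characteristics.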
Thank you, Kari!
I have a follow-up question. I would like to draw a random sample of households in Pennsylvania (PA), such that BOTH the households AND the individuals in the sample are representative of the PA population. In this case, should I weight the data by hhwt and then draw households randomly? For example, if one household has hhwt = 40 and another has hhwt = 20, I should draw the first household twice as often as the second, is that correct? If so, will the individuals in my sample be representative of the population?
Thank you, and happy holidays!
Is there a reason you need to draw a sub-sample of the microdata? If you use all the existing ACS households in PA, and use the weights in your calculations, you will have a representative sample of both households and individuals.
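As a rough sketch of what "use the weights in your calculations" can look like in Python: household-level statistics use HHWT on one row per household, and person-level statistics use PERWT on every person row. The rows below are made-up values for illustration, and `weighted_mean` is a helper defined here, not an IPUMS function.

```python
# Made-up person rows; household variables repeat within a household.
persons = [
    {"SERIAL": 1, "PERNUM": 1, "HHWT": 40, "PERWT": 40, "HHINCOME": 52000, "AGE": 34},
    {"SERIAL": 1, "PERNUM": 2, "HHWT": 40, "PERWT": 45, "HHINCOME": 52000, "AGE": 31},
    {"SERIAL": 2, "PERNUM": 1, "HHWT": 20, "PERWT": 20, "HHINCOME": 87000, "AGE": 58},
]

def weighted_mean(values, weights):
    """Return sum(v * w) / sum(w) for paired values and weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Household-level estimate: keep one row per household (PERNUM == 1 here)
# so that each household counts once, then weight by HHWT.
heads = [p for p in persons if p["PERNUM"] == 1]
mean_hh_income = weighted_mean(
    [p["HHINCOME"] for p in heads], [p["HHWT"] for p in heads]
)

# Person-level estimate: use every person row, weighted by PERWT.
mean_age = weighted_mean(
    [p["AGE"] for p in persons], [p["PERWT"] for p in persons]
)

print(round(mean_hh_income, 2), round(mean_age, 2))
```

Keeping only one row per household for household-level estimates matters: averaging HHINCOME over all person rows would count larger households more than once.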
If you do need a sub-sample, the simplest way is to use the “customize sample size” feature when creating your extract. This will draw a smaller representative sample, and automatically adjust the weights for you.
If you want to do the sampling manually, I would recommend sampling full households. In that case, in order to get a flat sample (not requiring weights when calculating means), you would want to sample households with a probability proportional to their weight, as you suggested. That will also give a representative sample of individuals.
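For the manual route, a minimal sketch of drawing whole households with probability proportional to HHWT might look like the following. It uses Python's standard `random.choices`, which samples with replacement, so a heavily weighted household can be drawn more than once; sampling without replacement proportional to weight is more involved and is not shown. The weights and the `sample_households` helper are made up for illustration.

```python
import random

def sample_households(hh_weights, n, seed=0):
    """Draw n household IDs with replacement, with probability
    proportional to each household's weight."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    serials = list(hh_weights)
    weights = [hh_weights[s] for s in serials]
    return rng.choices(serials, weights=weights, k=n)

# Made-up example matching the question: hhwt = 40 versus hhwt = 20.
hh_weights = {1: 40, 2: 20}
draws = sample_households(hh_weights, n=30000)

# Household 1 should make up roughly two thirds of the draws.
share_1 = draws.count(1) / len(draws)
print(round(share_1, 3))
```

After drawing the household IDs, you would keep every person row belonging to each drawn household (repeating the household's rows once per draw), so that households stay intact.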
Thank you for the answer, Matthew. It is very helpful to learn about your “customize sample size” feature. I think it is exactly what I want.
I have a further question. If I create a representative sample of households (either through “customize sample size” or sampling by “hhwt”), will the household members included in these households be representative of all individuals? I am asking because I found that different people in the same household could have different individual weights (“perwt”) in the data.
Yes, if you create a representative sample of households, it will also be representative for individuals (as long as you are using the person weights).