I am wondering if I am right in thinking that I need to keep at least one member of each household in my dataset if I want my analysis to be representative on the household level using the HHWT in the ACS. The documentation on HHWT is not clear; it just says “Users should also be sure to select one person (e.g., PERNUM = 1) to represent the entire household” – but it’s not clear what is meant by “select.”
When I drop some of the households from the sample, I get very different point estimates and standard errors in my regression. Would you explain that?
By “select,” the HHWT documentation is simply suggesting that the sample should be restricted to records in which PERNUM==1. This ensures that your sample includes only one observation per household. There are a number of ways to do this within your preferred statistical software: you can either “keep” records where PERNUM==1 or “drop” records where PERNUM!=1.
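To make the “keep” step concrete, here is a minimal sketch in plain Python rather than Stata. SERIAL, PERNUM and HHWT are the IPUMS variable names, but the records and the helper name `one_record_per_household` are invented for illustration, and the uniqueness check assumes SERIAL identifies a household within a single sample.

```python
# Hypothetical person-level records: household 1 has three members,
# household 2 has one. All values are made up for illustration.
records = [
    {"SERIAL": 1, "PERNUM": 1, "HHWT": 100.0},
    {"SERIAL": 1, "PERNUM": 2, "HHWT": 100.0},
    {"SERIAL": 1, "PERNUM": 3, "HHWT": 100.0},
    {"SERIAL": 2, "PERNUM": 1, "HHWT": 150.0},
]


def one_record_per_household(rows):
    """Keep only the records where PERNUM == 1 (one per household)."""
    return [row for row in rows if row["PERNUM"] == 1]


households = one_record_per_household(records)

# Sanity check: every household serial number appears exactly once.
serials = [row["SERIAL"] for row in households]
assert len(serials) == len(set(serials))

print(len(households), "household records kept out of", len(records))
```

Dropping the records where PERNUM is not 1 would give the same result; the two approaches are equivalent.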
A note on your regression results: I suspect that if you are running a regression on household-level variables without restricting your sample to one record per household, you are over-representing large households in your model, because each household then contributes one record per member.
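The over-representation is easy to see with a weighted mean instead of a full regression. The sketch below is plain Python with invented numbers: HHINCOME and HHWT are IPUMS variable names, while the two households and the `weighted_mean` helper are assumptions made up for this example.

```python
def weighted_mean(rows, value_key, weight_key):
    """Weighted mean of rows[value_key] using rows[weight_key] as weights."""
    total_weight = sum(row[weight_key] for row in rows)
    weighted_sum = sum(row[value_key] * row[weight_key] for row in rows)
    return weighted_sum / total_weight


# Household 1: four members, household income 40,000.
# Household 2: one member, household income 100,000.
# Both households carry the same (hypothetical) household weight.
people = [
    {"SERIAL": 1, "PERNUM": p, "HHINCOME": 40000.0, "HHWT": 100.0}
    for p in range(1, 5)
] + [
    {"SERIAL": 2, "PERNUM": 1, "HHINCOME": 100000.0, "HHWT": 100.0},
]

# Using every person record counts household 1 four times.
all_records_mean = weighted_mean(people, "HHINCOME", "HHWT")

# Restricting to PERNUM == 1 counts each household once.
heads = [row for row in people if row["PERNUM"] == 1]
household_mean = weighted_mean(heads, "HHINCOME", "HHWT")

print(all_records_mean)  # 52000.0
print(household_mean)    # 70000.0
```

With all person records, the four-person household is counted four times and pulls the estimate toward its income; with one record per household, each household counts once. The same mechanism would shift regression coefficients.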
I have a follow-up question on this issue. We restrict our analysis to couple households, and we tried two different methods of doing so in Stata:
(1) using the svy, subpop() option, where we specify couple households as the subpopulation but keep all households in the sample
(2) dropping all non-couple households from the sample and then using the svy command without the subpop() option
In both cases we keep only one observation per household (PERNUM==1), but the regression results are very different from each other. Could you explain these differences and tell us which option is the correct one?