I am working on a household-level analysis that requires me to identify households with infants (age = 0), toddlers (age 1 or 2), preschoolers (age 3 or 4), and school-aged children (ages 5 through 12, inclusive), and to count the number of children in each age group in each household. I’m not interested in the relationship between the children and the primary respondent (i.e., I don’t need to match children to their parents).
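The age grouping and per-household counting described above can be sketched roughly as follows. This is a minimal Python illustration rather than Stata. The record layout `(household_id, age)` and the helper names `age_group` and `children_by_household` are hypothetical; in an IPUMS extract the household key would be the combination of year, sample, and serial used in the Stata code.

```python
from collections import Counter, defaultdict

# Age groups as defined in the post: infant 0, toddler 1-2,
# preschool 3-4, school-aged 5-12 (inclusive).
def age_group(age):
    if age == 0:
        return "infant"
    if 1 <= age <= 2:
        return "toddler"
    if 3 <= age <= 4:
        return "preschool"
    if 5 <= age <= 12:
        return "school_age"
    return None  # older children and adults are not counted


def children_by_household(persons):
    """persons: iterable of (household_id, age) tuples (hypothetical layout).
    Returns {household_id: Counter mapping age group -> number of children}."""
    counts = defaultdict(Counter)
    for hh_id, age in persons:
        counts[hh_id]  # touch the key so households with no children still appear
        group = age_group(age)
        if group is not None:
            counts[hh_id][group] += 1
    return dict(counts)


# Hypothetical toy records: (household id, age)
persons = [(1, 34), (1, 0), (1, 2), (2, 40), (2, 7), (2, 11), (3, 70)]
households = children_by_household(persons)
print(dict(households[1]))           # {'infant': 1, 'toddler': 1}
print(households[2]["school_age"])   # 2
print(households[3]["infant"])       # 0
```

Households with no children in any group still appear with an empty count, which mirrors how the Stata `egen ... total()` approach assigns a zero to every household.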
After the first pass of the analysis, I noticed that the counts of children by age seemed a little low, so I started poking around in the data to see whether something was wrong with my code.
In Stata:
gen infant=(age==0)
table region [fw=perwt], stat(total infant)  // ~3.5 million total using the 2019 ACS
bysort year sample serial: egen infants=total(infant)
table region if pernum==1 [fw=hhwt], stat(total infants)  // ~3.2 million total using the 2019 ACS
*Similar code/issue for other age groups
These numbers should match, but they don’t. When I run the same tabulations unweighted, however, the two approaches give exactly the same count, which leads me to think the issue lies with the weights. Has anyone done a similar type of analysis and gotten estimates that look reasonable? Or does anyone have thoughts on what the issue might be and how to address it?
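A toy illustration of how the two weighted totals can diverge while the unweighted counts agree, sketched in Python under made-up assumptions: the first Stata table weights each infant by that infant’s own person weight, while the second weights each household’s infant count by the single household weight on its `pernum==1` record. The weights below are invented, and the sketch assumes (as is possible in the ACS) that person and household weights within a household need not be equal.

```python
from collections import defaultdict

# Hypothetical toy records: (serial, pernum, age, perwt, hhwt).
# All weights are invented for illustration only.
records = [
    (1, 1, 35, 100, 100),
    (1, 2, 0, 120, 100),
    (2, 1, 29, 90, 80),
    (2, 2, 0, 70, 80),
    (2, 3, 0, 75, 80),
]

# Approach 1 (person level, like the first table): sum perwt over every infant.
person_total = sum(perwt for _, _, age, perwt, _ in records if age == 0)

# Approach 2 (household level, like egen total() + the second table):
# count infants per household, then weight each count by hhwt on pernum == 1.
infants_in_hh = defaultdict(int)
for serial, _, age, _, _ in records:
    infants_in_hh[serial] += int(age == 0)
household_total = sum(
    infants_in_hh[serial] * hhwt
    for serial, pernum, _, _, hhwt in records
    if pernum == 1
)

# Unweighted versions of both approaches.
unweighted_person = sum(1 for r in records if r[2] == 0)
unweighted_household = sum(infants_in_hh[r[0]] for r in records if r[1] == 1)

print(person_total, household_total)              # 265 260
print(unweighted_person, unweighted_household)    # 3 3
```

Each infant is counted exactly once either way, so the unweighted totals match. The weighted totals differ by the sum, over infants, of (infant’s PERWT minus the household’s HHWT), here 20 - 10 - 5 = 5.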
Any help is much appreciated!