Hello,
I’m using the ACS 2017–2021 5-year PUMS via IPUMS to estimate the number of male and female householders aged 60+ living alone in Massachusetts. My approach:
- Restricted to ‘civilian, non-institutionalized’ individuals (‘GQ’ == 1 or 2)
- Kept householders only (‘RELATE’ == 1)
- Calculated household size using ‘bysort serial: gen hhsize = _N’, and flagged ‘living_alone = (hhsize == 1)’
- Defined male/female 60+ living alone by ‘age >= 60’, ‘sex’, and ‘statefip == 25’
- I also tried defining living alone using ‘hhtype == 4’ (male living alone) and ‘hhtype == 6’ (female living alone)
- I tested the following weights: ‘hhwt’, ‘perwt’, ‘expwth’, and ‘expwtp’. All yield the same estimates.
- Final estimates using ‘svyset [pweight=expwth]’ with ‘svy: total’:
- ~138,492 older males (60+) living alone
- ~268,911 older females (60+) living alone
Similar totals are obtained across all weight variables.
However, data from the AGID portal show much higher counts:
- 166,570 for older men (60+) living alone
- 352,625 for older women (60+) living alone
This is a difference of 17–24%, and it persists regardless of weighting or definition (‘hhsize == 1’ vs. ‘hhtype’).
My questions are:
- Is this level of discrepancy expected between IPUMS-PUMS estimates and ACS published or AGID-processed summary tables?
- Which weighting variable is recommended for estimating household-level indicators like living alone status?
- Are there any known issues with using ‘RELATE == 1’ and ‘hhsize == 1’, or using ‘hhtype’, in PUMS to approximate living alone?
Any clarification would be greatly appreciated!
Thanks for your targeted question and detailed description of how you arrived at your estimates. I suspect the source of the discrepancy you are seeing is that the AGID table (specifically table MAs21004) includes cells for ages 80-84 and 85+ as well as a category for 80+. My best guess is that you missed these overlapping categories and are double-counting persons in these age brackets.
Because I am not familiar with the AGID tabulations, I first confirmed that I was able to match your estimates using the 2017-2021 PUMS files via IPUMS USA. Next I compared these numbers against the Census Bureau’s published estimates; I specifically looked at table B11010, which reports the count of non-family households by sex of householder by living alone by age of householder. The estimates I generated from the PUMS were within the margin of error for the estimates published in these detailed tables.
Next I looked at table MAs21004 for the 2017-2021 tabulations of Massachusetts from AGID. This table reports 82,575 female householders aged 80 and older who are living alone (in addition to breaking this into age groups of 80-84 and 85+). This is about the size of the difference between the PUMS-derived estimate and the figure you report from the table, hence my guess that the source of the discrepancy is double-counting persons in this age group.
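As a rough check on that guess, the arithmetic can be sketched in a few lines of Python. This uses only the figures quoted in this thread; the dictionary and function names are illustrative, and the AGID totals are the ones reported in the question, not values I re-derived from the table.

```python
# Figures quoted in this thread (Massachusetts, ACS 2017-2021, householders 60+ living alone).
pums = {"male": 138_492, "female": 268_911}   # PUMS-derived estimates
agid = {"male": 166_570, "female": 352_625}   # totals reported from the AGID table
agid_female_80_plus = 82_575                  # AGID cell: female householders 80+ living alone


def gap(sex):
    """Absolute difference between the reported AGID total and the PUMS estimate."""
    return agid[sex] - pums[sex]


def gap_share(sex):
    """The same difference as a share of the reported AGID total."""
    return gap(sex) / agid[sex]


for sex in ("male", "female"):
    print(f"{sex}: gap = {gap(sex):,} ({gap_share(sex):.1%} of the AGID total)")

# If the 80+ cell was added on top of the 80-84 and 85+ cells, the female gap
# should be close to the size of the 80+ cell.
print(f"female gap minus 80+ cell: {gap('female') - agid_female_80_plus:,}")
```

The female gap works out to 83,714, which is 1,139 more than the 80+ cell. That is consistent with double-counting plus ordinary differences between PUMS-based and published estimates, though it does not prove it. The corresponding 80+ cell for men is not quoted in this thread, so the same check for male householders is left open.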
To answer your specific questions:
- I would generally expect a PUMS-derived estimate to be within the margin of error of published estimates, though there may be exceptions and I would not expect to exactly replicate the published estimates.
- For analyses reporting person counts, you should use PERWT (or EXPWTP). For analyses reporting household counts, you should use HHWT (or EXPWTH). In your situation the distinction does not matter, because a count of single-person households is simultaneously a count of people and a count of households. The household weight is equal to the person weight of the first person in the household, which is why applying different weights does not affect your estimate.
- I am not aware of any issues using the variables you listed to estimate living alone, particularly because you restricted your sample to households (i.e., not group quarters). Note that the IPUMS variable NUMPREC already provides the household size that you computed as hhsize.
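To illustrate the point about weights, here is a minimal sketch in Python using made-up records, not real PUMS data. The key names mirror the IPUMS variables mentioned above, the weight values are invented, and the toy data assume that each householder's PERWT equals the household's HHWT, as described in the weighting answer.

```python
# Hypothetical toy records (one row per person); values are invented for illustration.
people = [
    {"SERIAL": 1, "RELATE": 1, "NUMPREC": 1, "HHWT": 20, "PERWT": 20},
    {"SERIAL": 2, "RELATE": 1, "NUMPREC": 2, "HHWT": 15, "PERWT": 15},
    {"SERIAL": 2, "RELATE": 2, "NUMPREC": 2, "HHWT": 15, "PERWT": 18},
    {"SERIAL": 3, "RELATE": 1, "NUMPREC": 1, "HHWT": 30, "PERWT": 30},
]


def weighted_total(rows, weight):
    """Sum the chosen weight over the given rows (a weighted count)."""
    return sum(row[weight] for row in rows)


# Householders in one-person households: each row is both a person and a household.
alone = [r for r in people if r["RELATE"] == 1 and r["NUMPREC"] == 1]
households_alone = weighted_total(alone, "HHWT")
persons_alone = weighted_total(alone, "PERWT")

# Across all households the two weights answer different questions:
# one row per household (the householder) with HHWT vs. every person with PERWT.
householders = [r for r in people if r["RELATE"] == 1]
all_households = weighted_total(householders, "HHWT")
all_persons = weighted_total(people, "PERWT")

print(households_alone, persons_alone)  # identical for the living-alone subset
print(all_households, all_persons)      # differ once multi-person households are included
```

In this toy example the living-alone totals agree under either weight, while the overall household and person totals do not, which is why the choice of weight matters for other household-level indicators even though it does not here.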
I hope this helps. Feel free to follow up with further questions.
Thank you very much for your detailed explanations. This completely clarified my confusion. I really appreciate your help!