About population estimates using one-year ACS data

wenjianx · December 6, 2021, 2:26am

I am trying to calculate the total population of each race group using one-year ACS data from 2001 to 2019. I used the weight perwt to calculate the population, but my numbers are not close to those reported on the Census website (for example, for 2019 it is reported here: Explore Census Data). To figure out why, I calculated the total population of the whole country to see whether that number could be matched. Even then, my estimate is quite different from the official number. Using the one-year ACS 2019 and the Stata code: bysort year: egen pop=total(perwt), I got a total population of 230,076,328, which is far from the Census figure of 328.3 million. The difference is clearly beyond normal error. I re-downloaded the ACS 2019 data and, without making any changes to it, found that the total number of observations is 2,257,409. Given that one observation represents about 100 people, I would expect around 3 million observations (or more) in ACS 2019. I do not know where I went wrong. Thank you for your help.

Ivan_Strahof · December 10, 2021, 8:09pm

I ran the same exact command on the 2019 ACS and got a value of 328,200,000. Please make sure that when you create your extract that you select “Rectangular (person)” as your data structure and that the “Select Cases” and “Customize Sample Sizes” fields are left unused. Please let me know if after confirming these extract specifications you still encounter problems. I didn’t find an IPUMS account associated with your forum email address, but if you could send an email from the address associated with the account to ipums@umn.edu I can take a look at the extract from our end.
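
To illustrate why a leftover "Select Cases" filter changes the weighted total, here is a minimal Python sketch. The records and weights are invented for illustration (they are not ACS data), the age filter is just one example of a case selection, and the helper name weighted_population is hypothetical; only the variable names AGE and PERWT come from IPUMS.

```python
# Toy person records; the values are invented for illustration, not real ACS data.
persons = [
    {"AGE": 8, "PERWT": 120},
    {"AGE": 34, "PERWT": 95},
    {"AGE": 52, "PERWT": 110},
    {"AGE": 81, "PERWT": 75},
]

def weighted_population(records):
    """Sum person weights; each PERWT is the number of people a record represents."""
    return sum(r["PERWT"] for r in records)

# Unrestricted extract: every person record contributes its weight.
full = weighted_population(persons)

# A case selection (here, an age filter) drops records and their weights with them,
# so the weighted total no longer represents the whole population.
restricted = weighted_population([r for r in persons if 16 <= r["AGE"] <= 70])

print(full, restricted)
```

In this toy example the restricted total is smaller than the full one simply because the filtered-out records are missing, not because the weights are wrong.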

wenjianx · December 12, 2021, 11:30am

Dear Ivan,

Thank you so much for the detailed explanation. Indeed, I had reused a previous extract as the basis for the new one and did not notice that it had an age restriction (ages 16 to 70 selected). I removed the age restriction and the number now matches the Census reports. Thank you so much for your help. I really appreciate it.

Best,
Wenjian

Zachary_Marhanka · February 25, 2022, 1:34am

Hi Ivan, I am seeing a somewhat similar issue, but with the 5-year PUMS sample. I have no cases selected and my sample sizes are the defaults. When I sum the HHWT weights, the result overestimates the number of households in the US. For the downloaded 2015-2019 PUMS my HHWT sum is 145,519,985, while Census QuickFacts reports 120,756,048 (https://www.census.gov/quickfacts/fact/table/US/HSD410219). Is this because the PUMS sample covers a 5-year period? Does multi-year estimation mean that the “population” is meant to be larger than what you’d find in a single year?

KariWilliams · March 1, 2022, 7:53pm

I am not able to replicate your estimate of 145,519,985. I used the online tabulator to quickly estimate the number of households in the US using the 2019 5-year ACS and, when restricting to only the first person in each household (PERNUM == 1) and those not in group quarters (GQ values of 1, 2, or 5), I get 120,756,015. This is very close to the QuickFacts estimate; note that you may not be able to exactly replicate official statistics using the public use microdata sample (PUMS) data.
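
The restriction described above can be sketched in Python. This is a minimal illustration on invented records, assuming a rectangular (person) file in which HHWT is repeated on every person record of a household. The helper name household_count and all values are hypothetical, and the sketch only shows one way a plain sum of HHWT can overshoot; it does not claim to reproduce the 145,519,985 figure.

```python
# Toy rectangular (person) records; values are invented, not real ACS data.
# HHWT is repeated on each person record belonging to the same household.
persons = [
    {"SERIAL": 1, "PERNUM": 1, "GQ": 1, "HHWT": 100},
    {"SERIAL": 1, "PERNUM": 2, "GQ": 1, "HHWT": 100},  # same household as above
    {"SERIAL": 2, "PERNUM": 1, "GQ": 1, "HHWT": 80},
    {"SERIAL": 3, "PERNUM": 1, "GQ": 3, "HHWT": 60},   # GQ code outside 1, 2, 5
]

# GQ values treated as households in the reply above.
HOUSEHOLD_GQ = {1, 2, 5}

def household_count(records):
    """Sum HHWT once per household: first person only, households only."""
    return sum(
        r["HHWT"]
        for r in records
        if r["PERNUM"] == 1 and r["GQ"] in HOUSEHOLD_GQ
    )

# Summing HHWT over every person record counts multi-person households
# more than once and also includes group quarters records.
naive = sum(r["HHWT"] for r in persons)
households = household_count(persons)

print(naive, households)
```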

Dongmiao_Zhang · April 23, 2022, 6:16pm

Hi IPUMS team,

I am having difficulty replicating the population estimates for MSAs using ACS one-year estimates. The code I am using is below, and I double-checked that the extract selection (rectangular, with no further constraints) is correct.
pop <- acs %>% group_by(year, pwmet13) %>% summarise(pop = sum(perwt))
I compared my estimates with the Census estimates, but there is a considerable difference.

The long format is the population estimates per MSA from Census data. The other is what I calculated with the code above. I wonder whether I did something wrong and why there is this difference. Many thanks!

Best,
Dongmiao

JonathanSchroeder · April 25, 2022, 2:47pm

Your code uses the variable PWMET13 to identify metro areas. Its universe is “Persons age 16+ who worked last week.” And it identifies the metro area where respondents worked, not where they lived.

If you’d like to estimate the resident population of metro areas, use MET2013.
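
As a rough illustration of the difference, here is a minimal Python sketch that groups the same records by MET2013 and by PWMET13. The records, weights, and metro codes (11111, 22222, and 0 for people with no workplace metro) are placeholders invented for this example, and the helper name weighted_totals is hypothetical.

```python
from collections import defaultdict

# Toy person records; all values and metro codes are invented placeholders.
# MET2013 = metro area of residence, PWMET13 = metro area of work.
persons = [
    {"MET2013": 11111, "PWMET13": 11111, "PERWT": 100},  # lives and works in A
    {"MET2013": 11111, "PWMET13": 22222, "PERWT": 90},   # lives in A, works in B
    {"MET2013": 11111, "PWMET13": 0, "PERWT": 110},      # lives in A, not in PWMET13 universe
    {"MET2013": 22222, "PWMET13": 22222, "PERWT": 70},   # lives and works in B
]

def weighted_totals(records, key):
    """Sum PERWT within each value of the grouping variable `key`."""
    totals = defaultdict(int)
    for r in records:
        totals[r[key]] += r["PERWT"]
    return dict(totals)

# Resident population by metro area.
by_residence = weighted_totals(persons, "MET2013")

# Grouping by place of work gives worker counts by workplace metro instead,
# and pushes everyone outside the universe into the placeholder code 0.
by_workplace = weighted_totals(persons, "PWMET13")

print(by_residence)
print(by_workplace)
```

In this toy data metro A has 300 weighted residents but only 100 weighted workers, which is the kind of gap that appears when place-of-work totals are compared with resident population estimates.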
