I was trying to calculate the total population for each race group using one-year ACS data from 2001 to 2019. I used the weight perwt to calculate the population, but could not get a number close to what is reported on the Census website (for example, the 2019 figure is reported here: https://data.census.gov/cedsci/table?q=B02018&tid=ACSDT1Y2019.B02018). To figure out why, I calculated the total population for the whole country to see whether that number could be matched. Even there, my estimate is quite different from the official number: using the one-year ACS 2019 and the Stata code bysort year: egen pop=total(perwt), I got a total population of 230,076,328, which is far from the Census figure of 328.3 million. The difference is clearly beyond normal sampling error. I re-downloaded the ACS 2019 data and, without making any changes to it, found that the total number of observations is 2,257,409. Given that one observation represents about 100 people, I would have expected more than (or around) 3 million observations in ACS 2019. I do not know where I went wrong. Thank you for your help.
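For readers working outside Stata, bysort year: egen pop=total(perwt) is simply a weighted sum of PERWT within each survey year. A minimal Python sketch of that calculation, assuming the extract has been loaded as a list of person records with hypothetical `year` and `perwt` fields (the values below are toy numbers, not real ACS data):

```python
from collections import defaultdict


def weighted_total_by_year(records):
    """Sum person weights (PERWT) within each survey year."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["year"]] += rec["perwt"]
    return dict(totals)


# Toy person records (hypothetical values, not real ACS data).
people = [
    {"year": 2018, "perwt": 95.0},
    {"year": 2018, "perwt": 105.0},
    {"year": 2019, "perwt": 120.0},
    {"year": 2019, "perwt": 80.0},
    {"year": 2019, "perwt": 100.0},
]

print(weighted_total_by_year(people))  # {2018: 200.0, 2019: 300.0}
```

Because each PERWT value is the number of people that record represents, this total approaches the national population only when the extract contains every person record for the year.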
I ran the exact same command on the 2019 ACS and got a value of 328,200,000. When you create your extract, please make sure that you select “Rectangular (person)” as your data structure and that the “Select Cases” and “Customize Sample Sizes” fields are left unused. If you still encounter problems after confirming these extract specifications, please let me know. I didn’t find an IPUMS account associated with your forum email address, but if you send an email from the address associated with your account to email@example.com, I can take a look at the extract from our end.
Thank you so much for the detailed explanation. Indeed, I had reused a previous extract as the basis for defining the new one and did not notice that the previous extract had an age restriction (selecting only ages 16 to 70). After removing the age restriction, the number matches the Census reports. Thanks again for your help; I really appreciate it.
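The effect of a leftover case-selection restriction can be reproduced with toy data: the weights of everyone outside the selected range simply drop out of the total. A minimal Python sketch, assuming hypothetical `age` and `perwt` fields and made-up values, where the 16 to 70 filter mirrors the restriction described above:

```python
def weighted_total(records, keep=lambda rec: True):
    """Sum PERWT over the records that pass the `keep` filter."""
    return sum(rec["perwt"] for rec in records if keep(rec))


# Toy person records for a single year (hypothetical values).
people = [
    {"age": 5, "perwt": 100.0},
    {"age": 30, "perwt": 100.0},
    {"age": 45, "perwt": 100.0},
    {"age": 80, "perwt": 100.0},
]

full = weighted_total(people)
# Mimics an extract that only kept ages 16 to 70.
restricted = weighted_total(people, keep=lambda rec: 16 <= rec["age"] <= 70)
print(full, restricted)  # 400.0 200.0
```

A restricted extract still produces a plausible-looking total, which is why the shortfall is easy to miss without checking the extract's case selections.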
Hi Ivan, I am running into a similar issue, but with the 5-year PUMS sample. I have not selected any cases, and my sample sizes are the default selections. When I sum the HHWT weights, the total overestimates the number of households in the US: for the downloaded 2015-2019 PUMS, my HHWT sum is 145,519,985, while Census QuickFacts reports 120,756,048 (https://www.census.gov/quickfacts/fact/table/US/HSD410219). Is this because the PUMS sample covers a 5-year period? Does the multi-year estimation mean that the “population” is meant to be larger than what you would find in a single year?
I am not able to replicate your estimate of 145,519,985. I used the online tabulator to quickly estimate the number of households in the US using the 2019 5-year ACS and, when restricting to only the first person in each household (PERNUM == 1) and those not in group quarters (GQ values of 1, 2, or 5), I get 120,756,015. This is pretty close to the Quick Facts estimate; you may not be able to exactly replicate official statistics using the public use microdata sample (PUMS) data.
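In a rectangular (person) extract, household variables such as HHWT are repeated on every person record, so summing HHWT over all persons counts each household once per member. The restriction described in the reply (PERNUM == 1 and GQ values of 1, 2, or 5) counts each household once and excludes group quarters. A minimal Python sketch of that restriction, assuming hypothetical lowercase field names and toy values:

```python
# GQ codes treated as households, following the reply above (1, 2, or 5).
HOUSEHOLD_GQ_CODES = {1, 2, 5}


def household_total(records):
    """Sum HHWT once per household: first person only, excluding group quarters."""
    return sum(
        rec["hhwt"]
        for rec in records
        if rec["pernum"] == 1 and rec["gq"] in HOUSEHOLD_GQ_CODES
    )


# Toy person-level records (hypothetical values). HHWT repeats on every
# person in the same household, so a naive sum over persons overcounts.
people = [
    {"serial": 1, "pernum": 1, "gq": 1, "hhwt": 50.0},
    {"serial": 1, "pernum": 2, "gq": 1, "hhwt": 50.0},
    {"serial": 1, "pernum": 3, "gq": 1, "hhwt": 50.0},
    {"serial": 2, "pernum": 1, "gq": 2, "hhwt": 70.0},
    {"serial": 3, "pernum": 1, "gq": 3, "hhwt": 40.0},  # GQ 3 is not in the household set
]

naive = sum(rec["hhwt"] for rec in people)
print(naive, household_total(people))  # 260.0 120.0
```

As the reply notes, even with this restriction the PUMS-based total may differ slightly from published figures.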
Hi IPUMS team,
I am having difficulty replicating the population estimates for MSAs using the ACS one-year estimates. The code I am using is below, and I double-checked that the extract settings are correct (rectangular, with no further constraints).
library(dplyr)
pop <- acs %>% group_by(year, pwmet13) %>% summarise(pop = sum(perwt))
I compared my estimates with the Census estimates, but there is a considerable difference.
The table in long format contains the Census population estimates for each MSA; the other table is what I calculated with the code above. Did I do something wrong, and why would there be such a difference? Many thanks!
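The R code above computes a weighted sum of PERWT within each combination of year and metro area. The sketch below expresses the same logic in Python, with hypothetical field names, metro codes, and weights. It also illustrates why the grouping variable matters. In IPUMS USA, PWMET13 identifies the metropolitan area of a person's place of work, while MET2013 identifies the metropolitan area of residence. Grouping by PWMET13 therefore tallies workers by where they work, not residents by where they live. Which variable corresponds to the Census MSA table is an assumption worth confirming against the IPUMS variable descriptions.

```python
from collections import defaultdict


def weighted_total_by_group(records, keys):
    """Sum PERWT within each combination of the given grouping fields."""
    totals = defaultdict(float)
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group] += rec["perwt"]
    return dict(totals)


# Toy records (hypothetical codes and weights). met2013 is the metro area of
# residence; pwmet13 is the metro area of the workplace (0 is used here as a
# hypothetical "not applicable" code for a person who does not work).
people = [
    {"year": 2019, "met2013": 10000, "pwmet13": 10000, "perwt": 100.0},
    {"year": 2019, "met2013": 10000, "pwmet13": 20000, "perwt": 100.0},
    {"year": 2019, "met2013": 20000, "pwmet13": 0, "perwt": 100.0},
]

print(weighted_total_by_group(people, ["year", "met2013"]))
# {(2019, 10000): 200.0, (2019, 20000): 100.0}
print(weighted_total_by_group(people, ["year", "pwmet13"]))
# {(2019, 10000): 100.0, (2019, 20000): 100.0, (2019, 0): 100.0}
```

The same records give different metro totals depending on the grouping variable, so a mismatch with published MSA populations can come from the variable choice rather than from the weights.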