Population estimates by state and age for 2002 ACS data in Stata

I’m trying to calculate population totals for each age, for each state and for the US as a whole, for 2002 using ACS data in Stata.

This is what I did (below). However, the US total population (all ages combined) that I get from this (about 280.7 million) is well below the published estimate of the US population in 2002 (about 287.6 million). What am I doing wrong? Should I be weighting this using perwt somehow? Is there a better way to get population estimates?

    * Add up person weights for each age (across all states) and store that
    * total on every record
    bysort age: egen tot_perwt = total(perwt)

    * Collapse to one row per age, year, and state, summing perwt
    collapse (sum) perwt, by(age year statefip)
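
The collapse above stops at state-by-age totals, while the question also asks for US totals. A minimal sketch of one way to get them is to collapse to state-by-age totals first and then collapse again over states. The five `input` records are made-up stand-ins for an IPUMS USA extract, using the same variable names as the code above (year, statefip, age, perwt); with a real extract, drop the `clear`/`input` block and start from the loaded data.

```stata
* Toy person records (made-up values) standing in for an IPUMS USA extract
clear
input int year byte statefip byte age double perwt
2002 1 30 120
2002 1 30  80
2002 1 31  50
2002 2 30 200
2002 2 31 150
end

* State-by-age totals: one row per year, state, and single year of age
collapse (sum) pop = perwt, by(year statefip age)
list, sepby(statefip)

* US-by-age totals: add up the state rows for each age
collapse (sum) pop, by(year age)
list

* US total for all ages combined, to compare with a published estimate
quietly summarize pop
display "US total, all ages: " %15.0fc r(sum)
```

Because the second collapse only sums the first one's results, the US total for all ages equals the sum of perwt over all person records.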

Note that doing the same thing with the 2017 ACS gives an estimate that matches the published estimate of the US population for 2017, but this is not the case for 2002.

The estimate you are getting (~280.7 million) does not seem incorrect. Here are a few explanations for why:

(1) In general, you can expect differences when comparing figures across sources. In particular, if your source uses the Census Bureau's internal (non-public-use) data files, as the data in American FactFinder do, its estimates will inevitably differ from those computed from the public-use samples. Did you find the estimate of 287.6 million people on the Census Bureau website? If not, another possible cause of the difference is that the two estimates rely on different data sources.

(2) The 2002 ACS sample in IPUMS USA does not include people residing in group quarters (GQ). In 2006, the first ACS sample in IPUMS USA that does include GQ, roughly 8 million people resided in some type of group quarters. The gap you are seeing (287.6 − 280.7 ≈ 6.9 million) is of the same order, so it seems likely that the 287.6 million estimate includes the GQ population.
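One way to check how much of a total comes from group quarters is to split the weighted count by the IPUMS USA GQ variable in a sample that includes them (2006 or later). A minimal sketch follows; the records are made up, and it assumes that GQ codes 3 and 4 are the group-quarters categories (institutional and other), which should be confirmed against the GQ codebook for your extract.

```stata
* Toy records (made-up values) standing in for a 2006 ACS extract
* Assumption: gq codes 3 and 4 mark group quarters in the IPUMS USA GQ variable
clear
input int year byte gq double perwt
2006 1 300
2006 1 250
2006 3  40
2006 4  10
end

* Weighted person counts by group-quarters status
tab gq [fweight=perwt]

* Flag group-quarters records under the coding assumption above
generate byte in_gq = inlist(gq, 3, 4)

* Household population only, which is closer in scope to the 2002 IPUMS sample
quietly summarize perwt if in_gq == 0
display "Household population: " %15.0fc r(sum)

* Group-quarters population, which the 2002 IPUMS sample leaves out
quietly summarize perwt if in_gq == 1
display "Group quarters population: " %15.0fc r(sum)
```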

Beyond the notes above, there is nothing incorrect about how you’ve done your analysis in Stata, but it may be worth knowing that Stata commands accept weights directly. For example, if you are interested in the breakdown of the population by state, you can run the following: tab statefip [fweight=perwt].
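A minimal sketch of the weighting option, extended to a two-way table so that the margins give both kinds of totals from the question. The records are made up and use the variable names from the question. Note that fweight requires integer weights; if your perwt has fractional values, Stata will reject it as a frequency weight, and tabulate also accepts iweight.

```stata
* Toy person records (made-up values)
clear
input byte statefip byte age double perwt
1 30 120
1 30  80
1 31  50
2 30 200
2 31 150
end

* Weighted counts by state: each record counts perwt times
tab statefip [fweight=perwt]

* Weighted counts by age (rows) and state (columns):
* column totals are state totals, row totals are US totals for each age
tab age statefip [fweight=perwt]
```

With a frequency weight, the grand total in the corner of the two-way table is the weighted US total for all ages.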