Should Census data at the county level aggregate up correctly?

Hello! This question is cheating because it’s not about IPUMS data, so please feel free to ignore it, but I thought that the people who really understand IPUMS-USA data are probably also the people who understand the panoply of available Census-based population estimates.

I am doing a research project where we are pulling some denominators from the Census PEP’s annual county estimates (by age, sex, race/ethnicity): https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2020-2021/cc-est2021-alldata.pdf

For most (but not all) of our purposes, we don’t need county-specific rates and we could pull national populations directly from CDC Wonder: Single-Race Population Estimates

It would somewhat ease our workflow to have a single underlying denominator file for all analyses. But we weren’t sure whether the county-level population estimates should aggregate up correctly to national-level (and region-level) denominators, or whether the county-level estimates might have some privacy distortions, or the national-level estimates might include some additional people not counted in any county, etc.

If anyone knows, we would be very grateful!

I am not an expert on the PEP files as we don’t offer these via IPUMS, but I looked around the Census Bureau website, and their statement on methodology for the United States population estimates (2022) seems to answer one of your questions directly. Regarding the aggregation of county-level estimates to the national level, I believe this excerpt includes the information you are looking for:

"The estimates are produced using a “top-down” approach. Given that it is generally more reliable to estimate the change of a larger population, we begin by estimating the monthly population at the national level by age, sex, race, and Hispanic origin. We then produce estimates of the total annual populations of counties, which we sum to the state level. With the national characteristics, state total, and county total estimates created, we produce estimates of states and counties by age, race, sex, and Hispanic origin.

One of our key estimates principles is that all of the estimates we produce must be consistent across geography and demographic characteristics. For example, the sum of the county total populations must equal the total national population, and the sum of a particular race group within a state’s counties must equal the total of that particular race group in the state. Since our various estimates products and processes use slightly different input data and methodology, they often do not generate this consistency automatically. Consequently, we adjust the final estimates to be consistent. As a result, the demographic components of change do not account for all of the year-to-year change in the estimates series. The difference between the result of the balancing equation and the final estimate is referred to as the residual. The national population estimates by characteristics do not contain a residual. This is because they are made first and are not required to sum to any pre-defined total."
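
If you want to confirm this consistency in your own extract, one quick check is to sum the county rows up to state and national totals and compare them against whatever national figure you would otherwise pull. Below is a minimal sketch in Python, not a definitive implementation. It assumes the column names STATE, COUNTY, YEAR, AGEGRP, and TOT_POP, that AGEGRP 0 is the all-ages row, and that YEAR 3 is the July 1, 2021 estimate; please verify all of these against the file layout linked in the question. The rows and the reference total are made-up numbers for illustration only.

```python
from collections import defaultdict

# Toy county rows with made-up populations (NOT real estimates).
# Column names and codes are assumptions to check against the file layout.
county_rows = [
    {"STATE": "01", "COUNTY": "001", "YEAR": 3, "AGEGRP": 0, "TOT_POP": 100},
    {"STATE": "01", "COUNTY": "003", "YEAR": 3, "AGEGRP": 0, "TOT_POP": 250},
    {"STATE": "02", "COUNTY": "001", "YEAR": 3, "AGEGRP": 0, "TOT_POP": 400},
    # Rows that must be excluded from an all-ages total for YEAR 3:
    {"STATE": "01", "COUNTY": "001", "YEAR": 3, "AGEGRP": 1, "TOT_POP": 7},
    {"STATE": "01", "COUNTY": "001", "YEAR": 2, "AGEGRP": 0, "TOT_POP": 98},
]


def aggregate_counties(rows, year, agegrp=0, value_col="TOT_POP"):
    """Sum county rows to state totals and a national total.

    Only rows matching the requested year and age-group code are used,
    so detailed age-group rows are not double counted with the total row.
    """
    by_state = defaultdict(int)
    for row in rows:
        if row["YEAR"] == year and row["AGEGRP"] == agegrp:
            by_state[row["STATE"]] += row[value_col]
    return dict(by_state), sum(by_state.values())


# Hypothetical national total from a second source (made up for this sketch).
reference_national_total = 750

state_totals, national_total = aggregate_counties(county_rows, year=3)
residual = national_total - reference_national_total

print(state_totals)    # {'01': 350, '02': 400}
print(national_total)  # 750
print(residual)        # 0 means the two sources agree for this cell
```

The same function can be pointed at a race-by-sex column instead of TOT_POP through the value_col argument. A nonzero difference would more likely point to a vintage or reference-date mismatch between the two sources than to people missing from the county file.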

How privacy distortions affect the denominators of the estimates is a bit more complicated. The Vintage 2021 population estimates use a “blended base.” The total population count by county is derived from the 2020 Decennial Census, and so there is noise injected into the counts. Based on what we know at IPUMS about the Bureau’s Disclosure Avoidance System, accuracy was prioritized for total population counts, so this noise injection likely has little impact on total population.

However, the demographic characteristic counts used in the Vintage 2021 population estimates (race, ethnicity, sex, and age) are based on the 2010 Decennial Census. These counts were extended forward in time to 2021. The Bureau made adjustments to the demographic characteristic counts so that they sum to the 2020-based total population count. More details are available in the Vintage 2021 methodological document. Using the demographic characteristics data from 2010 for 2021-based estimates could be problematic, especially for some subgroups in some counties, but it is likely reasonable for many subgroup-county combinations.

Thank you so much! This is incredibly informative and it’s not even an IPUMS project. Thank you for saving us yet again.