IPUMS USA Person Weights Discrepancies Over Time


I am conducting an analysis where I follow entire cohorts over time, looking at them once when they are 0-9 years old and again when they are 30-39 years old, 30 years later. I am using the decennial census, and only looking at people born in the US (bpl is one of the 50 states, District of Columbia, or “United States, n.s.”). I do this for cohorts that were 0-9 years old in 1940, 1950, 1960, 1970, and 1980.

I am using person weights (perwt) in my analysis, and as a check I compared the sum of the person weights of the child sample to those of the adult sample. In most cases the sum of adult weights is 3-4% lower (roughly one million fewer people) than the sum of child weights, which seems plausible if some people die before age 40.
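For readers who want to reproduce this kind of check, here is a minimal sketch in Python with pandas. It assumes an extract with columns named YEAR, AGE, BPL, and PERWT, and it assumes that general BPL codes 1-56 are the states plus DC, 99 is "United States, n.s.", and 2 and 15 are Alaska and Hawaii; verify these codes against the codebook for your own extract. The toy rows are invented purely to show the mechanics.

```python
import pandas as pd

# Assumed BPL codes (check your codebook): states and DC are 1-56,
# "United States, n.s." is 99, Alaska is 2, Hawaii is 15.
US_BORN_BPL = [c for c in range(1, 57) if c not in (2, 15)] + [99]

def weighted_population(df, year, age_lo, age_hi, bpl_codes):
    """Sum PERWT for persons in one census year, age range, and birthplace set."""
    mask = (
        (df["YEAR"] == year)
        & df["AGE"].between(age_lo, age_hi)  # inclusive on both ends
        & df["BPL"].isin(bpl_codes)
    )
    return df.loc[mask, "PERWT"].sum()

# Invented toy records, not real census data.
toy = pd.DataFrame({
    "YEAR":  [1940, 1940, 1940, 1970, 1970, 1970],
    "AGE":   [3, 9, 12, 33, 39, 45],
    "BPL":   [1, 27, 1, 1, 27, 1],
    "PERWT": [100.0, 80.0, 100.0, 100.0, 100.0, 100.0],
})

child = weighted_population(toy, 1940, 0, 9, US_BORN_BPL)
adult = weighted_population(toy, 1970, 30, 39, US_BORN_BPL)
change_pct = 100 * (adult - child) / child  # negative means the cohort shrank
```

A negative change_pct is the expected pattern; a positive value flags a cohort like the 1940 one discussed below.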

However, when I look at the number of native-born children age 0-9 in 1940 I get a total population of 20,706,807, but in the 1970 census I calculate a population of 20,808,300 people age 30-39, about 100,000 more. Neither of these numbers includes people born in Alaska or Hawaii, which I dropped from both years because they were not part of the 1940 Census. For comparison, in the 1950 cohort there were 450,000 fewer people in the adult sample than the child sample. The declines in the other cohorts are larger still.

The 1940 census has more individual records in IPUMS than the 1970 one for this population, with 220,992 observations of US-born kids 0-9 in 1940 versus 208,083 observations of US-born adults 30-39 in 1970. But about 1/4 of the 1940 records have person weights under 100, while all the 1970 records are weighted exactly 100.
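The arithmetic behind these figures can be checked directly from the numbers quoted above. The short sketch below uses only those four numbers; the variable names are illustrative.

```python
# Record counts and weighted totals as quoted in the question.
n_1940, pop_1940 = 220_992, 20_706_807
n_1970, pop_1970 = 208_083, 20_808_300

# Average person weight implied by each sample.
mean_w_1940 = pop_1940 / n_1940  # below 100, since some 1940 weights are under 100
mean_w_1970 = pop_1970 / n_1970  # exactly 100, since every 1970 record is weighted 100

# Apparent growth of the cohort between the two censuses.
gap = pop_1970 - pop_1940
gap_pct = 100 * gap / pop_1940

print(f"mean weight 1940: {mean_w_1940:.1f}")
print(f"mean weight 1970: {mean_w_1970:.1f}")
print(f"gap: {gap:,} persons ({gap_pct:.2f}%)")
```

So the 1940 sample has more records but a lower average weight, and the weighted totals differ by about half a percent.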

I would very much appreciate any thoughts on where this discrepancy might be originating and whether it is likely to be problematic for my analysis. Is there any legitimate reason why the US-born population in a given cohort might increase over time? Is it reasonable to follow cohorts over time in this way?



IPUMS data match the public use files provided by the Census Bureau. The weighted populations in these files are the original census estimates; however, the Census Bureau’s Population Estimates Program (PEP) revises its official population estimates annually between decennial censuses. Because these revisions are not reflected in our static files, our absolute population figures contain noise that can obscure the real change from sample to sample.

With regard to your example specifically, PEP made an unusually large upward revision to the 1940 under-one-year-old population (AGE=0). The discrepancy between the 1940 0-9 year old cohort and the corresponding 1970 30-39 year old population appears to be due to this original underestimate of AGE=0 persons in the 1940 Census, which had been accounted for by the time of the AGE=30 population estimate in the 1970 Census. I recommend comparing our population estimates by age to the revised PEP estimates by age to help explain any other unexpected population changes you might encounter. If the population revisions are not in the anticipated direction, please email us at ipums@umn.edu to discuss further. You might also consider contacting the Census Bureau directly for more information or guidance about dealing with revisions to the population estimates.
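One way to set up the age-by-age comparison is sketched below in Python with pandas. It assumes a microdata table with AGE and PERWT columns and a separately obtained reference table with columns AGE and REF_POP holding the revised estimates; both the column name REF_POP and the numbers in the toy tables are hypothetical placeholders, not actual PEP or census figures.

```python
import pandas as pd

def compare_by_age(micro, reference):
    """Compare weighted microdata totals by single year of age to reference totals.

    micro:     one row per person, columns AGE and PERWT
    reference: one row per age, columns AGE and REF_POP (hypothetical name)
    """
    est = (
        micro.groupby("AGE", as_index=False)["PERWT"].sum()
        .rename(columns={"PERWT": "IPUMS_POP"})
    )
    out = est.merge(reference, on="AGE", how="inner")
    # RATIO > 1 means the reference (revised) estimate exceeds the weighted sample total.
    out["RATIO"] = out["REF_POP"] / out["IPUMS_POP"]
    return out

# Invented toy values, used only to show the mechanics.
micro = pd.DataFrame({"AGE": [0, 0, 1, 1], "PERWT": [100.0, 80.0, 100.0, 100.0]})
reference = pd.DataFrame({"AGE": [0, 1], "REF_POP": [198.0, 200.0]})

table = compare_by_age(micro, reference)
print(table)
```

Ages whose ratio sits far from 1 are the ones where a revision could explain an unexpected change in cohort size.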

In general, the IPUMS samples should still be representative of the population when weights are properly applied. This means relative characteristics of the population can still be compared across time, e.g. % female, average income, % unemployed, etc. Since your sample is defined as native-born, you will be capturing snapshots in time of an approximately representative sample of each original “birth cohort” population, with the caveat that the cohort population can still lose persons to death or to migration out of the country. It is up to the individual researcher to determine whether and how to account for the impact of out-migration and death on how representative the sample is of the original birth cohort population. You also need to consider that a large population revision to a single age, e.g. AGE=0 in 1940, can reduce the degree to which the weighted sample is representative of the cohort in that census year. You might consider reweighting your data to account for this imbalance.
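The answer does not prescribe a reweighting method. As one possible interpretation, the sketch below rescales PERWT within each single year of age so that the weighted totals match a set of target populations (a simple post-stratification by age), then compares a weighted share before and after. The target values, the FEMALE indicator column, and the PERWT_ADJ column name are all hypothetical and chosen only for illustration.

```python
import pandas as pd

def poststratify_by_age(micro, targets):
    """Scale PERWT within each age so weighted totals match targets (dict: age -> population).

    Ages missing from targets get NaN adjusted weights, so supply every age in the data.
    """
    totals = micro.groupby("AGE")["PERWT"].transform("sum")  # current weighted total per age
    target = micro["AGE"].map(targets)                        # desired total per age
    out = micro.copy()
    out["PERWT_ADJ"] = micro["PERWT"] * target / totals
    return out

def weighted_share(df, flag_col, weight_col):
    """Weighted proportion of rows where flag_col equals 1."""
    return (df[flag_col] * df[weight_col]).sum() / df[weight_col].sum()

# Invented toy records and targets, not real data.
micro = pd.DataFrame({
    "AGE":    [0, 0, 1, 1],
    "PERWT":  [100.0, 80.0, 100.0, 100.0],
    "FEMALE": [1, 0, 1, 0],
})
targets = {0: 198.0, 1: 200.0}

adj = poststratify_by_age(micro, targets)
share_before = weighted_share(adj, "FEMALE", "PERWT")
share_after = weighted_share(adj, "FEMALE", "PERWT_ADJ")
```

Scaling within age leaves each age group's internal composition unchanged and only shifts how much each age contributes to the cohort total, so relative characteristics typically move only slightly.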

Hope this helps.