I am conducting an analysis in which I follow entire cohorts over time, observing them once at ages 0-9 and again 30 years later at ages 30-39. I am using the decennial census and include only people born in the US (bpl is one of the 50 states, the District of Columbia, or “United States, n.s.”). I do this for the cohorts that were 0-9 years old in 1940, 1950, 1960, 1970, and 1980.
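For concreteness, the cohort selection can be sketched as a filter over census records. This is a minimal sketch, not IPUMS code: it assumes each record is a dict with IPUMS-style keys year, age, bpl, and perwt, and it assumes the IPUMS BPL general codes 1-56 for the states and DC, 2 for Alaska, 15 for Hawaii, and 99 for "United States, n.s."; those codes should be checked against the BPL codebook. The helper names cohort_records and cohort_pair are hypothetical.

```python
# ASSUMED IPUMS BPL general codes: 1-56 cover the states and DC (some values
# in that range are unused), 2 = Alaska, 15 = Hawaii, 99 = "United States, n.s.".
# Verify these against the IPUMS BPL codebook before relying on them.
ALASKA, HAWAII, US_NS = 2, 15, 99
NATIVE_BPL = (set(range(1, 57)) | {US_NS}) - {ALASKA, HAWAII}


def cohort_records(records, year, min_age, max_age):
    """US-born records (Alaska/Hawaii excluded) aged min_age..max_age in `year`."""
    return [
        r for r in records
        if r["year"] == year
        and min_age <= r["age"] <= max_age
        and r["bpl"] in NATIVE_BPL
    ]


def cohort_pair(records, child_year):
    """The cohort aged 0-9 in child_year and the same cohort aged 30-39 thirty years later."""
    children = cohort_records(records, child_year, 0, 9)
    adults = cohort_records(records, child_year + 30, 30, 39)
    return children, adults


# Toy records (hypothetical values, not real IPUMS data).
records = [
    {"year": 1940, "age": 5, "bpl": 36, "perwt": 95.0},
    {"year": 1940, "age": 5, "bpl": 15, "perwt": 90.0},     # Hawaii: dropped
    {"year": 1970, "age": 35, "bpl": 36, "perwt": 100.0},
    {"year": 1970, "age": 35, "bpl": 600, "perwt": 100.0},  # non-US code: dropped
    {"year": 1970, "age": 45, "bpl": 36, "perwt": 100.0},   # outside 30-39: dropped
]
children, adults = cohort_pair(records, 1940)
print(len(children), len(adults))  # 1 1
```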
I am using person weights (perwt) in my analysis, and as a check I compared the sum of the person weights in each child sample with the sum in the corresponding adult sample. In most cases the adult total is 3-4% lower (roughly one million fewer people) than the child total, which seems plausible given that some people die before age 40.
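The weight check described above amounts to summing perwt in each sample and comparing the totals. A minimal sketch, assuming records are dicts carrying a perwt key; weighted_total and cohort_change are hypothetical helper names, and the toy data are made up.

```python
def weighted_total(records):
    """Sum of the person weights (perwt) over a list of record dicts."""
    return sum(r["perwt"] for r in records)


def cohort_change(children, adults):
    """Absolute and percent change in weighted cohort size, childhood to adulthood."""
    child_total = weighted_total(children)
    adult_total = weighted_total(adults)
    diff = adult_total - child_total
    pct = 100.0 * diff / child_total
    return diff, pct


# Toy example: 100 child records and 97 adult records, all weighted 100.
children = [{"perwt": 100.0}] * 100
adults = [{"perwt": 100.0}] * 97
print(cohort_change(children, adults))  # (-300.0, -3.0)
```

A negative percent change is what mortality alone would produce; a positive one means the adult sample represents more people than the child sample did.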
However, for native-born children aged 0-9 in 1940 I get a total population of 20,706,807, while for the same cohort at ages 30-39 in the 1970 census I get 20,808,300, about 100,000 more. Neither figure includes people born in Alaska or Hawaii, whom I dropped from both years because they were not part of the 1940 Census. For comparison, the adult sample for the 1950 cohort has 450,000 fewer people than its child sample, and the declines for the other cohorts are larger still.
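To put the 1940 cohort's numbers on the same footing as the 3-4% declines seen elsewhere, here is the arithmetic worked out. The only inputs are the two totals quoted above; the 3-4% benchmark is illustrative, taken from the range observed in most other cohorts.

```python
child_1940 = 20_706_807  # US-born ages 0-9 in 1940, excluding Alaska/Hawaii
adult_1970 = 20_808_300  # same cohort at ages 30-39 in 1970

diff = adult_1970 - child_1940
pct = 100 * diff / child_1940
print(f"{diff:+,} people ({pct:+.2f}%)")  # +101,493 people (+0.49%)

# Illustrative benchmark: the adult total implied by a 3-4% decline,
# the range seen in most of the other cohorts.
low, high = child_1940 * 0.96, child_1940 * 0.97
print(f"benchmark range: {low:,.0f} to {high:,.0f}")
```

The benchmark range comes out at roughly 19.88 to 20.09 million, so the 1970 total sits about 0.72 to 0.93 million above what the other cohorts' pattern would suggest.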
The 1940 census has more individual records in IPUMS than the 1970 census for this population: 220,992 observations of US-born children aged 0-9 in 1940 versus 208,083 observations of US-born adults aged 30-39 in 1970. However, about a quarter of the 1940 records have person weights under 100, while every 1970 record has a weight of exactly 100 (so the 1970 total is simply 208,083 × 100 = 20,808,300).
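A quick way to compare the two weighting schemes is to summarize each sample's weights: record count, total, mean, and the share below 100. A minimal sketch; weight_summary is a hypothetical helper and the toy weight list is made up. The implied mean weights use only the counts and totals quoted above, and they show how more records with a lower average weight can still yield a smaller total.

```python
def weight_summary(weights, threshold=100):
    """Record count, weighted total, mean weight, and share of weights below `threshold`."""
    n = len(weights)
    total = sum(weights)
    below = sum(1 for w in weights if w < threshold)
    return {"n": n, "total": total, "mean": total / n, "share_below": below / n}


print(weight_summary([50, 100, 100, 150]))
# {'n': 4, 'total': 400, 'mean': 100.0, 'share_below': 0.25}

# Implied mean weight per record from the counts and totals in the post.
mean_1940 = 20_706_807 / 220_992  # about 93.7
mean_1970 = 20_808_300 / 208_083  # exactly 100.0
print(f"{mean_1940:.1f} {mean_1970:.1f}")  # 93.7 100.0
```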
I would very much appreciate any thoughts on where this discrepancy might come from and whether it is likely to be a problem for my analysis. Is there any legitimate reason why the US-born population of a given cohort might increase over time? And is it reasonable to follow cohorts over time in this way?