I’m comparing estimates of total population by county (i.e., by statefip and countyfips) from the decennial IPUMS samples for 1940 to 2000 with the exact full-count values by county, which I obtained from NHGIS. The two line up well in every year except 1980, and I cannot figure out why.
My best guess is that, for some counties, only a subset of individuals in IPUMS have a non-zero countyfips value, while the rest have it set to 0.
For example, Allegheny County, PA (which includes Pittsburgh), with FIPS code 42003, has a population of 1,450,085 in 1980 according to NHGIS. However, in the 1980 5% IPUMS sample, this FIPS code has only 9,017 individuals, who represent just 9,017 × 20 = 180,340 people, or roughly 12% of the county's population.
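To make the comparison concrete, here is a minimal Python sketch of the check I am running. It assumes a flat weight of 20 per person (the inverse of the 5% sampling rate, as in the arithmetic above); the function names and the synthetic list of records are only illustrative and are not part of any IPUMS tool.

```python
from collections import defaultdict

def weighted_county_totals(records, weight=20):
    """Sum a flat person weight by (statefip, countyfips).

    `records` is an iterable of (statefip, countyfips) pairs, one per
    sampled person. The flat weight of 20 is an assumption matching a
    5% sample.
    """
    totals = defaultdict(int)
    for statefip, countyfips in records:
        totals[(statefip, countyfips)] += weight
    return dict(totals)

def coverage_ratio(estimate, full_count):
    """Share of the full-count population that the sample estimate recovers."""
    return estimate / full_count

# Stand-in for the 9,017 sampled persons coded to Allegheny County (42003).
sample = [(42, 3)] * 9017
totals = weighted_county_totals(sample)

allegheny = totals[(42, 3)]
ratio = coverage_ratio(allegheny, 1_450_085)  # NHGIS full count for 1980
print(allegheny, round(ratio, 3))
```

A ratio near 1 is what I see in the other years; for Allegheny County in 1980 it is far below that.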
I can get closer to 1.4 million by using the county group variable, but that moves me further away from the actual full counts in other counties. Hence, I haven’t found a systematic way of dealing with this issue.
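One way to test my guess would be to tabulate countyfips within each county group and look for groups that contain both zero and non-zero values. The sketch below does this in Python; the field name `county_group` and the toy records are made up for illustration and do not come from the actual extract.

```python
from collections import Counter, defaultdict

def countyfips_by_group(records):
    """Count records per countyfips value within each county group.

    `records` is an iterable of (county_group, countyfips) pairs.
    """
    table = defaultdict(Counter)
    for county_group, countyfips in records:
        table[county_group][countyfips] += 1
    return table

def mixed_groups(table):
    """County groups holding both identified (non-zero) and zero countyfips."""
    return sorted(
        group
        for group, counts in table.items()
        if counts[0] > 0 and any(code != 0 for code in counts)
    )

# Made-up records: group 1 mixes identified and unidentified persons,
# group 2 is entirely unidentified, group 3 is entirely identified.
toy = [(1, 3)] * 4 + [(1, 0)] * 6 + [(2, 0)] * 5 + [(3, 7)] * 3
table = countyfips_by_group(toy)
mixed = mixed_groups(table)
print(mixed)
```

If the real data show many such mixed groups, that would be consistent with counties being only partly identified in 1980.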
Any help would be much appreciated!