IPUMS USA makes available the 1940 Census 100% sample, where each row in the table represents a person. To be specific, when I aggregate the person-level records for the state of Georgia, I get 3,128,132 persons. However, various published summaries of the 1940 Census put the population total at 3,123,723; see, for example, page 23 of the Vital Statistics publication for 1940 (warning: 28MB file), which can also be obtained through IPUMS NHGIS. That is a small discrepancy (4,409 persons, about 0.14%) and not something I am worried about, but I am curious nonetheless.
At the county level, I also see discrepancies when breaking down populations by race (White vs. All Other). For Appling County, GA, the aggregated total is 14,511 vs. the published value of 14,497. Among Whites, the aggregated count is 12,114 vs. a published value of 11,856, and for non-Whites, the aggregated count is 2,397 vs. a published value of 2,641 (unless I made a mistake).
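To make the comparison concrete, here is a minimal sketch that uses only the Appling County figures quoted above. The dictionary names and the `compare` function are illustrative names of my own, not anything from IPUMS, and the code does not read the actual microdata extract.

```python
# Counts quoted in this post for Appling County, GA (1940).
# "aggregated" = sums over the IPUMS person-level records;
# "published"  = values from the printed 1940 tables.
aggregated = {"White": 12114, "All Other": 2397}
published = {"White": 11856, "All Other": 2641}


def compare(aggregated, published):
    """Return {group: (aggregated, published, difference)} plus a Total row."""
    rows = {}
    for group in aggregated:
        diff = aggregated[group] - published[group]
        rows[group] = (aggregated[group], published[group], diff)
    total_agg = sum(aggregated.values())
    total_pub = sum(published.values())
    rows["Total"] = (total_agg, total_pub, total_agg - total_pub)
    return rows


result = compare(aggregated, published)
for group, (agg, pub, diff) in result.items():
    # Signed difference: positive means the microdata count is higher.
    print(f"{group:10s} {agg:>7,d} {pub:>7,d} {diff:+d}")
```

The differences come out to +258 for Whites, -244 for All Other, and +14 for the total, so the two race-level gaps largely offset each other and are much larger than the gap in the county total.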
I am trying to understand why aggregates based on the microdata differ slightly from the published aggregates. Of course, 1940 was a long time ago, and maybe we don’t know the exact filters or database that the Census used, which may be the answer to my question. Apologies if I have missed something obvious or an explanation in the documentation, and thank you for making the data widely available.