My organization has a version of the IPUMS Complete Count 1940 census data stored in a repository. The file is described as including all individuals from the 1940 US census, and it contains 132,404,766 records in total. The citation information provided is:
Steven Ruggles, Katie Genadek, Ronald Goeken, Josiah Grover, and Matthew Sobek. Integrated Public Use Microdata Series: Version 6.0 [dataset]. Minneapolis: University of Minnesota, 2015.
However, when I access the 1940 full count census data on IPUMS.org, I’m seeing that the total number of records is 131,903,910. The citation info currently on the IPUMS website is:
Steven Ruggles, Sarah Flood, Ronald Goeken, Josiah Grover, Erin Meyer, Jose Pacas, and Matthew Sobek. IPUMS USA: Version 9.0 [dataset]. Minneapolis, MN: IPUMS, 2019. https://doi.org/10.18128/D010.V9.0
So there is a discrepancy of about 500,000 records (500,856, to be exact), and the two citations refer to different versions of the data. This confuses me, since I would expect the total N for the 1940 census to be fixed. Is there any reason to expect the counts to differ across versions of the data?
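
For reference, here is a minimal sketch of the arithmetic behind that gap, using only the two record counts quoted above. The variable names are my own and purely illustrative; nothing here comes from IPUMS tooling.

```python
# Record counts as quoted in this post (variable names are illustrative).
repo_count = 132_404_766  # my organization's copy, cited as Version 6.0 (2015)
web_count = 131_903_910   # current IPUMS USA data, cited as Version 9.0 (2019)

# Absolute gap between the two files.
difference = repo_count - web_count

# Gap expressed as a share of the current IPUMS count.
share = difference / web_count

print(f"Difference: {difference:,} records")
print(f"Share of current count: {share:.2%}")
```

So the older file is larger than the current one by well under half a percent, which makes me think of a cleanup between versions rather than a missing state or region, though that is only a guess on my part.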