Different sample size across versions of 1940 census

My organization has a version of the IPUMS Complete Count 1940 census data stored in a repository. It claims to include all individuals from the 1940 US census. The total number of records in the file is 132,404,766. The citation information provided is:
Steven Ruggles, Katie Genadek, Ronald Goeken, Josiah Grover, and Matthew Sobek. Integrated Public Use Microdata Series: Version 6.0 [dataset]. Minneapolis: University of Minnesota, 2015.

However, when I access the 1940 full count census data on IPUMS.org, I’m seeing that the total number of records is 131,903,910. The citation info currently on the IPUMS website is: Steven Ruggles, Sarah Flood, Ronald Goeken, Josiah Grover, Erin Meyer, Jose Pacas, and Matthew Sobek. IPUMS USA: Version 9.0 [dataset]. Minneapolis, MN: IPUMS, 2019. https://doi.org/10.18128/D010.V9.0

So, it looks like there is a discrepancy of about 500,000 records. I also noticed from the citation info that the two files come from different versions of the data. I’m a bit confused because it seems like the total N for the 1940 census should be fixed. Is there any reason to expect differing counts across versions of the data?

This is a good question. Typically, revisions to the data are listed on the IPUMS USA Revisions Page. In this case, two notes seem to identify the source of the differing number of records. First, on January 29, 2019, a new version of the 1940 file was released that excluded Alaska and Hawaii. Second, on February 8, 2016, an updated version of the 1940 full count file removed a number of duplicate person records from the file.
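
To put the gap in perspective, here is a minimal Python sketch that computes the difference between the two record counts quoted in the question. The variable names are just illustrative, and the calculation only measures the size of the gap; it does not show how many records each of the two revisions removed.

```python
# Record counts as quoted in the question (not re-derived from the data files).
v6_records = 132_404_766  # repository copy, cited as Version 6.0 (2015)
v9_records = 131_903_910  # current IPUMS USA extract, cited as Version 9.0 (2019)

# Absolute gap between the two versions of the 1940 full count file.
difference = v6_records - v9_records

# Gap as a share of the older, larger file.
share = difference / v6_records

print(f"{difference:,} fewer records ({share:.2%} of the older file)")
```

The gap is 500,856 records, well under one percent of the file, which is in line with the kind of cleanup the two revision notes describe.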