How have the 1920 and 1930 1% U.S. samples changed since January 2010?


I requested separate data extracts from the 1920 1% and 1930 1% U.S. Census samples in January 2010. I used the code provided by IPUMS and then added my own analysis.

I recently resubmitted these data extracts exactly as they were, ran the same analysis on them, and found that the data had changed. At the simplest level, in my original data extracts I pulled only cases of individuals 30 years old or younger. In the new data, where I simply resubmitted the same extract requests, the number of observations differs from the 2010 versions.

Overall, what changes have been made to the 1920 and 1930 samples individually since 2010, and what were the reasons for these changes?




As noted on the Errata and Revisions page, new 1930 samples were released in April 2013. The note details the main focus of these changes. The “minor differences in allocated values” refers to the fact that, in the 1930 samples (and in the 1920 samples), IPUMS-USA allocates values for some variables when the original value is missing. IPUMS-USA provides detailed descriptions of the editing and allocation procedures used for the 1920 and 1930 samples (there is also a more general discussion on editing and allocation). AGE is one of the variables that is allocated in the 1920 and 1930 samples. If the age was allocated using donors (hot-deck allocation, identifiable using the variable QAGE), new donor tables are generated each time the data are processed, which means that some allocated values change each time the data are processed. Since your extracts selected cases based on age, the number of cases in each extract could be expected to change by as much as 0.3% (the total percentage of persons whose age is currently allocated) in 1920 and 0.1% in 1930.
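If it helps to check this against your own extracts, here is a minimal Python sketch of the comparison. It assumes each extract has been read into a list of records that include QAGE, and that a QAGE value of 0 means the age was not allocated while any other code means it was (please confirm the codes on the QAGE variable page for your samples). The function names, the sample records, and the counts are hypothetical. Note also that a share computed on an extract limited to ages 30 and under will only approximate the whole-sample percentages quoted above.

```python
def allocated_share(records, flag="QAGE"):
    """Fraction of records whose flag marks the value as allocated.

    Assumption: a flag of 0 means the value was not allocated and any
    other code means it was; check the QAGE codes for your samples.
    """
    records = list(records)
    if not records:
        return 0.0
    flagged = sum(1 for rec in records if int(rec[flag]) != 0)
    return flagged / len(records)


def relative_change(old_count, new_count):
    """Absolute change in the number of cases, as a fraction of the old count."""
    return abs(new_count - old_count) / old_count


# Made-up records standing in for rows of an extract.
sample = [
    {"AGE": 25, "QAGE": 0},
    {"AGE": 30, "QAGE": 0},
    {"AGE": 12, "QAGE": 4},
    {"AGE": 7, "QAGE": 0},
]
share = allocated_share(sample)
change = relative_change(1000, 1003)  # hypothetical 2010 and current case counts
```

If the relative change in your case counts is well above the share of allocated ages, allocation alone probably does not explain the difference, and the other revisions are worth checking.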

Also, as noted under the Corrections heading of the latest Revision note, a correction has been made to the RELATE variable. This change improves the quality of the data by more accurately identifying boarders and lodgers. The correction changed relatively few observations, but for those observations many other variables also changed, since RELATE influences a number of family and household composition variables. This could especially affect attached characteristics, since they are based on the family interrelationship variables.
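To see whether the RELATE correction touches your cases, one option is to tabulate RELATE in the old and new extracts and look at which codes gained or lost cases. The sketch below assumes both extracts include RELATE and have been read into lists of records; the function name and the codes in the example are placeholders, not the codes actually affected by the correction.

```python
from collections import Counter


def relate_shift(old_records, new_records, var="RELATE"):
    """Return {code: new_count - old_count} for codes whose counts differ."""
    old = Counter(int(rec[var]) for rec in old_records)
    new = Counter(int(rec[var]) for rec in new_records)
    codes = sorted(set(old) | set(new))
    # Counter returns 0 for a code that is absent from one extract.
    return {code: new[code] - old[code] for code in codes if new[code] != old[code]}


# Made-up records; the codes are placeholders for illustration only.
old_extract = [{"RELATE": 1}, {"RELATE": 12}, {"RELATE": 12}]
new_extract = [{"RELATE": 1}, {"RELATE": 11}, {"RELATE": 12}]
shift = relate_shift(old_extract, new_extract)
```

An empty result means the RELATE distribution is the same in both extracts, although individual records could still have been recoded in offsetting ways.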

If there are other specific variables you are interested in, the Errata and Revisions page lists all other major revisions as well, which you can use to trace the changes that may have affected the 1930 and 1920 samples back to 2010.

I hope this helps.