Last year, I used a data extract that included the occupation variable “OCC2010.” My analysis keeps only observations of people who work in the construction sector. Today, I revised that extract to add two variables, “EMPSTAT” and “LABFORCE.” However, when I restrict the revised data set to construction occupations using exactly the same commands as before, the result has more observations than the original. I am trying to replicate my previous research while also excluding people who were not in the labor force, but I cannot reproduce it exactly because the revised data set seems to contain different observations. Is there a way to make the two data sets match exactly?
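The thread does not include the actual commands, so the following is only a minimal Python sketch of the sample restriction described above. It assumes each person record is a dict with integer OCC2010 and LABFORCE values, uses a hypothetical placeholder range for the construction OCC2010 codes, and assumes the IPUMS USA coding in which LABFORCE 2 means "in the labor force" (check the codebook for your own extract).

```python
# Hypothetical placeholder: substitute the exact OCC2010 codes used in the
# original analysis. This range is for illustration only.
CONSTRUCTION_OCC2010 = set(range(6200, 6950))

# Assumed IPUMS USA coding: LABFORCE 2 = "Yes, in the labor force".
IN_LABOR_FORCE = 2


def keep_construction(rows, require_labor_force=False):
    """Return the person records whose OCC2010 is a construction code.

    rows: iterable of dicts with an integer "OCC2010" key, plus an integer
    "LABFORCE" key when require_labor_force is True.
    """
    kept = []
    for row in rows:
        # Drop anyone outside the (placeholder) construction code list.
        if row["OCC2010"] not in CONSTRUCTION_OCC2010:
            continue
        # Optionally drop people who are not in the labor force.
        if require_labor_force and row["LABFORCE"] != IN_LABOR_FORCE:
            continue
        kept.append(row)
    return kept
```

Running the function with `require_labor_force=False` on both the original and the revised extract should give identical counts if neither data set was altered outside this filter; a mismatch points to some earlier, unrecorded change rather than to the filter itself.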
Is it possible for a revised extract to have more observations that meet a given criterion than the original?
Occasionally, IPUMS updates variables to correct known errors, which can cause differences when an extract is revised and resubmitted. However, that does not appear to be what is happening in your case: I looked at your original extract 30 and the revised extract 41 and found the same number of construction-sector observations in both. Could you send the code you are using to limit your sample to construction workers? Alternatively, if you want to replicate your original analysis exactly, you can merge EMPSTAT and LABFORCE onto your original data set, using YEAR, DATANUM, SERIAL, and PERNUM as identifiers.
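The merge suggested above can be sketched in Python as follows. This is a minimal illustration, not IPUMS tooling: it assumes both extracts have been loaded as lists of dicts keyed by IPUMS variable name, and the function and parameter names are hypothetical. It performs a left join on YEAR, DATANUM, SERIAL, and PERNUM, so every original record is kept, and it reports any original records that have no match in the revised extract.

```python
# YEAR, DATANUM, SERIAL, and PERNUM together identify one person record.
KEY = ("YEAR", "DATANUM", "SERIAL", "PERNUM")


def person_key(row):
    """Build the composite person identifier for one record."""
    return tuple(row[k] for k in KEY)


def merge_new_vars(original, revised, new_vars=("EMPSTAT", "LABFORCE")):
    """Left-join new_vars from the revised extract onto the original rows.

    Returns (merged_rows, unmatched_keys). Original rows are copied, never
    modified. Rows with no match in the revised extract are kept with the
    new variables set to None, and their keys are listed in unmatched_keys.
    """
    # Index the revised extract by person key, rejecting duplicates so the
    # match is strictly one-to-one.
    lookup = {}
    for row in revised:
        key = person_key(row)
        if key in lookup:
            raise ValueError(f"duplicate person key in revised extract: {key}")
        lookup[key] = row

    merged, unmatched = [], []
    for row in original:
        key = person_key(row)
        match = lookup.get(key)
        out = dict(row)  # copy so the original data set is left untouched
        for var in new_vars:
            out[var] = match[var] if match is not None else None
        if match is None:
            unmatched.append(key)
        merged.append(out)
    return merged, unmatched
```

Rejecting duplicate keys enforces the same one-to-one requirement as a one-to-one merge in Stata; the list of unmatched keys is a quick way to see exactly which observations differ between the two files.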
I hope this helps.
Thank you for your response. After double-checking the data, I came to the same conclusion. I must have modified my data set at some point during my previous analysis without recording what I did.