Question regarding data discrepancy in the same extracts


I have a question about what can cause a large data discrepancy when you resubmit a previous extract. I created extract 24 on October 1, 2021. I needed the data again, so I recently resubmitted it. However, the statistics I get from the resubmitted data differ from what I obtained when I initially downloaded extract 24. So I recreated extract 24 using the revise option instead of the resubmit option (without actually revising anything). The new extract is number 30 in my account. This time my estimates were exactly the same as what I got originally. In sum, the statistics I obtain from the resubmitted extract 24 differ from those I obtain from extract 30 (which is just a revision of 24 with nothing changed). For instance, if I summarize the variable inctot, the lower bound for extract 24 is 0, while for extract 30 it goes negative. What can lead to this discrepancy? Is it something specific to my IPUMS account? Thank you.

This is definitely unexpected behavior. We are looking into the issue and will provide an update as soon as we have more information. My apologies for the inconvenience.

I'm posting this on behalf of Kari since she is out of the office this week.

Thanks for your patience while we looked into this issue. It resulted from a rare interaction between how our extract engine is configured and the StatTransfer process for creating a Stata-formatted data file. The issue should not affect other data users going forward; however, you will need to use your revised data extract instead of the resubmitted data extract to get the correct values. To confirm, your extract 30 (the revision), which includes a broader range of values for INCTOT, is the correct version of the data.
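For anyone who wants to check whether two copies of an extract agree, here is a minimal Python sketch that compares summary statistics for one variable. The helper names (`summarize`, `compare_extracts`) and the sample values are made up for illustration; they are not real IPUMS data and do not reproduce the cause of the issue, only the symptom described above (a minimum of 0 in one file and a negative minimum in the other).

```python
from statistics import mean


def summarize(values):
    """Return count, min, max, and mean for one numeric column."""
    values = list(values)
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


def compare_extracts(a, b):
    """Return only the statistics that differ, as (a_value, b_value) pairs."""
    stats_a, stats_b = summarize(a), summarize(b)
    return {k: (stats_a[k], stats_b[k]) for k in stats_a if stats_a[k] != stats_b[k]}


# Hypothetical INCTOT values, for illustration only.
inctot_resubmitted = [0, 0, 12000, 45000]    # negative value missing
inctot_revised = [-5000, 0, 12000, 45000]    # full range of values

diff = compare_extracts(inctot_resubmitted, inctot_revised)
for stat, (old, new) in diff.items():
    print(f"{stat}: resubmitted={old} revised={new}")
```

With real files, the two columns could be loaded first (for example with `pandas.read_stata` for Stata-formatted extracts) and passed to `compare_extracts` in the same way. An empty result means the compared statistics match.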


Thank you for the confirmation!
