Hello! I’m using a subset of tax variables from the 2024 ASEC, and I believe I may have found an error in the unharmonized variable uh_taxid_a1. In IPUMS, this variable appears to contain only the first three digits of the 10-digit tax_id variable in the raw Census ASEC PUMS, although the IPUMS documentation says it should contain all ten digits. I’m attaching a screenshot for comparison.
Best,
Harris Eppsteiner
Associate Director for Economic Analysis, The Budget Lab at Yale
harris.eppsteiner@yale.edu
Thank you for bringing this to our attention. I have replicated the issue that you’re seeing in the data and am investigating further. I will follow up next week as soon as I have more information to share.
Thanks for your patience, and for bringing this to our attention. It appears that the different file formats of the original Census Bureau 2024 CPS ASEC PUMS data contain different values for this variable. Specifically, the ASCII file downloaded via FTP appears to truncate the variable to its first three digits, while the truncation does not occur in either the CSV file downloaded via FTP or the data requested through the API. This may be related to the removal of the variable STTAXREB between the 2023 and 2024 ASEC, though I cannot say for certain without confirmation from the Census Bureau. Since IPUMS CPS ingests the ASCII file for our harmonized data, it is not surprising that the issue also appears in our data.
I have shared this with the IPUMS CPS team, who will review it and determine whether it is possible to provide data for this variable from other sources. In the meantime, you can merge the untruncated TAX_ID values from the Census Bureau files into your IPUMS CPS extract using the linking keys HRHHID, HRHHID2, and LINENO (called PULINENO in the Census Bureau data).
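A minimal sketch of that merge in Python, using only the standard library. It assumes both files have already been loaded as lists of dicts, and that the Census person rows carry the household IDs under the hypothetical column names HRHHID and HRHHID2 alongside PULINENO and TAX_ID. In the actual Census release the household identifiers may live in a separate household file or under different names, so please check the record layout first. The output field TAX_ID_FULL and the helper looks_truncated are illustrative names only, not part of IPUMS or Census tooling. Keys are compared as integers on the assumption that the two sources may format IDs differently (for example, with or without leading zeros). A blank key would raise an error rather than match silently.

```python
# Key columns: IPUMS CPS names, and ASSUMED names for the Census person table.
IPUMS_KEYS = ("HRHHID", "HRHHID2", "LINENO")
CENSUS_KEYS = ("HRHHID", "HRHHID2", "PULINENO")  # hypothetical layout


def _key(row, cols):
    """Build a join key; int() ignores leading zeros, so '00045' matches '45'."""
    return tuple(int(str(row[c]).strip()) for c in cols)


def attach_tax_id(ipums_rows, census_rows, tax_col="TAX_ID"):
    """Left-join the untruncated tax ID from Census rows onto IPUMS rows.

    Returns new dicts with an added TAX_ID_FULL field (hypothetical name);
    IPUMS rows without a Census match get None so they stay visible.
    """
    lookup = {}
    for row in census_rows:
        k = _key(row, CENSUS_KEYS)
        if k in lookup:
            # The three keys should identify one person; a duplicate means a bad join.
            raise ValueError(f"duplicate Census key: {k}")
        lookup[k] = row[tax_col]  # stored as read; read IDs as text to keep leading zeros
    merged = []
    for row in ipums_rows:
        out = dict(row)
        out["TAX_ID_FULL"] = lookup.get(_key(row, IPUMS_KEYS))
        merged.append(out)
    return merged


def looks_truncated(ipums_value, census_value, width=10, prefix=3):
    """True if the IPUMS value equals the first `prefix` digits of the
    Census value, zero-padded to `width` digits (ASSUMED padding rule)."""
    if ipums_value is None or census_value is None:
        return False
    full = str(census_value).strip().zfill(width)
    return int(str(ipums_value).strip()) == int(full[:prefix])
```

After merging, counting the rows where looks_truncated is true for the extract's uh_taxid_a1 column and TAX_ID_FULL gives a quick check of how widely the three-digit pattern holds. If you work in pandas instead, DataFrame.merge with left_on/right_on, how="left", and validate="many_to_one" performs the same left join with a duplicate-key check.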