I’m trying to replicate the total number of tax returns receiving the EITC across income distributions found in Table 2.5 here: SOI Tax Stats - Individual Statistical Tables by Size of Adjusted Gross Income | Internal Revenue Service.
Filtering to returns receiving a positive EITCRED and then summing with ASECWT has left me about 6 million total returns short of the IRS data (for 2018, for example). Is this within the margin of error for ASECWT? Should I be using a different weight when looking at tax data?
The reason I’m doing this with IPUMS, even though the IRS already provides these numbers, is to compare real income of recipients across years.
Thank you so much for any help you can provide.
I was able to replicate the discrepancy you described between the IRS's published number of tax returns receiving the Earned Income Tax Credit and the number of tax units with a positive (in-universe and non-zero) value of EITCRED in the 2018 CPS ASEC. The discrepancy exists because the tax variables in the ASEC are not reported by respondents; they are imputed by the Census Bureau using methods described in detail in this paper from 2004 and this paper from 2022. The gap you see in 2018 is expected and in line with the size of the gap in other years (e.g., 13.3% vs. 16.8% of total tax returns in 2002 receiving the EITC; see Table 9 in the 2004 paper). The 2004 paper further notes that evaluating simulated EITC estimates is difficult because the IRS published data are unaudited, and many claims for the EITC are denied or adjusted downward. I checked data from a few other years and see about a 2 percentage point difference between the IRS's published numbers and the numbers derived from CPS data.
Note that to estimate the number of tax returns receiving the EITC, you will need to adjust your observations using FILESTAT so that you do not double-count joint filers, since both spouses on a joint return appear as separate person records. A simple way to do this is to drop the spouse of the household head (RELATE = 202 or 203) in cases where the household head's value of FILESTAT is 1, 2, or 3.
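In case it is useful, here is a minimal Python sketch of that adjustment followed by the weighted count. It is only an illustration under some assumptions of mine: the extract has been loaded as a list of dicts keyed by the IPUMS variable names, YEAR and SERIAL together identify a household, the household head is coded RELATE = 101, and `niu_value` is a hypothetical parameter for whatever not-in-universe code your extract uses for EITCRED (check the codebook). The function name and toy rows are made up for the example.

```python
JOINT_FILESTAT = {1, 2, 3}   # joint-filer codes mentioned above
SPOUSE_RELATE = {202, 203}   # spouse codes mentioned above
HEAD_RELATE = 101            # assumption: code for the household head


def weighted_eitc_returns(rows, niu_value=None):
    """Sum ASECWT over records with a positive EITCRED, after dropping
    spouses of household heads who file jointly.

    `rows` must be a list (it is iterated twice).
    """
    # Households whose head files a joint return.
    joint_households = {
        (r["YEAR"], r["SERIAL"])
        for r in rows
        if r["RELATE"] == HEAD_RELATE and r["FILESTAT"] in JOINT_FILESTAT
    }

    total = 0.0
    for r in rows:
        household = (r["YEAR"], r["SERIAL"])
        # Skip the spouse so a joint return is counted only once.
        if r["RELATE"] in SPOUSE_RELATE and household in joint_households:
            continue
        # Keep only in-universe, positive credit amounts.
        if r["EITCRED"] == niu_value or r["EITCRED"] <= 0:
            continue
        total += r["ASECWT"]
    return total


# Toy data, invented purely to show the mechanics.
toy_rows = [
    # Household 1: joint filers, both records carry the credit.
    {"YEAR": 2018, "SERIAL": 1, "RELATE": 101, "FILESTAT": 1, "EITCRED": 1200, "ASECWT": 1500.0},
    {"YEAR": 2018, "SERIAL": 1, "RELATE": 202, "FILESTAT": 1, "EITCRED": 1200, "ASECWT": 1500.0},
    # Household 2: head-of-household filer with the credit.
    {"YEAR": 2018, "SERIAL": 2, "RELATE": 101, "FILESTAT": 4, "EITCRED": 800, "ASECWT": 2000.0},
    # Household 3: single filer without the credit.
    {"YEAR": 2018, "SERIAL": 3, "RELATE": 101, "FILESTAT": 5, "EITCRED": 0, "ASECWT": 1000.0},
]

print(weighted_eitc_returns(toy_rows))  # 3500.0
```

In the toy data the spouse in household 1 is dropped, so the joint return contributes its weight once (1500) and the head-of-household filer contributes 2000. To get counts by income class comparable to the IRS table, you would apply the same filter and sum ASECWT within each income bin.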
Thank you! This makes sense and is super helpful