Topcoding of earnings in ATUS

Hey IPUMS users,

As documented, earnings variables in ATUS data are topcoded at a single cutoff value across all years (for example, EARNWEEK is topcoded at 2884.61 in every data year). The problem is that average and median salaries rose over the period covered by the data, 2003-2023 (see for example here). As a result, later years are topcoded more coarsely, with a larger share of cases exceeding the threshold each year. The underestimation of earnings for topcoded cases also grows over time: the true average earnings of topcoded cases is higher in later data years, while the data still report 2884.61. As I understand it, adjusting for inflation with the CPI exacerbates the problem, because a 2003 case topcoded at 2884.61 is assigned a higher real value than a 2023 case that is also topcoded at 2884.61.
Is there a way to mitigate this problem? Is it better to topcode values again after applying CPI?

I understand this is not an ATUS question per se, but any help will be very much appreciated.
All the best and thank you for your help.

From a quick tabulation, I’m finding that the number of respondents at the top code for EARNWEEK (2884.61) increased from 202 in the 2003 ATUS to 442 in 2023 (from 1.5% to 8.2% of the total in-universe sample). As you explain, more respondents are expected to be top coded over time due to inflation. This can bias average earnings estimates downward, because a growing share of the sample has its reported earnings censored at the top code. In most cases it will not affect median earnings estimates, since the median will still be below the top code threshold. The Census Bureau addressed this issue in April 2023 by instituting a dynamic top code for weekly earnings, set at 3% of the sample in each survey month. This, however, does not address the issue for earlier samples.
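If you want to reproduce this kind of tabulation on your own extract, a minimal sketch is below. The record layout (year, earnweek, weight) and the toy values are assumptions for illustration only, and the sketch assumes you have already restricted the data to in-universe cases.

```python
from collections import defaultdict

TOPCODE = 2884.61  # EARNWEEK top code cited in this thread


def topcode_share_by_year(records, topcode=TOPCODE):
    """Return {year: weighted share of cases sitting at the top code}.

    `records` is an iterable of (year, earnweek, weight) tuples. This layout
    is hypothetical; adapt it to however you load your extract.
    """
    total = defaultdict(float)
    at_topcode = defaultdict(float)
    for year, earnweek, weight in records:
        total[year] += weight
        # Compare with a small tolerance to avoid float rounding surprises.
        if abs(earnweek - topcode) < 0.005:
            at_topcode[year] += weight
    return {year: at_topcode[year] / total[year] for year in total}


# Toy records, made up for illustration (not ATUS data).
toy_records = [
    (2003, 500.00, 1.0), (2003, 2884.61, 1.0), (2003, 900.00, 1.0), (2003, 1200.00, 1.0),
    (2023, 2884.61, 1.0), (2023, 2884.61, 1.0), (2023, 1500.00, 1.0), (2023, 800.00, 1.0),
]
shares = topcode_share_by_year(toy_records)
```

Passing a weight of 1.0 for every record gives unweighted shares, which is what a simple count-based tabulation would report.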

You might be able to mitigate this effect by applying a consistent top code across your sample years and dropping respondents above it. For example, since EARNWEEK top codes at most 8.2% of respondents (in 2023), you might drop the top 8.2% of weekly earners from each sample that you analyze. Alternatively, if you’re interested in comparing inflation-adjusted earnings, you might observe that the top code value in 2023 is roughly equivalent to $1,752 in 2003. In this case, you might exclude respondents in each year whose inflation-adjusted weekly earnings exceed the inflation-adjusted top code for your most recent year of analysis. A good strategy may be to perform your analysis with and without respondents with top-coded income values and compare your results.
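As a rough sketch of the inflation-adjusted approach: deflate each year's earnings to a base year, find the lowest real value that the nominal top code takes in any analysis year, and then either drop or re-censor cases at or above that common cap. The function names, the (year, earnweek) record layout, and the index values are all hypothetical; the index values in particular are placeholders, not official CPI figures, so substitute the price index you actually use.

```python
TOPCODE = 2884.61  # nominal EARNWEEK top code cited in this thread


def real_topcode(cpi, base_year, topcode=TOPCODE):
    """Lowest real value of the nominal top code, in base-year dollars.

    `cpi` should contain only the years in your analysis, since the cap is
    driven by the year with the highest index value.
    """
    return min(topcode * cpi[base_year] / cpi[year] for year in cpi)


def apply_common_topcode(records, cpi, base_year, drop=False):
    """Deflate earnings and apply one real top code to every year.

    `records` is an iterable of (year, earnweek) pairs (hypothetical layout).
    Returns a list of (year, real_earnweek, censored) tuples. With drop=True,
    cases at or above the common cap are excluded instead of re-censored.
    """
    cap = real_topcode(cpi, base_year)
    out = []
    for year, earnweek in records:
        real = earnweek * cpi[base_year] / cpi[year]
        censored = real >= cap - 1e-9
        if censored and drop:
            continue
        out.append((year, min(real, cap), censored))
    return out


# Placeholder index values for illustration only (NOT official CPI figures).
toy_cpi = {2003: 100.0, 2023: 165.0}
toy_records = [(2003, 2000.00), (2003, 1000.00), (2023, 2884.61), (2023, 1650.00)]

capped = apply_common_topcode(toy_records, toy_cpi, base_year=2003)
kept = apply_common_topcode(toy_records, toy_cpi, base_year=2003, drop=True)
```

Re-censoring keeps the sample size constant across years, while dropping matches the exclusion approach described above; running your analysis both ways is one way to check how sensitive the results are.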

For methods other than dropping top-coded respondents, I recommend reviewing this paper on using the Pareto distribution approach for top-coded values. You will also want to review how other researchers and publications handle this issue.
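To give a sense of what a Pareto approach can look like, here is a minimal, unweighted sketch. It is a common textbook version (maximum likelihood for a right-censored Pareto tail) and not necessarily the procedure in the paper mentioned above. The function name, the choice of `lower_cutoff`, and the toy values are assumptions for illustration.

```python
import math


def pareto_tail_mean(earnings, topcode, lower_cutoff):
    """Estimate the Pareto shape and the implied mean above the top code.

    Assumes earnings at or above `lower_cutoff` follow a Pareto distribution.
    Cases in [lower_cutoff, topcode) are treated as observed and cases at or
    above `topcode` as right-censored. Survey weights are ignored here.
    Returns (alpha, mean_above_topcode).
    """
    if not 0 < lower_cutoff < topcode:
        raise ValueError("need 0 < lower_cutoff < topcode")
    observed = [x for x in earnings if lower_cutoff <= x < topcode]
    n_censored = sum(1 for x in earnings if x >= topcode)
    if not observed or n_censored == 0:
        raise ValueError("need both observed tail cases and top-coded cases")
    # MLE of the shape: alpha = M / (sum ln(x_i / x_m) + C * ln(T / x_m))
    denom = sum(math.log(x / lower_cutoff) for x in observed)
    denom += n_censored * math.log(topcode / lower_cutoff)
    alpha = len(observed) / denom
    if alpha <= 1:
        raise ValueError("estimated alpha <= 1, so the tail mean is undefined")
    # For a Pareto tail, E[X | X > T] = T * alpha / (alpha - 1).
    return alpha, topcode * alpha / (alpha - 1)


# Toy values, made up for illustration (not ATUS data).
toy_earnings = [400.0, 1000.0, 1500.0, 1500.0, 2000.0]
alpha, tail_mean = pareto_tail_mean(toy_earnings, topcode=2000.0, lower_cutoff=1000.0)
```

In practice you would estimate this separately for each year and replace the top-coded value with that year's estimated tail mean. The result can be sensitive to the lower cutoff, so it is worth trying a few values.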
