Yet another question about swap values


There are two addendums to the IPUMS data relating to top codes:

  1. The Census Bureau’s 2012 release, which claims to update the CPS with the rank-proximity method from 1976 onwards. I will call this “swap 1” from now on.

  2. The Larrimore et al. release, which claims to update the CPS with the cell-means method, from 1976 onwards. I will call this “swap 2” from now on.

I have downloaded both of these files and “swapped” the values according to the posted instructions. Then I examined the maximum income in each year for all three datasets. Here are the results for the “incwage” variable:

Here, the red line represents the maximum value of “incwage” in the original CPS data, while the swap1 and swap2 data are green and blue respectively.

Note that from 1976 to 1985, the swap1/swap2 values still appear to be topcoded, albeit at a slightly higher level than the original CPS. Then, in 1994, there is a tremendous spike in the max wage, suggesting that there was a change in top coding methodology.

This is inconsistent with the Census’s claim that the swap values from 1976 onwards consistently employ the same methodology (rank-proximity in the case of swap1).
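For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of computing the per-year maximum for each dataset. It assumes each dataset has already been loaded as a list of dict rows with `year` and `incwage` keys and that missing or not-in-universe codes have been filtered out beforehand. The function names and the toy rows are hypothetical and are not real CPS values.

```python
def max_by_year(records, value_key="incwage"):
    """Return {year: maximum value} for an iterable of dict rows."""
    maxima = {}
    for rec in records:
        year = rec["year"]
        value = rec[value_key]
        if year not in maxima or value > maxima[year]:
            maxima[year] = value
    return maxima


def compare_maxima(datasets, value_key="incwage"):
    """Line up the yearly maxima of several datasets side by side.

    `datasets` maps a label (e.g. "original", "swap1") to its rows.
    A year missing from one dataset shows up as None for that label.
    """
    per_dataset = {
        name: max_by_year(rows, value_key) for name, rows in datasets.items()
    }
    years = sorted(set().union(*(m.keys() for m in per_dataset.values())))
    return {
        year: {name: m.get(year) for name, m in per_dataset.items()}
        for year in years
    }


# Toy rows for illustration only (made-up years and values).
toy = {
    "original": [
        {"year": 1, "incwage": 10},
        {"year": 1, "incwage": 30},
        {"year": 2, "incwage": 30},
    ],
    "swap1": [
        {"year": 1, "incwage": 10},
        {"year": 1, "incwage": 45},
        {"year": 2, "incwage": 80},
    ],
}

table = compare_maxima(toy)
for year, row in table.items():
    print(year, row)
```

A flat stretch in the resulting maxima, followed by a sudden jump, is the pattern described above.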

I was wondering if you know anything more about this issue.


The pattern in your graph can likely be explained by a few notes in Section 5 of Larrimore et al. (2008); see attached. In particular, one limitation of both approaches is that the cell-mean and swap values are derived from the internal CPS data, which is itself censored at the very top of the income distribution. As a result, the series does not reveal the full scope of the US income distribution.

As noted on page 111, “Specifically, one should pay careful attention to changes in the internal censoring points which introduce trend-breaks in our cell mean series. Since 1976, there were 3 years where internal censoring points were adjusted substantially and thus trends at these years should be considered with caution. The first is 1986, when the internal censoring points for labor income sources increased from $99,999 to $250,000. The second is 1988 when the income sources were redefined and expanded from 11 to 24, redefining censoring points in the process. The third is 1994 when the internal censoring points were increased from $299,999 for primary labor earnings and $99,999 for secondary labor earnings to $999,999 for all labor earnings sources. A fourth increase occurred in 1995, when internal censoring points for primary and secondary wage earnings were increased to $1,099,999. However in terms of consistency, very few individuals fall into this range, so this increase is much less troublesome than the other three adjustments.”
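One way to keep these trend breaks in view when comparing years is a small lookup like the sketch below. The years and descriptions are taken from the passage quoted above; the helper name `breaks_between` is hypothetical, and whether the years refer to survey years or income years should be checked against the paper before relying on it.

```python
# Internal censoring-point changes listed in the quoted passage.
BREAK_YEARS = {
    1986: "labor income censoring points raised from $99,999 to $250,000",
    1988: "income sources expanded from 11 to 24, censoring points redefined",
    1994: "labor earnings censoring points raised to $999,999",
    1995: "wage earnings censoring points raised to $1,099,999 (minor)",
}


def breaks_between(start_year, end_year):
    """Return the break years y with start_year < y <= end_year.

    A non-empty result means a comparison of start_year with end_year
    spans at least one change in the internal censoring points.
    """
    return [y for y in sorted(BREAK_YEARS) if start_year < y <= end_year]


for year in breaks_between(1985, 1994):
    print(year, BREAK_YEARS[year])
```

For example, a comparison of 1985 with 1994 spans the 1986, 1988, and 1994 changes, while a comparison of 1996 with 2005 spans none of them.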

Larrimore et al. (2008).pdf (140.6 KB)

Thanks. That makes sense. I’ve done some further analysis and computed, for each year, the percentage of observations that sit at the maximum wage and the percentage of all wage income accounted for by those observations. These results are in the graphs below. It does appear that the swapped values reveal more of the income spectrum.
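A minimal sketch of how these two shares could be computed is below. It assumes rows are dicts with `year` and `incwage` keys, that missing or not-in-universe codes were dropped beforehand, and that the shares are unweighted unless a weight column is named. The function name and the toy rows are hypothetical.

```python
def topcode_shares(records, value_key="incwage", weight_key=None):
    """Return {year: (share of observations, share of income)} at the yearly max.

    If weight_key is None every row counts once; otherwise rows are
    weighted by rec[weight_key].
    """
    by_year = {}
    for rec in records:
        by_year.setdefault(rec["year"], []).append(rec)

    def weight(rec):
        return rec[weight_key] if weight_key is not None else 1.0

    shares = {}
    for year, rows in by_year.items():
        top = max(r[value_key] for r in rows)
        at_top = [r for r in rows if r[value_key] == top]
        n_total = sum(weight(r) for r in rows)
        n_top = sum(weight(r) for r in at_top)
        inc_total = sum(weight(r) * r[value_key] for r in rows)
        inc_top = sum(weight(r) * r[value_key] for r in at_top)
        obs_share = n_top / n_total
        inc_share = inc_top / inc_total if inc_total else 0.0
        shares[year] = (obs_share, inc_share)
    return shares


# Toy rows for illustration only (made-up values).
rows = [
    {"year": 2000, "incwage": 100},
    {"year": 2000, "incwage": 100},
    {"year": 2000, "incwage": 50},
    {"year": 2000, "incwage": 50},
]
shares = topcode_shares(rows)
print(shares[2000])
```

In the toy data, two of four rows sit at the maximum, so the observation share is one half and the income share is 200 out of 300. Passing a weight column name via `weight_key` would give weighted shares instead.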

I have a follow-up question, though. Note that in the late 1990s and early 2000s, there is some kind of year-to-year variation in the data. This is especially prominent in the second graph, which shows an almost saw-tooth pattern. Do you know what might account for this?

After digging around a bit in various US Census resources, I am not sure what accounts for this pattern. It might simply be due to sampling variation in the CPS data. That is, in some years the sample represents the top of the income distribution more closely than in other years. I do note, however, that despite the variation, the share of each sample that hits the maximum income value remains quite low (i.e., always less than 2 percent). Additionally, aggregate statistics of INCWAGE over time do not seem to be meaningfully driven by this detail.