Swap files and cell means 1976 to 2000, 2000 to 2010, and years later than 2010

I’ll address each question one at a time.

(1) As noted on the income component cell means replacement values page, Larrimore et al. (2008) use the same technique that the Census Bureau implemented for replacement values starting in 1996. The Census Bureau replacement values for 1996 through 2010 are listed on this page. Additionally, at the bottom of this page we provide income component rank proximity swap values for 1976 through 2010.

(2) Yes, for samples from 1996 through 2010 the replacement values listed in the tables on this page are what you will find in the data available on IPUMS CPS. Additionally, the cell means values from Larrimore et al. and the top-coded values already in the Census public use and IPUMS data will be nearly identical from 1996 onward, except for 2000, where the Census Bureau has acknowledged some data error, as noted in Larrimore et al. (2008). For additional context: the 1996 through 2010 public use files included the replacement values for observations above the top-code. Larrimore et al. extended this method backward, producing cell means for samples from 1976 through 2010. In 2011, the Census Bureau shifted from the average replacement value system to a rank proximity swapping procedure.

(Regarding any confusion on the term “top-codes”) The term “top-code” generally refers to a top-code threshold. Any income value above that threshold is “top-coded” and (for samples from 1996 through 2010) replaced with the average of the top-coded values. Separately, when an observation is not in the universe (NIU) for a particular income question, or when the respondent did not answer the question, special codes mark NIU or item non-response. In any event, I’ll look through this page and see if anything can be stated more clearly.

(3) Thanks for the note about the broken link; I’ll look into fixing it. If you were to download CPS data from the Census Bureau website, what you’d find would depend on the year of the sample. From 1962 through 1995, values exceeding the top-code are simply recoded to the threshold value. For example, all responses for INCWAGE greater than or equal to 50,000 in the 1976 CPS ASEC Survey were replaced with 50,000 in the public use Census data. For 1996 through 2010, the Census Bureau used replacement values in place of top-coded values. Top-coded individuals are divided into twelve groups depending on characteristics such as race, gender, and full-time status, and each top-coded income value is replaced with the mean income within its group. From 2011 onward, all income values above the top-code are rounded to two significant digits and then swapped among individuals within a bounded interval. This last method is called “rank proximity” swapping, and it is what is applied consistently in the 1976 through 2010 files at the bottom of this page.
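To make the difference between these regimes concrete, here is a minimal Python sketch. It is only an illustration, not the Census Bureau's or IPUMS's actual code: the function names, the (cell, income) record layout, and the cell labels are all hypothetical, and the real cells are the twelve groups described above. The third function shows only the rounding step of the 2011-onward method; the bounded swap itself is not implemented here.

```python
from collections import defaultdict
from math import floor, log10


def recode_to_threshold(values, threshold):
    """1962-1995 style: values at or above the top-code become the threshold."""
    return [min(v, threshold) for v in values]


def replace_with_cell_means(records, threshold):
    """1996-2010 style: each top-coded value becomes the mean of the
    top-coded values in its cell.

    `records` is a list of (cell, income) pairs; `cell` is a hypothetical
    label standing in for a group defined by race, gender, and so on.
    """
    topcoded = defaultdict(list)
    for cell, income in records:
        if income >= threshold:
            topcoded[cell].append(income)
    means = {cell: sum(vals) / len(vals) for cell, vals in topcoded.items()}
    return [
        (cell, means[cell] if income >= threshold else income)
        for cell, income in records
    ]


def round_two_significant(value):
    """Rounding step only of the 2011-onward method (the swap is omitted)."""
    if value == 0:
        return 0
    magnitude = floor(log10(abs(value)))
    return round(value, -(magnitude - 1))
```

For instance, with a threshold of 50,000, `recode_to_threshold([30000, 50000, 82000], 50000)` caps the last value at 50,000, while `replace_with_cell_means` would instead assign every top-coded person in a cell that cell's average top-coded income.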

(4) As noted in this question and answer, the code 99997 does not identify “top-coded” values; rather, it is the code for item non-response. These codes persist after swapping because the swap-values method does not correct for observations where the respondent did not answer the income question.
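In practice this means special codes should be set aside before computing any statistics on the income variable. A minimal sketch, assuming the 99997 code mentioned above; the function name is hypothetical, and any NIU code would need to be looked up in the variable's codes page and passed in as well:

```python
ITEM_NONRESPONSE = 99997  # item non-response code discussed above


def drop_special_codes(values, special_codes=(ITEM_NONRESPONSE,)):
    """Return only substantive income values, excluding special codes.

    `special_codes` should be extended with the NIU code for the variable
    in question; that value is not assumed here.
    """
    return [v for v in values if v not in special_codes]


incomes = [25000, 99997, 61000]
valid = drop_special_codes(incomes)
mean_income = sum(valid) / len(valid)
```

Leaving the 99997 rows in would treat a non-response flag as a real income of 99,997 and distort the mean.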