Maximum Possible Value v. Replacement / Swap Values

Let me preface this question by acknowledging that there has been a lot of conversation on this forum about how to decode the differences between top codes, replacement values, and swap values. I have combed through those postings, and I am still having a difficult time understanding why maximum possible values exist in the CPS data from 1996 to 2018. AHoerner’s question here clarifies that so-called “top code” values with a terminal ‘7’ essentially denote item non-response. But if the swap and mean replacement procedures are in place for the CPS data from 1996 to 2018, why does one also find maximum possible values that exceed the thresholds for mean replacement and swapping? I am using the language of the revamped top code table here: ‘maximum possible values’ generally end in ‘9’ and frequently consist of a string of ‘9’s.

For example, I am using data from the 2016 CPS ASEC. After cleaning out the N.I.U. values, I find that the variable INCSURV1 still includes values equal to 99,999. From the top code table, I see that this is the ‘maximum possible value’. However, since these data are from 2016, I am puzzled: I would not expect this value to appear at all. Instead, I would expect it to be swapped under the topcoding procedure for that year.

I have two outstanding questions, then: first, what do maximum possible values mean in this context where they are values that exceed the swap threshold but still remain in the data? Why aren’t they swapped, essentially? Second, what ought one do with them? Of course this is up to the researcher, but if they stand for actual incomes rather than ‘item non-response’ then wouldn’t one want to correct for them in some fashion since they are often outlier values?

I think a quick clarification will help clear up the confusion here. When the swap value equals the maximum possible value (as is the case for INCSURV1 and INCSURV2 in 2016 and 2018), no values are swapped, and anyone above the swap value threshold simply receives the maximum possible value. So, to answer your first question: when the maximum possible value equals the swap threshold value, those values in the data essentially act as traditional top-code values. Regarding your second question, you can deal with these values in any number of ways. I’d suggest running your analysis both with them excluded and with them included to see how much they change your estimates.
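
As a rough illustration of that last suggestion, here is a minimal Python sketch that computes a mean with and without the records sitting at the maximum possible value. It assumes the N.I.U. values have already been removed, uses 99,999 as the INCSURV1 maximum discussed above, and is unweighted for simplicity. The function name `compare_estimates` and the sample values are hypothetical and only for illustration; they are not real CPS records.

```python
from statistics import mean

# Maximum possible value for INCSURV1 as discussed in this thread
MAX_POSSIBLE = 99_999


def compare_estimates(values, top_code=MAX_POSSIBLE):
    """Return the mean with and without records equal to the top-code value.

    Assumes N.I.U. codes were dropped beforehand and ignores survey weights.
    """
    included = list(values)
    # Drop only the records that sit exactly at the top-code value
    excluded = [v for v in included if v != top_code]
    return {
        "n_top_coded": len(included) - len(excluded),
        "mean_included": mean(included),
        "mean_excluded": mean(excluded) if excluded else None,
    }


# Made-up illustrative incomes, not real CPS data
sample = [12_000, 18_500, 24_000, 99_999, 30_000, 99_999]
result = compare_estimates(sample)
print(result)
```

A large gap between the two means suggests the estimate is sensitive to how the top-coded records are handled. A real analysis would also apply the appropriate survey weights.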


Thanks for breaking this down, @JeffBloem.