Nonexistent variables:
Most of the income variables contain between 30K and 80K actual NAs, or at least values that register as NA with rlang::are_na. I take these to be the values in years when the income variable in question does not exist.
Is that correct?
Top codes or item nonresponse?
All of the swap variables except those corresponding to incss, incwelfr, incssi, incdsa2, and inclongj contain one or more values of “99997”. The shorter version “9997” does not occur in these variables.
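A check like the one described above can be sketched as follows. This is only an illustration in Python: the column names are borrowed from this post, but the values are made-up toy data, and the helper name `sentinel_counts` is hypothetical.

```python
from collections import Counter

# Toy stand-in for a few swap-file columns; these values are invented
# for illustration and are NOT taken from the real swap file.
swap = {
    "incwage": [25000, 99997, 999999, None],
    "incss": [1200, 9999, None, 800],
}

# The two candidate top-code sentinels discussed above.
SENTINELS = (9997, 99997)

def sentinel_counts(columns, sentinels=SENTINELS):
    """Count how often each sentinel code appears in each column."""
    out = {}
    for name, values in columns.items():
        tally = Counter(v for v in values if v in sentinels)
        out[name] = {code: tally.get(code, 0) for code in sentinels}
    return out

counts = sentinel_counts(swap)
```

A column whose count for 99997 is nonzero and whose count for 9997 is zero would match the pattern reported here.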
Here: https://cps.ipums.org/cps-action/revisions
it says:
Edited supplemental files.
Changes were made to a Census Bureau income data replacement values file called “swapvalues”, provided in a form that is compatible with IPUMS data extracts on the Income Component Cell Means Replacement Values page. Some income replacement values were coded as 99999, which is typically an NIU code. These values have been replaced with 99997 to indicate that the income value was topcoded due to a limited number of digits, and the codes should be treated as an income value.
However, here:
https://forum.ipums.org/t/five-beliefs-and-two-questions-about-top-coded-values-in-the-ipums-cps/2768/2
Jeff Bloem states that these codes represent item nonresponse.
If these variables are indeed top-coded by the 7-terminal value, may I assume that the value at which they are top-coded is 1 * 10^n, where n is the number of digits in the variable?
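To make the guess concrete, here is the arithmetic as a small Python sketch. It encodes only my assumption (threshold = 10^n for an n-digit code), not anything IPUMS has confirmed, and the function name `implied_ceiling` is made up.

```python
def implied_ceiling(top_code):
    """Return 10**n, where n is the number of digits in the top code.

    This is the guessed rule from the question, not documented behavior.
    """
    n_digits = len(str(top_code))
    return 10 ** n_digits

# Under the guess, a five-digit code of 99997 implies a threshold of 10**5.
ceiling = implied_ceiling(99997)
```

So for a five-digit variable coded 99997, the guess would put the top-coding threshold at 100,000.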
Is there anywhere in the DDI that the number of characters in a field is consistently given? It is only sometimes in the coding instructions. May I safely assume that the number of digits that a variable has in the DDI, when given, is the same as the number that the same variable has in the swap file?
Questions relating to field widths & 9-strings
Maybe the coding instructions for the original variables will help clarify these issues? The coding instructions for the original variables (not the swap files, unless they are the same) give strings of nines from five to eight digits long as not-in-universe, strings of the same length with a terminal 8 as missing (for only a few variables), and strings with a terminal 7 as top-coded for numerous variables.
I find these variations in length confusing. At first I thought that they were all set at the width of the fields, and varied for that reason, but this is not the case: see, e.g., FTOTVAL, which is top-coded at 50,000 (five digits) but has an NIU value of 999999 (six digits). Within the swap files, are these strings of nines with varying terminal digits the same length as the field widths?
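The scheme in the coding instructions, as I read it, can be written down as a small classifier, which makes the length mismatch easy to see. This is a Python sketch under my reading of the instructions; the name `classify_code` and the meaning labels are my own, and whether the swap file follows the same scheme is exactly what I am asking.

```python
import re

# Terminal-digit meanings as I read them in the coding instructions for
# the original variables; applying them to the swap file is an assumption.
MEANING = {"9": "not in universe", "8": "missing", "7": "top-coded"}

def classify_code(value):
    """Return (length, meaning) if value looks like 9...9 followed by 7, 8, or 9.

    Returns None for anything else, e.g. an ordinary income value.
    """
    text = str(value)
    if not re.fullmatch(r"9{3,}[789]", text):
        return None
    return len(text), MEANING[text[-1]]

niu = classify_code(999999)   # six-digit string of nines
top = classify_code(99997)    # five-digit code with a terminal 7
```

Comparing the first element of each result shows whether the special codes for a given variable share one length. Note that a pattern test like this cannot tell a special code from a genuine income that happens to equal, say, 9997.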
These seven variables include some observations coded with “999999” (six nines) in the swap file:
incwage incbus incfarm oincwage oincbus oincfarm oinclongj
All seven of these variables have five-digit terminal-7 values, so it appears that my guess above, that the lengths are set by the field widths, is incorrect. In addition, these two variables have length-4 9-strings with a terminal 9: incss and incssi.
How should I interpret these 9-terminal nine-strings in the swap file variables? If they are missing values, do we know anything about how they are missing (and why they have not been imputed in the swap file)?