when INCTOT is not the sum of the various INC components

INCTOT in IPUMS-CPS is supposed to be “the sum of several different types of income that the survey asked respondents to report.” (https://cps.ipums.org/cps-action/vari…)

However, in an admittedly small share of the observations that is not true. For example, for survey year 2018 (ASEC), these are the differences between the supposed sum of the components and INCTOT (note that, despite the documentation linked above, the component INCALIM is not available after 2014):

[Image: tabulation of the differences between the summed components and INCTOT, 2018 ASEC]

Do you have any insight into the source of these “errors”, especially for the larger amounts? I note that 99999 is often a CPS code used for missing data.

Thank you!

David

David & I are both working with the 2018 ASEC.

I set all NIU & missing values to 0 and calculate

my_inctot = incwage + incbus + incfarm + incss + incwelfr + incretir + incssi + incint + incunemp + incwkcom + incvet + incsurv + incdisab + incdivid + incrent + inceduc + incchild + incasist + incother

and compare it to inctot

my_inctot is short of inctot by 99999 for 587 cases out of 189k or so people in the poverty universe (OFFPOVUNIV = 1).

This suggests that I have left out a source of income Census has but is not on the IPUMS file.

But the consistent 99999 value worries me …

Nabeel

Greetings,

I’ve been able to replicate INCTOT and, without seeing your exact code, it seems to me that you may not be setting an income source to missing correctly. Using the 2018 ASEC, the sum of the income components is exactly equal to INCTOT except for 582 cases with diff=-2 and 3 cases with diff=-1.

I suspect that you have a “replace inc = . if inc==99999” code with the wrong number of 9s. Try checking into those first. If that doesn’t work, let us know.
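To illustrate how a recode with the wrong number of 9s can shift a sum by exactly 99999, here is a minimal sketch in Python rather than Stata. The NIU codes, the record, and the helper `component_sum` are all made up for illustration; each variable's actual NIU code should be taken from the IPUMS-CPS documentation. This shows one possible mechanism, not a diagnosis of anyone's code.

```python
def component_sum(record, niu_codes):
    """Sum income components, counting a value as 0 when it equals
    the NIU code supplied for that variable."""
    return sum(
        0 if record[name] == niu_codes[name] else record[name]
        for name in niu_codes
    )

# Hypothetical NIU codes (assumed for this example only).
documented_codes = {"incwage": 99999999, "incint": 999999}

# The same recode with one 9 too few for incint.
mistyped_codes = {"incwage": 99999999, "incint": 99999}

# A made-up person whose incint is a genuine amount of 99,999.
person = {"incwage": 40000, "incint": 99999}

right_total = component_sum(person, documented_codes)
wrong_total = component_sum(person, mistyped_codes)
shortfall = right_total - wrong_total

print(right_total)  # 139999
print(wrong_total)  # 40000
print(shortfall)    # 99999
```

With the mistyped code, a legitimate value of 99999 is treated as NIU and zeroed, so the computed total falls short by 99999. The opposite slip (a code with too many 9s) would leave a real NIU value in the sum and inflate it instead.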

Best,

Jose

In follow-up to Jose’s response, using the 2018 ASEC sample, if you use “drop if inctot==99999999” then you should only end up with 582 cases where the difference is 2 and 3 cases where the difference is 1.

Every case where the difference is 2 results from IPUMS topcoding with 99997. This user note provides an explanation of this process. In short, where the topcode value is 99999 in a 5-digit numeric variable, IPUMS recodes the value to 99997 to avoid confusion with our NIU codes. In these cases, the INCTOT provided by the Census was computed with a value of 99999 for one of its components, while the IPUMS file reports that component as 99997. As a result, your own calculation of INCTOT using the component income variables will be off by 2.
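The off-by-2 arithmetic can be sketched as follows, again in Python rather than Stata. The records and field names (`component_census`, `other`) are invented for illustration; only the two values 99999 and 99997 come from the explanation above.

```python
from collections import Counter

CENSUS_TOPCODE = 99999  # value used by the Census when INCTOT was built
IPUMS_TOPCODE = 99997   # value IPUMS publishes for that component instead

def ipums_value(census_value):
    """Apply the 99999 -> 99997 recode described above."""
    return IPUMS_TOPCODE if census_value == CENSUS_TOPCODE else census_value

# Made-up records: one topcodable component plus all other income.
records = [
    {"component_census": 99999, "other": 12000},
    {"component_census": 5000, "other": 20000},
    {"component_census": 99999, "other": 0},
]

diffs = Counter()
for r in records:
    inctot = r["component_census"] + r["other"]                  # as provided
    my_inctot = ipums_value(r["component_census"]) + r["other"]  # recomputed
    diffs[my_inctot - inctot] += 1

print(dict(diffs))  # {-2: 2, 0: 1}
```

Tabulating the difference this way, after excluding NIU cases on INCTOT, makes it easy to see whether anything other than 0 and the expected -2 remains.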

In the cases where the difference is a value of 1, it is hard for us to say what has actually happened since this data is provided directly from the Census. Because it affects such a small number of observations, I would recommend just dropping those cases. If you want to dig deeper, you could try contacting the Census Bureau for more information.

Thank you! You are correct: I was being careless in recoding the missing values.