Reproducing SPMFTOTVAL

As a part of my research, I need to decompose the income measures that comprise an SPM unit’s cash income (SPMFTOTVAL) into its component parts. I understand that SPMFTOTVAL is simply the sum of INCTOT for each member of an SPM unit, and therefore, SPMFTOTVAL is also the sum of INCTOT’s component incomes, which are listed for various years here. In theory, I should be able to reproduce SPMFTOTVAL by summing the income components that make up INCTOT in a given year for each SPM unit (identifiable using SPMFAMUNIT).

In practice, I am able to do this for most years. However, I am having trouble reproducing SPMFTOTVAL using the 2018 and 2019 CPS ASEC. I made sure to remove NIU values from all the component income variables as suggested previously in a similar inquiry here.

In Stata, my code for aggregating these incomes in 2019 at the SPM unit level is as follows:

egen row_income = rowtotal(incwage incbus incfarm incss incwelfr incretir incssi ///
incint incunemp incwkcom incvet incsurv incdisab incdivid incrent ///
inceduc incchild incasist incother incrann incpens)

bysort year spmfamunit: egen spm_cash_cps = total(row_income)
// Reconstructed SPMFTOTVAL = spm_cash_cps

In 2018, my code differs slightly from what I’ve written above because I don’t include INCRANN or INCPENS in the estimates.

Comparing means for SPMFTOTVAL and my reconstructed variable SPM_CASH_CPS above for years 2018 and 2019, I find that the two means differ by several hundred thousand.

Taking just a couple examples from the 2019 data:

  • The family unit with SPMFAMUNIT = 25834001 possesses an SPMFTOTVAL = 133212, but my calculations using the code above find the sum should be 266424. This is of course odd because my estimate is exactly double the SPMFTOTVAL.

  • The family unit with SPMFAMUNIT = 25824001 possesses an SPMFTOTVAL = 158576, but my calculations using the code above find the sum should be 148576.

For other data years, 2010 - 2017, I don’t have any problems recreating SPMFTOTVAL using the procedure detailed above. Curious if I may be missing something for these data years, or if I might be going about things the wrong way.
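A minimal diagnostic sketch for the mismatches described above, assuming the variables spm_cash_cps and spmftotval already exist as in the code earlier in this post. The names spm_diff and mismatch are hypothetical, and the duplicate check assumes the extract contains the IPUMS identifiers SERIAL and PERNUM. A total that is exactly double would be consistent with person records being counted twice within a year, though that is only one possible explanation.

```stata
* Difference between the rebuilt total and the published SPM cash income
gen double spm_diff = spm_cash_cps - spmftotval

* Flag units that are off by more than $1 (hypothetical tolerance)
gen byte mismatch = abs(spm_diff) > 1 if !missing(spm_diff)
tabulate year mismatch

* Assumption: year, serial, and pernum should identify one person record each;
* surplus copies here would inflate the SPM-unit totals
duplicates report year serial pernum
```

If the duplicates report shows surplus copies only in the problem years, the extract itself (rather than the NIU handling) would be the first thing to examine.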

Thanks for your inquiry. We are looking into this, but we likely won’t have a response until sometime next week.

I was not able to replicate your results; when I replicated SPMFTOTVAL manually by summing component variables (and replacing NIU codes with 0), I got a nearly perfect match, apart from a handful of cases that differ by $1, which I assume are rounding errors. However, in the process of trying to replicate SPMFTOTVAL, I did note that the NIU codes for the component variables were cumbersome to track down. My best guess is that your code reassigning NIUs had a minor typo (e.g., a 9999999 instead of a 999999). I have verified that the NIU codes are listed correctly on each variable’s webpage and that they are consistent within each variable across these years, but the NIU codes do differ across the INC* variables.

Since I already tracked them down, here are the current NIU codes (in Stata syntax) that need to be handled for these variables (for variables available in 2018-2019):

recode incchild inceduc incss incssi incunemp incwelfr incwkcom (999999=0)
recode incasist incdisab incint incrent incsurv incvet incdivid incother (9999999=0)
recode incwage inclongj incbus incfarm incretir (99999999=0)
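Because a single mistyped digit in one of these codes would silently leave NIU values in the sum, a quick check after recoding can help. The following is only a sketch: it reuses the three variable groups and codes listed above and reports any variable that still contains its own NIU code. It checks each variable only against its own code, since an all-nines value with fewer digits may be a legitimate amount for a variable with a longer code.

```stata
* Post-recode check (illustrative): report any leftover NIU codes by group
foreach v of varlist incchild inceduc incss incssi incunemp incwelfr incwkcom {
    quietly count if `v' == 999999
    if r(N) > 0 {
        display "`v': " r(N) " values still equal to 999999"
    }
}
foreach v of varlist incasist incdisab incint incrent incsurv incvet incdivid incother {
    quietly count if `v' == 9999999
    if r(N) > 0 {
        display "`v': " r(N) " values still equal to 9999999"
    }
}
foreach v of varlist incwage inclongj incbus incfarm incretir {
    quietly count if `v' == 99999999
    if r(N) > 0 {
        display "`v': " r(N) " values still equal to 99999999"
    }
}
```

No output means that none of the listed variables still holds its NIU code.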

Thanks for working to reproduce this, Dan! I’ve double-checked my script that removes NIU values and it matches what you’ve written above, so I’m not sure that’s what’s at issue.

I see you’ve included INCLONGJ in your code snippet. My understanding is that this variable is not included in INCTOT, right? If I’m mistaken in my understanding, this could be what the problem is.

Sorry - one more question. I see that for INCPENS, the NIU code is 0; however, I also find that the data has maximum values of 999,999. Given that the maximum value is 999,999, shouldn’t this be swapped for 160,000 (or, 80,000 - the swap value for INCPEN1 and INCPEN2 combined)?

Here is my Stata code below. This extract only includes 2015-2022 ASEC samples. I recoded INCLONGJ but actually did not use it in generating the new variable. I matched your syntax to make sure we are seeing the same results. Note: the 30 contradictions are all differences of $1, which is likely due to a rounding issue.

As for the INCPENS issue you noted, the maximum values for INCPEN1 and INCPEN2 are 999,999. The Swap Threshold values do not affect top-codes/maximum values. This shouldn’t be pertinent to the issue of matching to SPMFTOTVAL.

. recode incchild inceduc incss incssi incunemp incwelfr incwkcom (999999=0) 
(272,656 changes made to incchild)
(272,656 changes made to inceduc)
(272,656 changes made to incss)
(272,656 changes made to incssi)
(272,656 changes made to incunemp)
(272,656 changes made to incwelfr)
(272,656 changes made to incwkcom)

. recode incasist incdisab incint incrent incsurv incvet incdivid incother (9999999=0)
(272,656 changes made to incasist)
(272,656 changes made to incdisab)
(272,656 changes made to incint)
(272,656 changes made to incrent)
(272,656 changes made to incsurv)
(272,656 changes made to incvet)
(272,656 changes made to incdivid)
(167,760 changes made to incother)

. recode incwage inclongj incbus incfarm incretir (99999999=0)
(272,656 changes made to incwage)
(272,656 changes made to inclongj)
(272,656 changes made to incbus)
(272,656 changes made to incfarm)
(545,551 changes made to incretir)

. egen row_income = rowtotal(incwage incbus incfarm incss incwelfr incretir incssi incint incunemp incwkcom incvet incsurv incdisab incdivid incrent inceduc incchild incasist incother incrann incpens)

. bysort year spmfamunit: egen spm_cash_cps = total(row_income)

. assert spm_cash_cps==spmftotval
30 contradictions in 1,246,885 observations
assertion is false
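To confirm that the remaining contradictions are all $1 differences, the strict equality test can be replaced with a tolerance. This is a sketch only, assuming the variables created in the log above; spm_gap is a hypothetical name and the $1 tolerance is a choice for illustration, not an IPUMS recommendation.

```stata
* Absolute gap between the rebuilt total and SPMFTOTVAL
gen double spm_gap = abs(spm_cash_cps - spmftotval)

* Where do the nonzero gaps occur, and how large are they?
tabulate year if spm_gap > 0 & !missing(spm_gap)
summarize spm_gap if spm_gap > 0 & !missing(spm_gap)

* Passes only if every nonmissing gap is at most $1
assert spm_gap <= 1 if !missing(spm_gap)
```

If the final assert fails, listing spmfamunit for the offending rows shows which units to inspect by hand.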