How to interpret the correlation between INCTOT and INCEARN (using Brazil 1980 data)

Hi all,

As I understand it, INCTOT is total personal income and INCEARN is total earned income, so I expected these two variables to be highly correlated.

But after filtering out the records where INCTOT equals 9999998 (Unknown/missing) or 9999999 (NIU, not in universe), I find that in the remaining data (4,335,331 rows), a large number of rows (4,288,128) have INCEARN equal to 99999998 (Unknown/missing) or 99999999 (NIU, not in universe).

As a result, if I keep only the records where neither INCTOT nor INCEARN is Unknown/missing or NIU, only about 1% of the original data remains.
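As a quick sanity check of the numbers above, here is a minimal Python sketch. It assumes the special codes quoted later in this thread (7-digit 9999998/9999999 for INCTOT, 8-digit 99999998/99999999 for INCEARN); the helper `is_valid` and the constant names are hypothetical. It also shows why the digit count matters: the 8-digit codes do not catch a 7-digit INCTOT sentinel.

```python
# Special codes as described in this thread (INCTOT uses 7 digits, INCEARN 8).
INCTOT_CODES = {9999998, 9999999}      # Unknown/missing, NIU
INCEARN_CODES = {99999998, 99999999}   # Unknown/missing, NIU


def is_valid(value, special_codes):
    """Return True if a raw value is a real amount rather than a special code."""
    return value not in special_codes


# Row counts reported in the post above.
rows_valid_inctot = 4_335_331
rows_inctot_valid_but_incearn_coded = 4_288_128

rows_both_valid = rows_valid_inctot - rows_inctot_valid_but_incearn_coded
share = rows_both_valid / rows_valid_inctot

print(rows_both_valid)   # 47203
print(f"{share:.1%}")    # 1.1%

# Using the wrong code set lets a missing INCTOT through:
print(is_valid(9999998, INCTOT_CODES))   # False
print(is_valid(9999998, INCEARN_CODES))  # True
```

Relative to the rows with a valid INCTOT, about 1.1% of records also have a usable INCEARN, in line with the roughly 1% reported above.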

This really confuses me. Does the census ask each person only one of the two questions (INCTOT or INCEARN), leaving the other coded as Unknown/missing?

I would really appreciate any help.

Thanks.

Hello Victor,

I think my post “Error on Brazilian income variables in 1980” (especially its fifth point) may answer your question.

I hope this helps.

Best,

Geoffrey

Hello Geoffrey,

Thanks for answering. I read through your post and was impressed by your thorough investigation of the Brazil data.

However, I am focusing on the Unknown/missing and NIU (not in universe) values in both INCTOT and INCEARN, whereas your fifth point is about zero income.

I still appreciate your answer. I also find it weird that the INCTOT codes have one digit fewer than the INCEARN codes (e.g., 9999998 vs. 99999998).

Thanks,

Victor

OK, I think I see the issue with unknown/missing: inctot and incearn are sums of different income sources. As soon as a single income source is unknown, inctot and incearn are also coded as unknown. Because inctot comprises more income sources than incearn, it has more unknowns.
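The propagation described above can be sketched in Python as follows, assuming a missing component is represented as `None`. The function name and the example values are hypothetical; the point is that a total is missing as soon as any one component is, so a sum over more components has more chances to be missing.

```python
from typing import Optional, Sequence


def strict_sum(components: Sequence[Optional[float]]) -> Optional[float]:
    """Sum income components, returning None if any component is unknown.

    This mirrors how Stata's `+` treats missing values (. + x is .).
    """
    if any(c is None for c in components):
        return None
    return sum(components)


# Hypothetical person: main-job income known, rent income unknown.
main_job, other_job, pension, rent = 1200.0, 0.0, 0.0, None

earn = strict_sum([main_job, other_job])                  # labour sources only
total = strict_sum([main_job, other_job, pension, rent])  # all sources
print(earn, total)  # 1200.0 None
```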

Regarding the digits, it’s not a big issue: it only causes inctot to be top-coded, which affects a negligible number of observations in the Brazil data.
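A minimal sketch of the top-coding point, assuming it works as a simple cap: with 7-digit codes, 9999998 and 9999999 are reserved for Unknown and NIU, so 9999997 is the largest storable amount and stands for "9,999,997 or more" (the value Geoffrey's code checks with `inctot_bis==9999997`). The constant and function names are hypothetical.

```python
TOP_CODE = 9_999_997  # largest 7-digit value not reserved for Unknown/NIU


def top_code(amount: int, cap: int = TOP_CODE) -> int:
    """Cap an income amount at the top code (assumed to be a simple cap)."""
    return min(amount, cap)


print(top_code(2_500))       # 2500
print(top_code(12_000_000))  # 9999997
```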

Best,

Geoffrey

There seems to be a mistake in the programming of the INCEARN variable in the Brazil 1980 sample. As the IPUMS documentation states, INCTOT is derived from four source variables (GROSSINC + INCOTHER + INCPENS + INCRENT) and INCEARN from two (GROSSINC + INCOTHER). However, INCEARN seems to return a value (other than NA) only when both GROSSINC and INCOTHER are non-zero (i.e., if INCOTHER = 0, then INCEARN = NA even when GROSSINC is not 0). This is an error and is currently being resolved. In the meantime, you can construct these variables yourself by including the source variables in your data extract and rebuilding INCEARN as the IPUMS documentation describes.
We like to reward users who uncover errors with an IPUMS mug. If you email ipums@umn.edu with your mailing address, we will send you one.
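To make the difference concrete, here is a hedged Python sketch contrasting the documented definitions with the behaviour reported here. It is not the actual IPUMS program: it assumes the source values have already been recoded (missing as `None`) and that a sum is missing whenever one of its parts is missing; all function names are hypothetical.

```python
from typing import Optional

Amount = Optional[int]  # None stands for Unknown/missing


def incearn_documented(grossinc: Amount, incother: Amount) -> Amount:
    """INCEARN as documented: GROSSINC + INCOTHER (assumed missing if a part is)."""
    if grossinc is None or incother is None:
        return None
    return grossinc + incother


def incearn_as_reported(grossinc: Amount, incother: Amount) -> Amount:
    """Behaviour reported in this thread: NA unless both sources are non-zero."""
    if not grossinc or not incother:  # true for both None and 0
        return None
    return grossinc + incother


def inctot_documented(grossinc: Amount, incother: Amount,
                      incpens: Amount, incrent: Amount) -> Amount:
    """INCTOT as documented: GROSSINC + INCOTHER + INCPENS + INCRENT."""
    parts = (grossinc, incother, incpens, incrent)
    if any(p is None for p in parts):
        return None
    return sum(parts)


# A worker with a main-job wage and no second job (INCOTHER = 0):
print(incearn_documented(1500, 0))       # 1500
print(incearn_as_reported(1500, 0))      # None (the reported error)
print(inctot_documented(1500, 0, 0, 0))  # 1500
```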

Hi Jeff,

Thanks for answering.

So INCEARN has a value (is not NA) only when both of its source variables are not NA.

What about INCTOT? I find that most records with a non-NA INCTOT still have an NA INCEARN. Does this mean INCTOT is coded NA only when all four of its source variables are NA?

Hi Geoffrey,

Thanks for answering.

I am going to dive deeper into the data and investigate the source variables!

Best,

Victor

To Victor:

No problem. Just a few comments that may help.

First, the issue is with incearn, not with inctot (I recomputed the income variables myself a while ago and checked them for inconsistencies: inctot is fine).

Second, your problem puzzled me, so I ran another quick check on the data. There seem to be other problems with incearn beyond those mentioned earlier: for instance, incearn is coded NA more than 90% of the time when br1980a_incpens=0.
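A conditional check like the one mentioned above (how often INCEARN is NA among records meeting some condition) could be written as the minimal Python sketch below. The record layout, the choice to count both the Unknown and NIU codes as NA, and the toy records are assumptions for illustration only; they are not real census data.

```python
from typing import Callable, Iterable, Mapping

# Both INCEARN special codes are counted as NA here (an assumption).
INCEARN_NA_CODES = {99999998, 99999999}  # Unknown/missing, NIU


def na_share(records: Iterable[Mapping[str, int]],
             condition: Callable[[Mapping[str, int]], bool]) -> float:
    """Share of records meeting `condition` whose INCEARN is a special code.

    Returns NaN when no record meets the condition.
    """
    matched = 0
    na = 0
    for rec in records:
        if condition(rec):
            matched += 1
            if rec["incearn"] in INCEARN_NA_CODES:
                na += 1
    return na / matched if matched else float("nan")


# Hypothetical toy records, not real census data.
toy = [
    {"incearn": 99999998, "br1980a_incpens": 0},
    {"incearn": 1200, "br1980a_incpens": 0},
    {"incearn": 99999998, "br1980a_incpens": 300},
    {"incearn": 99999999, "br1980a_incpens": 0},
]
print(na_share(toy, lambda r: r["br1980a_incpens"] == 0))  # 0.6666666666666666
```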

I advise you to recompute it completely yourself.

To Victor and the IPUMS staff:

As a quick fix, I am happy to share by email my Stata code, in which I am fairly confident incearn has been correctly recoded for Brazil 1980.

Best,

Geoffrey

Hi Victor and Geoffrey,

Thanks for your work with these variables. Victor, your interpretation of the issue sounds correct. Geoffrey, the IPUMS team is currently looking into fixing the programming error; however, the fix will not appear until the next data release. If you’d like to attach your Stata code to this thread for others to use, that would be helpful.

Cheers,
Jeff

*********** Income variables : BRAZIL IPUMS ****************************************************************** ******************************************************************

** inctot
*9999998 = Unknown/missing.
*9999999 = NIU (not in universe).
*Brazil 1991: 9,999,997+ (top code)

** incearn
*99999998 = Unknown/missing.
*99999999 = NIU (not in universe).

***** 1) Recoding inctot and incearn

*** a) 1980

** i) Inctot

gen double inctot_bis=inctot // to be dropped later (unlike inctot_IPUMS, which keeps the original version)
* 9999998 = Unknown/missing.
* 9999999 = NIU (not in universe): people aged less than 10
* Note: top code for Brazil 1991: 9,999,997+
*INCTOT reports the person’s total personal income from all sources in the
* previous month or year.
replace inctot_bis= . if inctot_bis==9999998
replace inctot_bis=0 if inctot_bis==9999999

/** br1980a_njob: job situation last week
1  Just one job                               1,973,765
2  Multiple jobs (primary occ. plus others)      57,654
3  Only working in secondary job                 26,266
4  Seeking employment                            27,527
5  Retired                                        2,447
6  Didn't work and wasn't looking                11,873
9  NIU (not in universe)                      3,770,935  (Persons age 10+ who worked)
*/
*WARNING: the 1980 census asks both whether the person worked last week and
*whether they worked during the past year!
*tab classwk br1980a_njob

/** br1980a_work: Worked in the last 12 months
1  Yes                             2,081,044
2  No                              2,250,217
3  Drought front (frente da seca)     19,078
9  NIU (not in universe)           1,520,128  (Persons age 10+)
**/
*tab br1980a_work br1980a_njob // consistent
*tab br1980a_work classwk // consistent
* classwk counts people as not working based on the past year
* the difference in universe comes from the unknown and drought-front categories, which are unclear

** br1980a_grossinc: Gross earnings in principal occupation
*0 = Zero earnings.
*9999998 = Unknown.
*9999999 = NIU. (Persons ages 10+ who worked)
/*Q. 37: Monthly gross income received in money from the occupation declared
in question 30 (the occupation, profession, task, function, etc. exercised for the longest time;
if the person changed occupations permanently, the current occupation is recorded)*/
/*tab br1980a_njob if br1980a_grossinc==0
tab br1980a_njob if br1980a_grossinc==9999999
tab br1980a_njob if br1980a_grossinc!=0 &br1980a_grossinc!=9999998 &br1980a_grossinc!= 9999999
tab classwk if br1980a_grossinc==0
tab classwk if br1980a_grossinc==9999999
tab classwk if br1980a_grossinc!=0 &br1980a_grossinc!=9999998 &br1980a_grossinc!= 9999999
tab br1980a_work if br1980a_grossinc==0
tab br1980a_work if br1980a_grossinc==9999999
tab br1980a_work if br1980a_grossinc!=0 &br1980a_grossinc!=9999998 &br1980a_grossinc!= 9999999
*there are a few inconsistent replies, but overall it is fine
*/
rename br1980a_grossinc inc_mainjob
replace inc_mainjob=. if inc_mainjob==9999998
replace inc_mainjob=0 if inc_mainjob==9999999

** br1980a_earnprod: Earnings in products/merchandise in principal occupation
*0 = Zero earnings.
*9999998 = Unknown.
*9999999 = NIU. (Persons age 10+ who worked)
*This variable identifies the person’s earnings from products/merchandise
*sold in their principal occupation. ==> mistranslation: it is actually payment
*in kind, which is not counted in Q. 37
/* Q. 38:
So havera registro de valor quando a pessoa receber, pelo trabalho exercido, pagamento em produtos ou mercadorias.
No caso de receber parte em dinheiro e parte em produtos ou mercadorias, a parte em dinheiro sera registrada no quesito 37 e o valor da parte em produtos ou mercadorias, neste quesito.
O registro sera do valor medio mensal, real ou estimado, dos produtos ou mercadorias comercializadas nos ultimos 12 meses (valor do mercado), que recebeu pela ocupacao declarada no quesito 30.
Nao computar o valor da producao para consumo proprio.
--> translation: A value is recorded only when the person receives payment in products or merchandise for the work performed.
If the person receives part in cash and part in products or merchandise, the cash portion is recorded in item 37 and the value of the portion in products or merchandise in this item.
The value recorded is the actual or estimated average monthly value (market value) of the products or merchandise traded over the last 12 months that the person received for the occupation declared in item 30.
Do not include the value of production for own consumption.
Note: IPUMS is wrong when it says that this is solely from the principal occupation
*/
rename br1980a_earnprod inc_jobinkind
replace inc_jobinkind =. if inc_jobinkind==9999998
replace inc_jobinkind =0 if inc_jobinkind==9999999

** br1980a_incother: Monetary gross income in other occupations
*0 = NIU. (Persons age 10+ who worked a secondary occupation during the week prior to the census)
*9999999 = Unknown.
/*Q. 39: Average gross monthly income from other occupations regularly exercised,
not including that declared in Questions 37 and 38.*/
** IPUMS error: this concerns people who regularly held a second occupation, not only those who
*worked a secondary occupation in the week before the census
/*tab br1980a_njob if br1980a_incother==0
tab br1980a_njob if br1980a_incother==9999999
tab br1980a_njob if br1980a_incother!=0 &br1980a_incother!= 9999999
*there are a few inconsistent replies, but overall it is fine
*/
rename br1980a_incother inc_otherjob
replace inc_otherjob=. if inc_otherjob==9999999

**br1980a_incpens: Income received from pension
*0 = NIU. ( Persons age 10+ with pension income)
*9999999 = Unknown.
/* Q. 46: Monthly gross income received from retirement (FUNRURAL, reform,
retirement, etc.) from Pensão de Instituto, Caixa de Assistência Social, or Fundo
de Pensão, from Abono, Permanência [various pension funds], and the 14th minimum
salary received from PIS or PASEP, divided by 12.
** Note: Do not include income deriving from contributions paid in the past to
private funds or wage supplementing funds */
*--> mostly for retirees but not only: some young people also receive it
*(e.g., auxílio-natalidade or auxílio-doença: maternity or sickness
*benefits, i.e., social security)
*by age, sort: sum br1980a_incpens if br1980a_incpens!=9999999
*sum br1980a_incpens if br1980a_incpens!=9999999 & br1980a_njob==5
rename br1980a_incpens inc_pension
replace inc_pension=. if inc_pension==9999999

** br1980a_incrent: Income received from rent
*0 = NIU. (Persons age 10+ with income from rent)
*9999999 = Unknown.
/*Q. 47: Average monthly income deriving from rentals or leasing of real estate,
furniture, vehicles, machines, etc., including sub-letting
--> not only housing rentals */
rename br1980a_incrent inc_rent
replace inc_rent=. if inc_rent==9999999

**br1980a_incdonat: Income received from donations
*0 = NIU. (Persons age 10+ with income from donations)
*9999999 = Unknown.
/*Q. 48: Average monthly income regularly received, deriving from monetary donations,
allowances (pocket money/stipends) from non-residents of the household,
or alimony*/
rename br1980a_incdonat inc_donation
replace inc_donation=. if inc_donation==9999999

** br1980a_otherinc: Other income received
*0 = NIU. (Persons age 10+ with other income)
*9999999 = Unknown.
/*Q. 49: Monthly average of other income received during the last 12 months and
deriving from investment or other use of capital*/
rename br1980a_otherinc inc_othercapital
replace inc_othercapital=. if inc_othercapital==9999999

** assign a value of 0 to inc_jobinkind when it is the only source of pay from the job(s)
*--> for comparability with the other census years
replace inc_jobinkind=0 if inc_mainjob==0 & inc_otherjob==0
*inspect inc_jobinkind if inc_mainjob==0 // ok
*inspect inc_jobinkind if inc_otherjob==0 // ok

gen double inctot_unharmonized= inc_mainjob+inc_jobinkind+inc_otherjob+inc_pension+inc_rent+inc_donation+inc_othercapital
*sum inctot_unharmonized inctot_bis
*gen test= inctot_unharmonized-inctot_bis
*inspect inctot_unharmonized inctot_bis
*inspect test
*inspect test if inc_jobinkind==0
*inspect inc_jobinkind
** 1st difference: inctot_bis omits inc_jobinkind
*(IPUMS error)
*inspect test if inctot_bis==9999997
*inspect test if inctot_bis==9999997 & inc_jobinkind==0
*drop test
*br inctot_unharmonized if inctot_bis==9999997
**2nd difference: inctot_bis is top-coded (although this affects only a single value)

drop inctot_bis inctot
rename inctot_unharmonized inctot

*** ii) Incearn

gen double incearn_bis=incearn
*99999998 = Unknown/missing.
*99999999 = NIU (not in universe). (Persons age 10+)
*INCEARN reports the person’s total income from their labor (from wages, a business, or a farm)
*in the previous month or year.
replace incearn_bis=. if incearn_bis==99999998
replace incearn_bis=0 if incearn_bis==99999999

gen double incearn_unharmonized=inc_mainjob+inc_jobinkind+inc_otherjob

*inspect incearn_unharmonized incearn_bis
*gen test = incearn_unharmonized-incearn_bis
*inspect incearn_unharmonized incearn_bis test
*inspect incearn_unharmonized incearn_bis test if inc_jobinkind==0
*sum incearn_unharmonized incearn_bis test if inc_jobinkind==0
**1st difference: incearn_bis omits inc_jobinkind
*(IPUMS error)
*sum incearn_bis if inc_mainjob==0
*sum incearn_bis if inc_mainjob==.
*sum incearn_bis if inc_otherjob==0
*sum incearn_bis if inc_otherjob==.
*inspect incearn_unharmonized incearn_bis test if inc_otherjob!=0 & inc_jobinkind==0
*sum incearn_unharmonized incearn_bis test if inc_otherjob!=0 & inc_jobinkind==0
*sum incearn_unharmonized incearn_bis test if inc_jobinkind==0
*drop test
*2nd difference: incearn is coded as 0 when inc_otherjob==0 (IPUMS error)

drop incearn_bis incearn
rename incearn_unharmonized incearn

drop br1980a_socsec inc_mainjob inc_jobinkind inc_otherjob inc_pension inc_rent inc_donation inc_othercapital
compress

**** c) Recode persons out of universe
replace inctot=. if age<10
replace incearn=. if age<10
* WARNING: I have not recoded income as missing when income==0 or age<10
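For readers not using Stata, the per-person logic of the do-file above can be sketched in Python. This is a hedged translation, not Geoffrey's code: it processes one record held in a dict keyed by the IPUMS source-variable names, uses `None` where Stata uses the missing value `.`, and the helper names (`recode`, `strict_sum`, `rebuild_income`) and the example record are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Amount = Optional[int]  # None plays the role of Stata's missing value "."

# For each source variable: (raw codes recoded to missing, raw codes recoded to 0),
# following the comments in the do-file. Keys are the IPUMS variable names.
RECODES: Dict[str, Tuple[Set[int], Set[int]]] = {
    "br1980a_grossinc": ({9999998}, {9999999}),  # main job: Unknown -> ., NIU -> 0
    "br1980a_earnprod": ({9999998}, {9999999}),  # pay in kind: Unknown -> ., NIU -> 0
    "br1980a_incother": ({9999999}, set()),      # other jobs: 0 already means NIU
    "br1980a_incpens": ({9999999}, set()),       # pensions
    "br1980a_incrent": ({9999999}, set()),       # rent
    "br1980a_incdonat": ({9999999}, set()),      # donations
    "br1980a_otherinc": ({9999999}, set()),      # capital income
}


def recode(name: str, raw: int) -> Amount:
    """Map a raw value to an amount, None (missing) or 0 (NIU)."""
    to_missing, to_zero = RECODES[name]
    if raw in to_missing:
        return None
    if raw in to_zero:
        return 0
    return raw


def strict_sum(*parts: Amount) -> Amount:
    """Sum that is missing as soon as one part is missing (like Stata's +)."""
    if any(p is None for p in parts):
        return None
    return sum(parts)


def rebuild_income(person: Dict[str, int]) -> Dict[str, Amount]:
    """Recompute inctot and incearn for one Brazil 1980 record."""
    v = {name: recode(name, person[name]) for name in RECODES}
    main_job = v["br1980a_grossinc"]
    in_kind = v["br1980a_earnprod"]
    other_job = v["br1980a_incother"]

    # As in the do-file: pay in kind is set to 0 when both cash job incomes are 0.
    if main_job == 0 and other_job == 0:
        in_kind = 0

    inctot = strict_sum(main_job, in_kind, other_job,
                        v["br1980a_incpens"], v["br1980a_incrent"],
                        v["br1980a_incdonat"], v["br1980a_otherinc"])
    incearn = strict_sum(main_job, in_kind, other_job)

    # Persons under 10 are out of universe: both totals become missing.
    if person["age"] < 10:
        inctot = incearn = None
    return {"inctot": inctot, "incearn": incearn}


# Hypothetical record, for illustration only.
example = {"age": 35, "br1980a_grossinc": 1500, "br1980a_earnprod": 0,
           "br1980a_incother": 0, "br1980a_incpens": 0, "br1980a_incrent": 200,
           "br1980a_incdonat": 0, "br1980a_otherinc": 0}
print(rebuild_income(example))  # {'inctot': 1700, 'incearn': 1500}
```

Keeping `None` distinct from 0 preserves the do-file's distinction between Unknown (recoded to missing) and NIU (recoded to 0).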

Income variables IPU-1.docx (86 KB)
Income variables IPU.docx (89.5 KB)
Income variables IPU.txt (9.88 KB)
income IPUMS harmoni-1.zip (6.58 KB)
income IPUMS harmoni.zip (6.58 KB)

Here it is. Sorry for the messy post; I didn't think documents could be attached.