Hi there! I’m trying to plot average family income (ftotval) by educational attainment (educ, recoded slightly into fewer categories) for families with people between the ages of 30 and 39, inclusive. I have a couple of questions and would be eternally grateful for some pointers.

First, I reduce the data set so that there’s one 30-to-39-year-old person per family. Then I find the mean family income of the remaining people by education, weighted using wtsupp. Is that reasonable? (Please see the Stata code below.)
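A compact way to keep one 30-to-39-year-old per family is to sort within the family identifiers and keep the first record. The sketch below runs on made-up toy data and assumes that families are identified by year, serial, and famunit (as in the do-file) and that pernum, the IPUMS person number, is available to make the choice deterministic. Unlike dropping spouses (relate == 201) after the age filter, this also keeps families whose only 30-to-39-year-old member is a spouse.

```stata
* Toy data (hypothetical values); one row per person
clear
input int year long serial byte famunit byte pernum byte age double ftotval
2001 10 1 1 45 60000
2001 10 1 2 35 60000
2001 10 2 3 33 25000
2001 11 1 1 31 90000
2001 11 1 2 38 90000
end

* Keep people aged 30 to 39, inclusive
keep if inrange(age, 30, 39)

* Within each family, keep the 30-39-year-old with the lowest person number
bysort year serial famunit (pernum): keep if _n == 1

list, noobs
```

Because family members share the same ftotval, which person is kept does not change the family income being averaged; sorting on pernum only makes the result reproducible.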

I have this weird jump in 1992 (see the graph below, which is adjusted for inflation, but the inflation adjustment is not what created the jump; I checked). I think this is because the educ variable is built from a pre-1992 variable for highest grade of school attended (HIGRADE) and a post-1992 variable for highest degree attained (EDUC99), which measure slightly different things. Is that what’s throwing off my graph? If so, is there any way to fix it? I have trouble believing that bachelor’s degree holders earn less now than in 1970, in real terms.
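Assuming the harmonized educ variable follows the IPUMS CPS coding in which pre-1992 respondents report years of college (110 = 4 years, 120-122 = 5 or more years) while 1992+ respondents report degrees (111 = bachelor’s, 123-125 = graduate degrees), the cut points in the do-file would put pre-1992 four-year college completers (110) in the "some college" group and pre-1992 respondents with 5+ years (120-122) in the bachelor’s group, which by itself could produce a break in 1992. The sketch below shows one possible harmonized recode on toy data. The code values are assumptions to check against the IPUMS codebook, and treating 4 years of college as a bachelor’s degree is only an approximation.

```stata
* Toy data (hypothetical): one row per person with an IPUMS-style educ code
clear
input int year int educ
1985  60
1985  73
1985  90
1985 110
1985 122
1995  73
1995  92
1995 111
1995 123
end

* ASSUMED IPUMS CPS EDUC codes (verify in the codebook):
*   110 = 4 years of college (pre-1992), 111 = bachelor's degree (1992+)
*   120-122 = 5+ years of college (pre-1992), 123-125 = graduate degrees (1992+)
generate byte educ_cat = .
replace educ_cat = 1 if educ < 73                  // no HS diploma
replace educ_cat = 2 if educ == 73                 // HS diploma
replace educ_cat = 3 if educ > 73 & educ < 110     // some college or associate's
replace educ_cat = 4 if inlist(educ, 110, 111)     // 4 yrs college / bachelor's
replace educ_cat = 5 if inrange(educ, 120, 125)    // 5+ yrs college / graduate degree

* Numeric categories with value labels sort in the intended order,
* so string prefixes like a/b/c/d/e are not needed
label define educ_cat 1 "Less than HS" 2 "HS diploma" 3 "Some college" ///
    4 "Bachelor's" 5 "Graduate"
label values educ_cat educ_cat

* Cross-check which raw codes land in each category, separately by era
generate byte from1992 = year >= 1992
tabulate educ educ_cat if from1992 == 0
tabulate educ educ_cat if from1992 == 1
```

Running the two tabulations on the real extract is a quick way to see which raw codes occur only before or only after 1992.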
Any help would be much appreciated. Thanks!
Here is my Stata code:
/*
This is a do-file that computes mean family income of adults aged 30 to 39 by education level.
The data come from IPUMS CPS.
*/
use"cps_00032.dta",clear
sort year serial famsize marst
keep if age <40 & age > 29
drop if relate == 201 //here we’re dropping spouses, since we want to weight by family for our tabulation
//and we want only one person within each family to do that
sort year serial famunit ftotval
by year serial famunit ftotval: gen dup = cond(_N==1,0,_n) //create indicator of duplicate fam. members
drop if dup > 1 //drop all the duplicates
*br if year == 2001 & serial == 21366 //just looking at a person who had a family member in the dataset before
//Make education variable that’s simpler
gen educ_simple = "NA"
replace educ_simple = "aLessHS" if educ < 73 //Those without HS diplomas
replace educ_simple = "bHS" if educ == 73 //HS degree
replace educ_simple = "cSomeCollor2yrDeg" if educ > 73 & educ < 111 //some coll. or associate's degree
replace educ_simple = "dBach" if educ >= 111 & educ <= 122
replace educ_simple = "eAdvDeg" if educ >= 123 & educ <= 125
//I’m counting 5 yrs of college and 6+ yrs
//of college as "bachelor’s degree."
//just put the a, b, c, d, e prefixes so they appear in the right
//order when tabbed.
drop if year < 1970
set more off
sort year educ_simple
forvalues i=1970(1)2014 {
qui summarize ftotval [aw=wtsupp] if educ_simple == "eAdvDeg" & year == `i'
generate avg_advDeg_`i' = r(mean)
qui summarize ftotval [aw=wtsupp] if educ_simple == "dBach" & year == `i'
generate avg_bach_`i' = r(mean)
qui summarize ftotval [aw=wtsupp] if educ_simple == "cSomeCollor2yrDeg" & year == `i'
generate avg_somecoll_`i' = r(mean)
qui summarize ftotval [aw=wtsupp] if educ_simple == "bHS" & year == `i'
generate avg_hs_`i' = r(mean)
qui summarize ftotval [aw=wtsupp] if educ_simple == "aLessHS" & year == `i'
generate avg_ltHS_`i' = r(mean)
}
//2004 has different weights, so we have to put that in separately. We replace what we had with the new averages.
//these use the person weights for 2004
qui summarize ftotval [aw=PERWT04] if educ_simple == "eAdvDeg" & year == 2004
replace avg_advDeg_2004 = r(mean)
qui summarize ftotval [aw=PERWT04] if educ_simple == "dBach" & year == 2004
replace avg_bach_2004 = r(mean)
qui summarize ftotval [aw=PERWT04] if educ_simple == "cSomeCollor2yrDeg" & year == 2004
replace avg_somecoll_2004 = r(mean)
qui summarize ftotval [aw=PERWT04] if educ_simple == "bHS" & year == 2004
replace avg_hs_2004 = r(mean)
qui summarize ftotval [aw=PERWT04] if educ_simple == "aLessHS" & year == 2004
replace avg_ltHS_2004 = r(mean)
br
keep if _n == 1
keep avg*
gen id = 1
reshape long avg_ltHS_ avg_hs_ avg_somecoll_ avg_bach_ avg_advDeg_, i(id) j(year)
br
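The per-year loop, the separate 2004 block, and the reshape can be condensed into a single collapse call, which computes weighted means for every year-by-education cell at once. The sketch below uses toy data and assumes a numeric category variable named educ_cat (a string variable such as educ_simple works the same way in by()); it builds one weight variable that uses PERWT04 in 2004 and wtsupp otherwise, mirroring the do-file, and then draws one line per category.

```stata
* Toy data (hypothetical values): family income, education category, weights
clear
input int year byte educ_cat ftotval wtsupp PERWT04
2003 2 40000 2 .
2003 4 50000 1 .
2003 4 70000 3 .
2004 2 30000 5 2
2004 4 60000 9 1
2004 4 80000 9 3
end

* One weight variable: PERWT04 in 2004 (as in the do-file), wtsupp in all other years
generate double w = cond(year == 2004, PERWT04, wtsupp)

* Weighted mean family income for every year-by-education cell
collapse (mean) avg_ftotval = ftotval [aweight = w], by(year educ_cat)

* One column per education category, one row per year, then plot
reshape wide avg_ftotval, i(year) j(educ_cat)
twoway line avg_ftotval* year, sort
```

Because collapse replaces the data in memory, it is usually run after preserve (or on a saved copy) when the person-level data are still needed.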