Finding the mean of family income by education for 30-39 year olds in the March CPS from 1970 to 2014

Hi there! I’m trying to plot average family income (ftotval) by educational attainment (educ, recoded into fewer categories) for families with people between the ages of 30 and 39, inclusive. I have a couple of questions and would be eternally grateful for some pointers.

  1. First, I try to reduce the data set so that there’s one 30-39 year-old person per family. Then I find the mean family income of those remaining folks by education, weighted using wtsupp. Is that reasonable? (Please see Stata code below)

  2. I have this weird jump in 1992 (see graph below, which is adjusted for inflation, though the inflation adjustment is not what created the jump; I checked). I think this is because the educ variable is created from a pre-1992 education variable for highest grade attained (HIGRADE), and a post-1992 variable that does something similar, but is still slightly different (EDUC99). Is that what’s throwing off my graph? If so, is there any way to solve this? I have trouble believing that bachelor’s degree holders earn less now than in 1970, in real terms.

Any help would be much appreciated. Thanks!

Here is my Stata code:

/*

This is a do-file that tabulates mean family income of adults aged 30 to 39 by education level.

The data come from IPUMS CPS.

*/

use "cps_00032.dta", clear

sort year serial famsize marst

keep if age <40 & age > 29

drop if relate == 201 //here we're dropping spouses, since we want to weight by family for our tabulation

//and we want only one person within each family to do that

sort year serial famunit ftotval

by year serial famunit ftotval: gen dup = cond(_N==1,0,_n) //create indicator of duplicate fam. members

drop if dup > 1 //drop all the duplicates

*br if year == 2001 & serial == 21366 //just looking at person who had family member //in the dataset before

//Make education variable that’s simpler

gen educ_simple = "NA"

replace educ_simple = "aLessHS" if educ < 73 //Those without HS diplomas

replace educ_simple = "bHS" if educ == 73 //HS degree

replace educ_simple = "cSomeCollor2yrDeg" if educ > 73 & educ < 111 //some coll. or associate's degree

replace educ_simple = "dBach" if educ >= 111 & educ <= 122

replace educ_simple = "eAdvDeg" if educ >= 123 & educ <= 125

//I’m counting 5 yrs of college and 6+ yrs

//of college as "bachelor's degree".

//just put the a, b, c, d, e prefixes so they appear in the right

//order when tabbed.

drop if year < 1970

set more off

sort year educ_simple

forvalues i=1970(1)2014 {

    qui summarize ftotval [aw=wtsupp] if educ_simple == "eAdvDeg" & year == `i'

    generate avg_advDeg_`i' = r(mean)

    qui summarize ftotval [aw=wtsupp] if educ_simple == "dBach" & year == `i'

    generate avg_bach_`i' = r(mean)

    qui summarize ftotval [aw=wtsupp] if educ_simple == "cSomeCollor2yrDeg" & year == `i'

    generate avg_somecoll_`i' = r(mean)

    qui summarize ftotval [aw=wtsupp] if educ_simple == "bHS" & year == `i'

    generate avg_hs_`i' = r(mean)

    qui summarize ftotval [aw=wtsupp] if educ_simple == "aLessHS" & year == `i'

    generate avg_ltHS_`i' = r(mean)

}

//2004 has different weights, so we have to put that in separately. We replace what we //had with the new averages.

//these use the person weights for 2004

qui summarize ftotval [aw=PERWT04] if educ_simple == "eAdvDeg" & year == 2004

replace avg_advDeg_2004 = r(mean)

qui summarize ftotval [aw=PERWT04] if educ_simple == "dBach" & year == 2004

replace avg_bach_2004 = r(mean)

qui summarize ftotval [aw=PERWT04] if educ_simple == "cSomeCollor2yrDeg" & year == 2004

replace avg_somecoll_2004 = r(mean)

qui summarize ftotval [aw=PERWT04] if educ_simple == "bHS" & year == 2004

replace avg_hs_2004 = r(mean)

qui summarize ftotval [aw=PERWT04] if educ_simple == "aLessHS" & year == 2004

replace avg_ltHS_2004 = r(mean)

br

keep if _n == 1

keep avg*

gen id = 1

reshape long avg_ltHS_ avg_hs_ avg_somecoll_ avg_bach_ avg_advDeg_, i(id) j(year)

br

I’ve got no good answers for you, but some questions:

  1. Why use mean instead of median for income? Given income’s long right tail, and a tail that has increased over the last thirty years, a mean might not tell you what you want. Of course, this depends on what you want.

  2. The table linked below, while not the same age range as you are interested in, shows a decrease in income from 1991 to 2013 for college degree holders. That is consistent with your tabulation, so maybe your calculation is not too far off?

http://www.census.gov/hhes/www/income/data/historical/people/2013/p18.xls

[Education, btw, is not a good way to deal with poverty or inequality – see http://www.amazon.com/Class-Dismissed… ]

  3. As for the dip in 1991, could this be due to a change in the definition of a household at that time? Getting frequencies for family size and makeup (are grandparents living in the household included?), and plotting those over time, might be a way to investigate.
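A rough sketch of how the mean-versus-median comparison and the family-size check might look in Stata. It assumes the data have already been cut down to one 30-39 year-old per family and that educ_simple exists, as in the do-file in the question. The names mean_inc, med_inc and mean_famsize are made up for illustration, and the sketch uses wtsupp for every year, so it ignores the separate 2004 weight handled in the question.

```stata
* Sketch only: run after the sample restriction and the educ_simple recode.

* Weighted mean and weighted median of family income by year and education.
preserve
collapse (mean) mean_inc = ftotval (p50) med_inc = ftotval ///
    [aw = wtsupp], by(year educ_simple)
list year educ_simple mean_inc med_inc if inrange(year, 1989, 1994), sepby(year)
restore

* Weighted mean family size by year, to look for a break around 1991-1992.
preserve
collapse (mean) mean_famsize = famsize [aw = wtsupp], by(year)
twoway line mean_famsize year, xline(1992)
restore
```

If the median series moves smoothly through the early 1990s while the mean does not, that would point toward the right tail; if family size shifts at the same point, that would point toward a change in family composition.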

Good luck.

I was able to mostly replicate your chart, but I get a steady increase in advanced degree income before it levels out in 2010. Your Stata code also seems reasonable.

The drop you are seeing in 1992 is likely due to the education recode you mentioned. Unfortunately, there is not an obvious way to adjust for this change. This paper made an effort at accounting for the discontinuity and may be of interest to you. Also keep in mind that you are not controlling for any characteristics in your plot, such as the decrease in household size or the change in demographics of each education group (e.g. higher rates of education amongst women). As pointed out in the previous answer, however, your post-1991 results do seem similar to the official Census tables.
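One way to see how much the recode matters is to look at how the education groups are populated on either side of the change. The following is only a sketch, assuming educ_simple has been created as in the question; the 1989-1994 window is an arbitrary choice for illustration.

```stata
* Sketch only: assumes educ_simple exists, as in the question's do-file.

* Weighted share of each education group, by year, around the 1992 change.
tabulate educ_simple year if inrange(year, 1989, 1994) [aw = wtsupp], column nofreq

* Which detailed educ codes feed each group just before and just after the change.
tabulate educ educ_simple if year == 1991
tabulate educ educ_simple if year == 1992
```

A sudden shift in group shares between 1991 and 1992 would suggest that people are being sorted into different groups under the two coding schemes. The second pair of tables shows which detailed codes sit next to each cutoff (for example, just below 111) in each year; the IPUMS codebook lists what each code means.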

Hope this helps.