I am trying to distinguish between high school dropouts and non-dropouts in IPUMS-CPS using EDUC for the years 1979-2000. The categories before and after 1992 don’t match, so there is a large break in the data. The variable description says you can construct a general EDUC code from EDUC by only reading the first two columns…but I don’t know which two columns it’s talking about.
How do I distinguish high-school dropouts from non-high school dropouts across years in IPUMS-CPS?
Your definition of “high school dropout” will dictate exactly how you identify such respondents in the CPS data. For instance, you could define high school dropouts as respondents over a certain age (e.g. 19) with a reported highest grade completed (EDUC) of either Grade 9 (040), Grade 10 (050), or Grade 11 (060). The difficulty arises in how you deal with those who completed Grade 12 without receiving a diploma. Unfortunately, sufficient data was not collected to maintain a consistent definition for this group across the 1992 change in the educational attainment question.
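The example definition above could be sketched in Python as follows. The function name `is_dropout_narrow`, the default age cutoff, and the choice to hold EDUC and AGE as plain integers (so 040 is written 40) are illustrative assumptions, not anything supplied by IPUMS. The sketch deliberately leaves out the 12th-grade group, whose diploma status is the hard part.

```python
# Detailed EDUC codes from the example definition:
# Grade 9 (040), Grade 10 (050), Grade 11 (060), stored as integers.
NARROW_DROPOUT_CODES = {40, 50, 60}


def is_dropout_narrow(educ: int, age: int, min_age: int = 19) -> bool:
    """Return True if the respondent is older than min_age and the
    highest grade completed is Grade 9, 10, or 11.

    This is one possible definition only; it ignores respondents who
    completed Grade 12 without a diploma.
    """
    return age > min_age and educ in NARROW_DROPOUT_CODES
```

Changing `min_age` or the set of codes is how you would adapt this to a different definition of "dropout".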
Specifically, in samples prior to 1992, respondents were not directly asked whether they received a HS diploma or equivalent. The relatively small number of respondents who reported completing less than 1 year of college are identified as having received a HS diploma or equivalent (EDUC=073), but the substantially larger group who completed 12th grade have an unknown diploma status (EDUC=072). Because of this lack of detail, the group captured in recent years by EDUC code “071: 12th grade, no diploma” cannot be identified in pre-1992 samples.
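To make the three 12th-grade codes mentioned above concrete, here is a small lookup, offered as a sketch. The function name and the label strings are my own shorthand for the categories described in this answer, and EDUC is assumed to be stored as an integer.

```python
# Shorthand labels for the 12th-grade codes discussed above.
GRADE12_STATUS = {
    71: "no diploma",             # 071: 12th grade, no diploma
    72: "diploma unclear",        # 072: 12th grade, diploma status unknown
    73: "diploma or equivalent",  # 073: HS diploma or equivalent
}


def grade12_diploma_status(educ: int) -> str:
    """Map a detailed EDUC code to a diploma-status label, or 'other'
    for codes outside the 071-073 group."""
    return GRADE12_STATUS.get(educ, "other")
```

A tabulation of these labels by year is a quick way to see the break: 071 should not show up in pre-1992 samples, while 072 holds most 12th-grade completers there.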
As for constructing a general EDUC variable, the documentation is referring to the first two digits of the three-digit EDUC code. For example, “01” corresponds to 1st-4th grade, “02” to 5th-6th grade, and “03” to 7th-8th grade. Using only the first two digits gives an education variable that is consistent over time, at the cost of detail.
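Taking the first two digits amounts to dropping the last digit of the three-digit code. A minimal sketch, assuming EDUC is stored as an integer (the function names here are hypothetical):

```python
def general_educ(educ: int) -> int:
    """Collapse a three-digit detailed EDUC code to its first two
    digits by dropping the final digit (e.g. 72 -> 7, 111 -> 11)."""
    return educ // 10


def general_educ_label(educ: int) -> str:
    """Same collapse, returned as a zero-padded two-character string
    (e.g. 72 -> '07') to match how the codes are usually printed."""
    return f"{educ:03d}"[:2]
```

Note that 071, 072, and 073 all collapse to 07, which is exactly the detail being given up: the general code is comparable across the 1992 change, but it can no longer separate diploma holders from those who completed 12th grade without one.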
Hope this helps.