Proposed higher-level occupational classification systems (improvements to OCC1950, OCC1990, and OCC2010?)

I’m excited to share some work on occupational data that I’ve been focusing on for a long time. With many conversations ongoing about how AI might affect the job market over the coming decade, I wanted to take a detailed look myself at how much occupations typically evolve over each decade.

Of course, even in a series like OCC1950, there are many caveats about the comparability of occupational data over long periods. Drawing on some of the Census technical papers cited here that describe the impact of occupational classification changes over time (another visual example of this here), I constructed three series to support more valid comparisons over time, and I used them to build the charts I’ve shared here:

  • A Max comparability series of 133 occupations based on OCC1950 codes but broader, roughly comparable all the way back to 1870.
  • A 1970 broad series of 236 occupations based on the IPUMS OCC2010 codes, but with more OCC2010s combined, accounting for occupations that were not yet broken out in the Census’ 1980-90 occupational classifications, or that were later combined.
  • A Post-1990 series of 408 occupations roughly based on OCC2010s. It is roughly comparable from 2000 going forward and can bridge the gap between Census occupations and the slightly differently defined BLS SOC occupations.

It’s also important to note that there is a one-to-many relationship between each “Max comparability” occupation and “1970 broad” occupations and likewise between each “1970 broad” occupation and “Post-1990” occupations, allowing for more straightforward comparisons across years.
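To make the nesting concrete, here is a minimal sketch of how one might check that a crosswalk really is nested in this way and then roll counts up from a finer series to a broader one. The occupation names, the row layout, and the function names are all hypothetical placeholders for illustration; they are not taken from the actual crosswalk.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (post_1990, broad_1970, max_comparability).
# These names are illustrative only, not actual categories from the series.
CROSSWALK = [
    ("Software developers", "Computer specialists", "Professional workers"),
    ("Database administrators", "Computer specialists", "Professional workers"),
    ("Registered nurses", "Nurses", "Professional workers"),
    ("Carpenters", "Carpenters", "Construction craft workers"),
]


def parents_by_child(rows, child_col, parent_col):
    """Collect the set of parent occupations seen for each child occupation."""
    parents = defaultdict(set)
    for row in rows:
        parents[row[child_col]].add(row[parent_col])
    return dict(parents)


def is_nested(rows, child_col, parent_col):
    """True if every child occupation belongs to exactly one parent."""
    return all(
        len(p) == 1 for p in parents_by_child(rows, child_col, parent_col).values()
    )


def roll_up(counts, rows, child_col, parent_col):
    """Sum counts keyed by child occupation into their parent occupations."""
    lookup = {row[child_col]: row[parent_col] for row in rows}
    totals = defaultdict(int)
    for occupation, n in counts.items():
        totals[lookup[occupation]] += n
    return dict(totals)


# Column 0 nests inside column 1, and column 1 nests inside column 2.
print(is_nested(CROSSWALK, 0, 1), is_nested(CROSSWALK, 1, 2))
```

Because each finer occupation has exactly one parent, counts for a broader series can be obtained by simple summation, with no need to apportion workers across categories.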

I’ll also share a public version of my crosswalk table (in Google Sheets) here. It has a detailed README in the first tab, as the crosswalk consists of multiple components:

  • Each BLS SOC from 2000 forward is associated with a Census OCC and IPUMS OCC2010 in “OCC2010-OCC-SOC xwalk”
  • Each Census OCC from 1960-1970 is associated with an IPUMS OCC1950, and each Census OCC from 1970 forward is associated with an IPUMS OCC2010 in “OCC codes (original)”
  • I make some proposed modifications to the associations originally designated in IPUMS in “OCC codes (modified)”
  • Each OCC1950 and OCC2010 is associated with the three broader occupational classification systems that I have proposed in “Series xwalk + pre-1970 raw emp”. Here, all OCC1950s and OCC2010s are associated with a “Max comparability” occupation, and each OCC2010 is associated with a “1970 broad” and a “Post-1990” occupation.
  • Each of the three proposed series is documented in more detail in a series of “Notes” tabs. The “Code Lookup Tool” tab allows you to see which individual Census OCCs and BLS SOCs fall under each category over time side-by-side.
  • Each occupation in each series is associated with a higher-level category that is similar to BLS’ major categories, minor categories, and broad occupations. However, there are some important differences to be aware of, which I highlight in this footnote of my latest piece. The “Post-1990 notes” tab explains specifically how Post-1990 occupations’ categorizations differ from BLS’, and from previous categorizations under the “1970 broad” and “Max comparability” series.
  • An estimate of net and gross transfers is made for each occupation for each key year (1960, 1970, and 1990) based on technical papers from the Census cited above. For SOC and OCC definitions that changed after 2000, another approach is used (detailed in the README) to control for them.
  • I made a comparison between my “1970 broad” series and OCC1990 that shows noncomparabilities reduced by 40-80% in “OCC1990 error analysis”
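As a rough illustration of the net and gross transfers mentioned above, the sketch below assumes one common reading of those terms: for a given classification change, gross transfers are workers moved into an occupation plus workers moved out, and net transfers are workers moved in minus workers moved out. The flow figures, the occupation labels, and the function name are hypothetical, and the definitions used in the crosswalk's README may differ.

```python
from collections import defaultdict


def transfers(flows, group_of=None):
    """Net and gross transfers per occupation for one classification change.

    flows maps (old_occupation, new_occupation) to the number of workers
    reclassified. If group_of is given, occupations are first mapped to
    broader groups, and flows that stay inside one group are ignored.
    """
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for (src, dst), workers in flows.items():
        if group_of is not None:
            src, dst = group_of[src], group_of[dst]
        if src == dst:
            continue  # movement within one category is not a transfer
        outflow[src] += workers
        inflow[dst] += workers
    occupations = set(inflow) | set(outflow)
    return {
        occ: {"net": inflow[occ] - outflow[occ], "gross": inflow[occ] + outflow[occ]}
        for occ in occupations
    }


# Hypothetical flows between three detailed occupations A, B, and C.
FLOWS = {("A", "B"): 100, ("B", "A"): 40, ("B", "C"): 10}
# Hypothetical broader grouping: A and B are combined into X.
GROUPS = {"A": "X", "B": "X", "C": "Y"}

detailed = transfers(FLOWS)
broad = transfers(FLOWS, GROUPS)
print(detailed["A"], broad["X"])
```

In this toy case the flows between A and B vanish once both are combined into X, which is the intuition behind building broader series: reclassification that happens inside a combined category no longer shows up as a noncomparability.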

This is quite a bit of content, of course. I am not affiliated with any academic institution, but I would love to get some feedback on this approach, and I am open to collaborating on a more formal working paper with other researchers who may be interested in this topic. If any IPUMS staff would be interested in helping validate this approach, perhaps these series could one day be included as variables for others to use?