Changing Occupation and Industry into Dichotomous Variables

We need to change Occupation and Industry into dichotomous variables, or at the very least reduce each variable to only 4 options. Has anyone done this before, and if so, what did you do? If anyone has any thoughts or research on how we could proceed, please feel free to share as well.

Since a dichotomous variable assigns all observations into one of two categories (e.g., “yes/no”), it may be suitable for analyses that compare two different occupations or industries (e.g., nurses vs. doctors). However, it does not lend itself to analyses of all of the hundreds of different occupations and industries that appear in the data. It is possible to group these into a few large categories based on similarities. It isn’t clear to me how you plan to use a condensed occupation/industry variable, so below I describe some possibilities for collapsing the categories, along with other variables that may be of interest.

As an alternative, you might consider using EMPSTAT to group respondents in a dichotomous variable that reports whether they are or are not employed. You might also consider using the class of worker variable (CLASSWKR). The detailed version of this variable (click to see detailed codes in the codes tab) identifies respondents as part of one of four general groups:

  1. Self-employed (incorporated or unincorporated)
  2. Working for wages/salary at a private company (for-profit or non-profit)
  3. Working at different levels of government (federal, state, and local)
  4. Unpaid family workers
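
As a rough illustration of the EMPSTAT idea above, here is a minimal Python sketch of an employed/not-employed dummy. It assumes the general EMPSTAT coding in which 0 means N/A and 1 means employed, with every other code treated as not employed; please verify these codes in the codes tab for your samples before relying on it. The function name is hypothetical.

```python
from typing import Optional

# Assumed general EMPSTAT codes (verify against the codes tab for your samples).
NOT_APPLICABLE = 0
EMPLOYED = 1


def employed_dummy(empstat: int) -> Optional[int]:
    """Return 1 if employed, 0 if not employed, None if EMPSTAT is N/A."""
    if empstat == NOT_APPLICABLE:
        # Keep N/A cases (e.g., young children) out of both categories.
        return None
    return 1 if empstat == EMPLOYED else 0
```

Returning None for N/A cases keeps them from being silently counted as "not employed"; whether to drop or recode them depends on your universe.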

Since occupation and industry depend on each other significantly (e.g., most people involved in crop production have an occupation related to agriculture), it’s common to focus on one or the other. For samples collected from 2018 onward, the Census Bureau organizes occupation codes into six broad groups:

  1. Management, Business, Science, and Arts (OCC 10-3550)
  2. Service (3601-4655)
  3. Sales and Office (4700-5940)
  4. Natural Resources, Construction, and Maintenance (6005-7640)
  5. Production, Transportation, and Material Moving (7700-9760)
  6. Military (9800-9830)
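
The six ranges above can be turned into a recode with a simple lookup. This is only a sketch: the ranges are copied from the list above and so apply to 2018-onward OCC codes, and the function names are hypothetical. Codes that fall outside every range (such as 0) come back as None.

```python
from typing import Optional

# (low, high, label) ranges copied from the list above; 2018-onward OCC codes only.
OCC_GROUPS = [
    (10, 3550, "Management, Business, Science, and Arts"),
    (3601, 4655, "Service"),
    (4700, 5940, "Sales and Office"),
    (6005, 7640, "Natural Resources, Construction, and Maintenance"),
    (7700, 9760, "Production, Transportation, and Material Moving"),
    (9800, 9830, "Military"),
]


def occ_group(occ: int) -> Optional[str]:
    """Return the broad occupation group for an OCC code, or None if unmatched."""
    for low, high, label in OCC_GROUPS:
        if low <= occ <= high:
            return label
    return None


def occ_group_dummy(occ: int, label: str) -> int:
    """Return 1 if the OCC code falls in the named group, else 0."""
    return 1 if occ_group(occ) == label else 0
```

For a logistic regression you would typically create one dummy per group and omit one group as the reference category.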

With regard to industry (IND), Census industry codes from 2023 onward are based on the 2022 North American Industry Classification System (NAICS) (see INDNAICS and this crosswalk), which is organized into 20 major groups. One way to group these further is into primary (agriculture and natural commodity extraction), secondary (manufacturing), and tertiary (services) sectors, but you may find a grouping that works better for your case.
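
To illustrate the three-sector idea, here is a small Python sketch. It assumes you have already derived a two-digit NAICS sector number for each respondent (that step is not shown and depends on how your industry codes are formatted). The assignment of sectors 11 and 21 to "primary" and 31-33 to "secondary" follows the description above; where to place construction (23) and utilities (22) is a judgment call, and this sketch leaves them in "tertiary". The function name is hypothetical.

```python
# Two-digit NAICS sectors, grouped per the description above (an assumption, not
# an official Census or IPUMS grouping).
PRIMARY = {11, 21}        # agriculture/forestry/fishing; mining and extraction
SECONDARY = {31, 32, 33}  # manufacturing


def industry_sector(naics2: int) -> str:
    """Classify a two-digit NAICS sector as primary, secondary, or tertiary."""
    if naics2 in PRIMARY:
        return "primary"
    if naics2 in SECONDARY:
        return "secondary"
    # Everything else, including construction (23) and utilities (22),
    # is treated as services here; move them if your design calls for it.
    return "tertiary"
```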

Thank you for your response! We want to change the variables into dichotomous variables, or limit them to 4 options, because the research team that I am a part of is currently trying to recreate an Urban Institute study concerning non-traditional childcare hours.

The study that we are recreating requires a binary logistic regression that uses the occupation and industry variables of both the SIPP and ACS. Unfortunately, the study does not explain how the authors collapsed their variables to perform the binary logistic regression. We had thought of categorizing the occupation and industry variables as ‘White Collar’ or ‘Blue Collar’ professions, but we are still playing with a few ideas that your response will certainly help with.

Thanks again!