My first question is: Because most of the codes listed in the crosswalk have either three or four digits, does this mean that I can include all six-digit NAICS codes that share the same three or four digits listed in the crosswalk? (e.g., if it is coded as 442, should I include all 6-digit NAICS codes that start with 442)?
My second question concerns the coding scheme. I noticed that the codes are labeled as follows:
M = Multiple NAICS codes
P = Part of a NAICS code (a NAICS code split between two or more Census codes)
S = Not specified industry within a NAICS sector (specific to Census codes only)
Z = Exception to NAICS code (part of a NAICS industry has its own Census code)
I have been struggling to fully understand these coding schemes and would greatly appreciate your guidance. For example, regarding “M”: if a code is listed as “113M,” does this mean that it includes all NAICS codes that begin with 113?
Additionally, what are the implications when NAICS codes are labeled as P, S, or Z? For instance, if a code is listed as “333MS” in the crosswalk, would it be appropriate to include all six-digit NAICS codes that begin with 333?
Before I answer your questions, I should note that the ACS reports industry using two different coding systems. Census industry codes are provided in the variable IND, while NAICS codes are reported in INDNAICS. While the granularity of the NAICS codes is very useful for categorizing industries, their specificity poses a confidentiality risk in public microdata published by the Census Bureau. As a result, NAICS codes are aggregated before being reported in INDNAICS. The industry codes in IND, meanwhile, were created specifically to allow industry to be reported at an approved level of granularity.
To answer your first question, codes listed with fewer than six digits are meant to include all NAICS codes that begin with those digits. In your example, code 442 (“Furniture and home furnishings stores”) in the 2018 NAICS system includes all codes that begin with the digits “442”. However, as you noted, the exceptions are codes in the crosswalk that include a letter. Code 113M (“Forestry, except logging”) includes 2018 codes 113110 (“Timber Tract Operations”) and 113210 (“Forest Nurseries and Gathering of Forest Products”), but, as its label notes, it does not include 113310 (“Logging”). Without the “M”, we would expect code 113 to include all three of these industries. While the P, S, and Z are notes about how the NAICS code maps onto the Census IND code, they also affect how the code should be interpreted. In 333MS (“Machinery manufacturing, n.e.c. or not specified”), the “M” indicates that this code is composed of multiple, but not all, industries that start with “333”. The “S” additionally indicates that this industry category was created to correspond to a Census IND code and has no direct equivalent in NAICS. So for 333MS it would not be appropriate to include every six-digit NAICS code that begins with 333. This Census Bureau crosswalk between IND and NAICS codes may help you visualize what these labels are communicating.
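To make the distinction concrete, here is a minimal Python sketch of the rule described above: a purely numeric code is expanded by prefix, while a lettered code is looked up in an explicit membership table that you would have to transcribe from the crosswalk yourself. The names `split_code`, `naics_members`, and `LETTERED_MEMBERS` are hypothetical, the table holds only the 113M example from this answer, and `SAMPLE_NAICS` is a small illustrative list rather than the full NAICS.

```python
import re

# Small illustrative list of six-digit NAICS codes (not the full system).
SAMPLE_NAICS = ["113110", "113210", "113310", "442110", "442210"]

# Hypothetical lookup table for lettered codes, transcribed by hand from the
# crosswalk. Only the 113M example discussed above is filled in.
LETTERED_MEMBERS = {
    "113M": {"113110", "113210"},  # "Forestry, except logging"
}


def split_code(code):
    """Split a crosswalk code such as '333MS' into its digits and letters."""
    match = re.fullmatch(r"(\d+)([A-Z]*)", code.strip())
    if match is None:
        raise ValueError(f"unrecognized code: {code!r}")
    return match.group(1), match.group(2)


def naics_members(code, all_naics):
    """Return the six-digit NAICS codes covered by one crosswalk code."""
    digits, letters = split_code(code)
    if not letters:
        # Purely numeric code: every NAICS code sharing the prefix belongs.
        return sorted(n for n in all_naics if n.startswith(digits))
    if code in LETTERED_MEMBERS:
        # Lettered code: use only the members listed in the crosswalk.
        return sorted(LETTERED_MEMBERS[code] & set(all_naics))
    # Refuse to guess: a prefix match would overstate a lettered code.
    raise KeyError(f"no crosswalk membership recorded for {code!r}")


print(naics_members("442", SAMPLE_NAICS))   # ['442110', '442210']
print(naics_members("113M", SAMPLE_NAICS))  # ['113110', '113210']
```

Raising an error for a lettered code that is missing from the table, such as 333MS here, is a deliberate choice in this sketch: falling back to a prefix match would silently pull in industries that the crosswalk assigns elsewhere.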
As you can see, this can get very complicated even before you begin crosswalking codes across different vintages. You might therefore look into using our harmonized industry variable IND1990. This variable maps Census IND codes from all years since 1950 into the 1990 Census Bureau industrial classification scheme, offering a consistent long-term classification of industries. You can read more about how crosswalks were used to construct this variable in our user note on integrated occupation and industry codes.