Shoemaker occupation coding (occ50us) in 1850 U.S. census

I’m looking for shoemakers in the 1850 U.S. census and I’m confused about how their occupations are coded. Some of them are coded as being “Shoemakers and repairers” (occ50us==582) while others are listed as “Operatives and Kindred Workers (not elsewhere classified)” (occ50us==690) despite having “Shoemaker” as their listed occupation string (occstrng==“Shoemaker”). Is there a reason that some people who have “Shoemaker” as their occupation string are not coded with the shoemaker occupation (582 for occ50us)?

The coding of OCC50US relies on more than the information in OCCSTRNG. In particular, the IND1950 variable helps refine the transcribed occupation into more meaningful categories. So, among those with OCCSTRNG==“Shoemaker”, only those in the “Footwear, except rubber” industry (IND1950==488) are coded as “Shoemakers and repairers” (OCC50US==582). Those with OCCSTRNG==“Shoemaker” who are coded as “Operatives and Kindred Workers (not elsewhere classified)” (OCC50US==690) come from a variety of other industries.
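If it helps, one way to check this pattern in your own extract is to tabulate OCC50US against IND1950 for everyone whose OCCSTRNG is “Shoemaker”. The following is only a minimal Python sketch. The function name `tabulate_codes` and the sample rows are hypothetical, and it assumes your extract has been read into dictionaries keyed by variable name (for example with `csv.DictReader`).

```python
from collections import Counter


def tabulate_codes(rows, occ_string="Shoemaker"):
    """Count (OCC50US, IND1950) pairs among rows whose OCCSTRNG matches."""
    counts = Counter()
    for row in rows:
        # Compare case-insensitively and ignore surrounding whitespace.
        if row["OCCSTRNG"].strip().lower() == occ_string.lower():
            counts[(int(row["OCC50US"]), int(row["IND1950"]))] += 1
    return counts


# Made-up rows for illustration only; these are NOT real census records.
# Apart from 582, 690, and 488 (quoted in this thread), every code below
# is a placeholder and is not meant to match a real category.
sample_rows = [
    {"OCCSTRNG": "Shoemaker", "OCC50US": "582", "IND1950": "488"},
    {"OCCSTRNG": "Shoemaker", "OCC50US": "582", "IND1950": "488"},
    {"OCCSTRNG": "Shoemaker", "OCC50US": "690", "IND1950": "999"},
    {"OCCSTRNG": "Farmer", "OCC50US": "100", "IND1950": "105"},
]

if __name__ == "__main__":
    for (occ, ind), n in sorted(tabulate_codes(sample_rows).items()):
        print(f"OCC50US={occ} IND1950={ind} count={n}")
```

Run against a real extract, a table like this should show which industry codes accompany each occupation code for the same occupation string.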

I hope this is helpful. Let us know if you have any additional questions.

Thank you for your response, but I’m still not understanding what’s happening with the coding of OCC50US. The questionnaire text of the IND1950 variable says “Profession, occupation, or trade of each male person over 15 years of age” and the description of the variable says “In the 1850-1910 United States and 1891 Canada samples, the universe for IND1950 relied on persons having an occupation recorded in OCC50US.” The questionnaire text for OCC50US says “Profession, occupation, or trade of each person over 15 years of age.” OCCSTRNG says “No questionnaire text is available for this sample.” In looking at the original census forms for 1850, it looks like the only question that relates to occupation was “Profession, occupation, or trade of each male person over 15 years of age.” It looks to me like the original text from that question is coded in the variable OCCSTRNG, but maybe I’m missing something. Did the IND1950 and OCC50US variables use some additional information to create industry and occupation categories, besides what is written for that question?


These historical census files were created a little differently than modern-day census files. Rather than drawing on public data released by the US Census Bureau, these files were created by transcribing data from images of the original census enumeration forms. As a result, it is difficult to trace exactly how variables such as IND1950 and OCC50US were coded.

Here is what I can say: You are correct that the only question relating to occupation in the 1850 census was “Profession, occupation, or trade of each male person over 15 years of age.” Quite a bit of variation exists in the responses to this question. (Note that OCCSTRNG truncates responses that are excessively long.) Based on this information, industry and occupation codes were assigned and recorded in IND1950 and OCC50US. Together these variables preserve significant detail from the original responses. You may find the OCCHISCO variable to be better suited for your purposes. Additionally, the information provided in this paper about the coding of occupations in the NAPP database may be helpful.