Industry and Occupation code harmonization


I searched the forum for related questions (apologies if I missed an important point), but I'd like to know more about how the variable INDGEN was created from the unharmonized variable IND, and likewise OCCISCO from OCC. I understand that IND is recorded according to the original national industry classification, but I have trouble finding which classification standard a particular country follows (e.g., ISIC, NAICS), except for a few, such as Canada 1980 SIC and Nicaragua ISIC Rev. 3.1. Since there are numerous conversion tables between standard industry classifications, I could potentially use them for harmonization. However, if the original industry classification is unknown, I wonder how IPUMS-I managed to convert IND into INDGEN based on ISIC Rev. 4.
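Where the national classification is known and published conversion tables exist, the tables can be chained (national code to an intermediate standard, then to the target standard). A minimal sketch, assuming each table is available as a list of (source code, target code) pairs; the function names and all codes below are hypothetical placeholders, not real SIC or ISIC codes:

```python
from collections import defaultdict

def compose_crosswalks(first, second):
    """Chain two crosswalks given as (source, target) pairs.

    Returns {source_code: sorted list of final target codes}. An empty list
    means the chain breaks (no onward mapping); more than one target means
    the source code is split across categories and needs a manual decision.
    """
    onward = defaultdict(set)
    for mid, target in second:
        onward[mid].add(target)
    composed = defaultdict(set)
    for source, mid in first:
        composed[source] |= onward.get(mid, set())
    return {source: sorted(targets) for source, targets in composed.items()}

def needs_review(composed):
    """Source codes that do not map to exactly one final target."""
    return sorted(code for code, targets in composed.items() if len(targets) != 1)

# Hypothetical example: national -> intermediate -> target
national_to_mid = [("n01", "m10"), ("n02", "m10"), ("n02", "m20"), ("n03", "m99")]
mid_to_target = [("m10", "t100"), ("m20", "t200"), ("m20", "t210")]
composed = compose_crosswalks(national_to_mid, mid_to_target)
# composed == {"n01": ["t100"], "n02": ["t100", "t200", "t210"], "n03": []}
# needs_review(composed) == ["n02", "n03"]
```

Chaining tends to multiply one-to-many splits, so the codes flagged by `needs_review` are where manual judgment is still required.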

I am asking because I need to construct, from IND, a harmonized industry variable that is finer than INDGEN, say at the 3-digit level. I would greatly appreciate any advice on how to harmonize IND efficiently at a finer level. Advice on harmonizing the occupation codes would be equally appreciated.

Thanks very much,

The codes used in the variables IND and OCC differ by country and year. From the IND and OCC codes pages, you can link to the source variables for each country and year, which also list the coding system used in that particular sample. Even where the system is not stated explicitly, you may be able to infer it from the codes listed there. Many national censuses did not use an international standard coding system such as ISIC. The IPUMS-International team did most of the recoding into INDGEN and OCCISCO manually, using the criteria listed on the comparability tabs for INDGEN and OCCISCO. A few countries used the ISCO system for coding occupations, and these samples can be identified using the variables ISCO68A and ISCO88A. Beyond the comparability tabs for IND and OCC and the description pages for the source variables, the enumeration materials may also contain information on the occupation or industry coding system used.
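Once a manual mapping exists, applying it is mechanical, and it helps to track which source codes are still unmapped. A minimal sketch, assuming the data are records with hypothetical fields `country`, `year` and `ind`, and a hand-built crosswalk keyed on (country, year, IND code). For samples coded in ISCO-88, a 3-digit occupation code can be taken as the ISCO-88 minor group, assuming ISCO88A holds full 4-digit unit-group codes (check the codes page for special values first). All names and codes here are illustrative, not IPUMS values:

```python
from collections import Counter

# Hypothetical hand-built crosswalk: (country, year, IND code) -> finer harmonized code.
# Keys and target codes are illustrative placeholders, not real IPUMS values.
CROSSWALK = {
    ("XX", 2000, 111): "011",
    ("XX", 2000, 112): "011",
    ("XX", 2000, 210): "051",
}

def recode_ind(records, crosswalk):
    """Attach a harmonized code ("ind3") to each record and count unmapped source codes."""
    unmapped = Counter()
    recoded = []
    for rec in records:
        key = (rec["country"], rec["year"], rec["ind"])
        code = crosswalk.get(key)  # None if this source code has not been mapped yet
        if code is None:
            unmapped[key] += 1
        recoded.append({**rec, "ind3": code})
    return recoded, unmapped

def isco88_minor_group(code):
    """Return the 3-digit ISCO-88 minor group for a 4-digit unit-group code.

    Zero-pads so that armed-forces codes (major group 0) keep their leading zero.
    Assumes special values (unknown, not in universe) were filtered out first.
    """
    digits = str(code).zfill(4)
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError(f"not a 4-digit ISCO-88 code: {code!r}")
    return digits[:3]

# isco88_minor_group(2111) -> "211"; isco88_minor_group(110) -> "011"
```

Sorting `unmapped` by count shows which remaining source codes cover the most people, which is a reasonable order for the manual work.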

Hi Matthew, many thanks for your reply!

I may be missing something, but I noticed that very few source variables (industry/occupation) list the coding system used. The national coding system is usually not part of the variable description. If I am very lucky, I may find it in the census documentation online, but unfortunately that does not happen very often. Hence, the only option seems to be harmonizing manually based on the literal meaning of the labels in the unharmonized variable. Let me know if I am wrong or if you have better ideas!

Yes, as far as I can tell, that's going to be your only option in most cases; the IPUMS-I team also had to do a lot of manual matching to create INDGEN and OCCISCO. You can use software to propose potential matches based on similar labels, which might speed up the process somewhat. For example, Stata has the commands -reclink- and -matchit-. It might also help to use the broad code matches available in INDGEN and OCCISCO as a starting point.
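The same idea can be sketched outside Stata: score label similarity, but only compare labels that fall in the same broad category (for example, the INDGEN code already assigned to each source code), so the broad matches narrow the search. This is a minimal sketch using Python's standard `difflib.SequenceMatcher` ratio as a stand-in similarity measure; it is not the scoring used by -reclink- or -matchit-, and the function names, codes, labels and broad groups below are hypothetical:

```python
from difflib import SequenceMatcher

def normalize(label: str) -> str:
    """Lower-case and collapse punctuation/whitespace so trivial differences don't count."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in label.lower())
    return " ".join(cleaned.split())

def candidate_matches(sources, targets, top_n=3, min_score=0.5):
    """For each (code, label, broad) in sources, rank targets with the same broad code
    by label similarity. Returns {source_code: [(score, target_code), ...]}."""
    by_broad = {}
    for code, label, broad in targets:
        by_broad.setdefault(broad, []).append((code, normalize(label)))
    result = {}
    for code, label, broad in sources:
        norm = normalize(label)
        scored = [
            (round(SequenceMatcher(None, norm, tlabel).ratio(), 3), tcode)
            for tcode, tlabel in by_broad.get(broad, [])  # blocking on the broad code
        ]
        scored = [s for s in scored if s[0] >= min_score]
        scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by code
        result[code] = scored[:top_n]
    return result

# Hypothetical inputs: (code, label, broad group)
sources = [("101", "Growing of cereals", "AGR"), ("205", "Manufacture of bread", "MFG")]
targets = [
    ("t011", "Growing of cereals and other crops", "AGR"),
    ("t012", "Raising of cattle", "AGR"),
    ("t107", "Manufacture of bakery products", "MFG"),
]
matches = candidate_matches(sources, targets)
```

The output is only a ranked shortlist per source code; each candidate still needs to be accepted or rejected by hand.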