I noticed that in some censuses, e.g. Senegal 1998 and 2002, ethnicities that have the same name are encoded with different numerical values. For example, in the censuses of Senegal “Wolof” is assigned both code 113 and code 130, where 113 appears only in 2002 but 130 appears in both years. Is this an error in the coding, or are they supposed to refer to different ethnic groups? It seems there are more countries where this happens.
This appears to be a labeling issue, however it is difficult to . The ETHNICSN value 113 should be labeled “Khassonke”. I have brought this to the attention of the IPUMS-International team and they are working on correcting the issue now. In the meantime, since you have pointed out a significant issue with the IPUMS data, you get a mug! Please email ipums@umn.edu with a mailing address where we can send your coveted prize.
Thanks for the answer (and the mug). By any chance, do you have a link where the correct codes are shown? When I look at the IPUMS-International website, it does not show the code assignment you mention. Since this seems to be an important issue, I will try to go back and see in which other censuses I had similar issues, so they can be corrected.
IPUMS-International are currently working on getting the corrected labels online. The identification of “Khassonke” as the correct label for 113 was actually intuited from the data (using LANGSN), as there was no indication of the proper coding in the source documentation for the sample.
I did not have time to get into the other datasets, but I can give you the countries which I was working with, where I think similar issues are happening (a fast way to figure it out is to try to import using Pandas in Python, which will complain about repeated labels for categorical variables). Here’s the list: Senegal, Ghana, Uganda and Malawi.
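In case it helps anyone reproduce the check, here is a minimal sketch of how duplicated labels could be found directly instead of waiting for the import to fail. It assumes the extract is a Stata file; the function names and the file path argument are hypothetical, and the example mapping only mirrors the Wolof case from this thread (the third entry is made up for illustration).

```python
from collections import defaultdict


def find_duplicate_labels(value_labels):
    """Return {label: [codes]} for every label attached to more than one code."""
    codes_by_label = defaultdict(list)
    for code, label in value_labels.items():
        # Normalize whitespace and case so "Wolof " and "wolof" count as one label.
        codes_by_label[label.strip().lower()].append(code)
    return {
        label: sorted(codes)
        for label, codes in codes_by_label.items()
        if len(codes) > 1
    }


def duplicate_labels_in_stata_file(path):
    """Check every value-label set in a Stata extract (path is hypothetical)."""
    import pandas as pd  # imported lazily; only this helper needs pandas

    # iterator=True returns a StataReader; value_labels() gives
    # {label_set_name: {code: label}} without converting to categoricals.
    with pd.read_stata(path, iterator=True) as reader:
        all_labels = reader.value_labels()

    report = {}
    for label_set, labels in all_labels.items():
        dupes = find_duplicate_labels(labels)
        if dupes:
            report[label_set] = dupes
    return report


# Illustrative mapping: codes 113 and 130 both labeled "Wolof", as in the thread.
# The code 140 entry is invented for the example.
example = {113: "Wolof", 130: "Wolof", 140: "Serer"}
print(find_duplicate_labels(example))  # {'wolof': [113, 130]}
```

Running the second helper on each country's extract should list the affected label sets and codes in one pass, which may be easier to report than the import error alone.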