Issue with Matching Birth Province and Province of Enumeration to create interprovincial migrant flag


I’m currently working on replicating Brian Thiede, Heather Randell, and Clark Gray’s article titled “The Childhood Origins of Climate-Induced Mobility and Immobility.”

In this paper, they create a migrant flag by comparing the FIPS code for the geographic unit of enumeration (6 digits for GEOLEVEL1 and 9 digits for GEOLEVEL2) to the birthplace country + province. A match between these codes results in a flag value of 1 and a mismatch results in a value of 0. From my microdata extract, I have BPL'x' variables for each country, whose values range from 2 to 5 digits and uniquely identify provinces within countries, plus birth country codes, which are 5 digits and do not match the 3-digit codes for country of enumeration.
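As a rough sketch of the comparison described above (not the paper's actual code), the flag can be computed once both places are expressed as code strings of the same length. The function name `same_place_flag` is hypothetical, and it follows the coding stated here (match = 1, mismatch = 0); an interprovincial migrant indicator in the usual sense would be its complement.

```python
from typing import Optional


def same_place_flag(birth_geo: Optional[str], enum_geo: Optional[str]) -> Optional[int]:
    """Compare a birthplace code to an enumeration code of the same level.

    Returns 1 on a match, 0 on a mismatch, and None when either code is
    missing, so unknown birthplaces are not silently counted as mismatches.
    """
    if not birth_geo or not enum_geo:
        return None
    if len(birth_geo) != len(enum_geo):
        # A 6-digit code compared to a 9-digit code would always mismatch,
        # so treat mixed levels as an error rather than as a result.
        raise ValueError("codes must be at the same geographic level")
    return 1 if birth_geo == enum_geo else 0


if __name__ == "__main__":
    print(same_place_flag("068002", "068002"))  # same first-level unit
    print(same_place_flag("068002", "068003"))  # different first-level unit
```

Keeping the codes as zero-padded strings rather than integers preserves leading zeros such as the 0 in 068.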

To reconcile the difference in country codes between country of enumeration and birth country, I created a crosswalk to match the 5-digit codes to the 3-digit codes.

The problem I’m having lies in getting the provinces to match up prior to creating my migrant flag.

My original strategy (which failed haha): GEOLEVEL1 and GEOLEVEL2 are 6- and 9-digit IDs for where a person was enumerated. The first 3 digits of both geolevels are the country, and the final 3/6 digits ID the province. I ended up scraping data to create a crosswalk that relates the 2-5 digit BPL province codes to the 3/6 digit province FIPS codes from these two sources:


0    Bolivia                     Murillo        068   002001      201    002
1    Bolivia                      Ingavi        068   002002      208    002
2    Bolivia Aroma, Gualberto Villarroel        068   002003      213    002
3    Bolivia                  Sud Yungas        068   002004      211    002
4    Bolivia                    Larecaja        068   002005      206    002
5    Bolivia                    Omasuyos        068   002006      202    002
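To make the code structure concrete, here is a minimal sketch of how a 6- or 9-digit identifier could be assembled from its parts, using the first row listed above (country part 068, 3-digit part 002, 6-digit part 002001). The helper name `build_geo_code` is hypothetical, and the reading of the columns is my assumption from the listing.

```python
from typing import Union

# Width of the sub-national part at each level: 3 digits for level 1,
# 6 digits for level 2, as described in the post.
UNIT_WIDTH = {1: 3, 2: 6}


def build_geo_code(country: Union[int, str], unit: Union[int, str], level: int) -> str:
    """Concatenate a 3-digit country part and a zero-padded unit part."""
    if level not in UNIT_WIDTH:
        raise ValueError("level must be 1 or 2")
    country_part = str(country).zfill(3)
    unit_part = str(unit).zfill(UNIT_WIDTH[level])
    if len(country_part) != 3 or len(unit_part) != UNIT_WIDTH[level]:
        raise ValueError("country or unit part has too many digits")
    return country_part + unit_part


if __name__ == "__main__":
    print(build_geo_code(68, 2, level=1))     # 068002
    print(build_geo_code(68, 2001, level=2))  # 068002001
```

Zero-padding matters here because numeric storage drops leading zeros, so 68 and 2001 have to be restored to 068 and 002001 before concatenating.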

The IPUM variable from this scraping process then matches the last 3/6 digits of geolevel1/2, yay! I used the variable BPL'X' for each country and the PROV variable in this new crosswalk to merge on these 3/6 digit IPUM values to create birthplace geography identifiers to match to geolevel1 and geolevel2. However, as you know, these data are year-specific, vary between census years for some countries, and do not match the harmonized values used for the geolevel1 and geolevel2 codes. I couldn’t find zip files that relate BPL"x" values to the standardized 3/6 digit province codes used for geolevel1 and geolevel2.
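Because the codes are year-specific, one way to sketch the merge is to key the crosswalk on country, census year, and BPL code, and to list the combinations that have no entry instead of dropping them. This is only an illustration: the two code pairs come from the listing above, while the census year 2001, the record layout, and the function names are assumptions of mine.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int, int]  # (country part, census year, BPL code)

# Two rows from the listing above; the year 2001 is a placeholder.
CROSSWALK: Dict[Key, str] = {
    ("068", 2001, 201): "002001",
    ("068", 2001, 208): "002002",
}


def birth_geolevel2(country: str, year: int, bpl_code: int,
                    xwalk: Dict[Key, str]) -> Optional[str]:
    """Return a 9-digit birthplace code, or None if the crosswalk has no entry."""
    unit = xwalk.get((country, year, bpl_code))
    return None if unit is None else country + unit


def unmatched_keys(records: List[dict], xwalk: Dict[Key, str]) -> List[Key]:
    """List the (country, year, BPL) combinations missing from the crosswalk."""
    seen = {(r["country"], r["year"], r["bpl"]) for r in records}
    return sorted(seen - set(xwalk))


if __name__ == "__main__":
    people = [
        {"country": "068", "year": 2001, "bpl": 201},
        {"country": "068", "year": 1992, "bpl": 201},  # year not in crosswalk
    ]
    for p in people:
        print(birth_geolevel2(p["country"], p["year"], p["bpl"], CROSSWALK))
    print(unmatched_keys(people, CROSSWALK))
```

Listing the unmatched combinations shows which census years and codes the scraped crosswalk fails to cover.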

Do you know of a different strategy to create 6/9 digit geography identifiers for birth place from the BPL"X" and BPLCNTRY values to match the GEOLEVEL1 and GEOLEVEL2 codes to create a migrant flag? Thanks SO much!

It sounds like you are trying to create a lifetime migration variable that compares birthplace (BPL*) to current residence (GEO*). As you noted, the BPL coding system does not align with the GEO* variables. Addressing this is on our radar; however, I don’t have an expected release date for this work. The global IPUMS geography team is first focused on harmonizing migration variables and aligning those with current residence codes. Once this work is completed, they will shift their focus to the birthplace variables. Birthplace is more complicated than current residence or recent migration variables, as it may also include locations that no longer exist at the time of enumeration; accordingly, the team has prioritized migration variables. If you would like to learn more, this working paper discusses harmonized census geographies and spatio-temporal analysis with a case study.