Merging supplemental migration data with IPUMS International census data

Dear IPUMS users,

I am planning to use census data from Malawi to investigate migration patterns. I also downloaded the 2008 Malawi emigration supplemental file.


According to the description, the ‘sample’ and ‘serial’ variables should identify an observation and should allow the two datasets to be merged.

However, I noticed that the main census dataset has a ‘person’ variable which reports the number of persons in a given household.
Since it is not possible to merge the data using the hierarchical structure of IPUMS data when both household- and person-level variables are selected (the merge reports that the identifier is unique neither in the census data nor in the supplemental dataset), I thought I could append the external migration dataset instead.

However, I already have records in the dataset for each member of the household as reported by the ‘person’ variable. The appending process would add additional household members.

What I would like to know is: should these additional persons be added to the total household size? Or do the persons in the household already include those reported in the supplemental migration dataset, so that appending would duplicate them? In the latter case, is there any way to match the people who migrated, included in the supplemental data file, with those in the main census data?



A few details may be helpful here.

First, note that the IPUMS extract system allows for files to be downloaded in both hierarchical and rectangular data structures. The rectangular data structure, where each observation is a person and household-level information is constant for all persons within a given household, is generally recommended. Using the data in this structure might help solve some of the problems listed here.

Second, when using the data in the default rectangular structure, SERIAL and PERNUM will uniquely identify person-level observations in the IPUMS sample. The values for SERIAL and PERNUM, however, are unique to the IPUMS database. Therefore, if you want to link to external data sources, you’ll need to use the source variables MW2008A_DWNUM and MW2008A_PERNUM. These source variables can be found by selecting the “Source Variables” radio button on the top of the Select Data Page.
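As a rough illustration of the point above, here is a minimal pandas sketch using made-up toy data. It checks which columns uniquely identify rows in a rectangular extract and then links emigrant records to households through the source dwelling number. SERIAL, PERNUM, PERSONS, and MW2008A_DWNUM are the variable names mentioned in this thread; the supplemental file's column names (DWNUM, EMIG_ID) are hypothetical placeholders, and the sketch assumes one household per dwelling, which may not hold in the real data.

```python
import pandas as pd

# Toy stand-in for a rectangular IPUMS extract (one row per person).
census = pd.DataFrame({
    "SERIAL": [1000, 1000, 2000],
    "PERNUM": [1, 2, 1],
    "MW2008A_DWNUM": [501, 501, 502],
    "PERSONS": [2, 2, 1],
})

# Toy stand-in for the supplemental emigration file.
# "DWNUM" and "EMIG_ID" are hypothetical names; check the file's codebook.
emigrants = pd.DataFrame({
    "DWNUM": [501, 501, 503],
    "EMIG_ID": [1, 2, 1],
})

def is_unique_key(df, cols):
    """Return True if the given columns uniquely identify every row."""
    return not df.duplicated(subset=cols).any()

# SERIAL alone repeats across household members; SERIAL + PERNUM does not.
household_only_ok = is_unique_key(census, ["SERIAL"])
person_key_ok = is_unique_key(census, ["SERIAL", "PERNUM"])

# Collapse the census to one row per household before linking, so each
# emigrant record matches at most one household row.
households = census.drop_duplicates(subset=["SERIAL"])[
    ["SERIAL", "MW2008A_DWNUM", "PERSONS"]
]

linked = emigrants.merge(
    households,
    left_on="DWNUM",
    right_on="MW2008A_DWNUM",
    how="left",
    validate="many_to_one",  # raises if the household key is not unique
    indicator=True,          # adds a _merge column showing match status
)
```

The `_merge` column makes unmatched emigrant records easy to spot, which matters because a sample extract will not contain every enumerated household.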

Third, to address your specific question: The values recorded in the PERSONS variable indicate the number of persons living in the household. Based on the information in the 2008 Malawi sample characteristics, I suspect that those persons recorded in the supplemental migration data file are not included in the sample available via IPUMS International. This is due to the definitions of dwellings, households, and collective dwellings used by the 2008 Malawi Census. Namely, a household is a collection of one or more persons “who live together and make common provision for food.” Additionally, the 2008 sample available on IPUMS International has information on migration, but this is about previous migration rather than current migration or emigration. With that said, I encourage you to check the documentation available for the supplemental migration file to verify this detail.
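If the documentation confirms that emigrants are not counted in PERSONS, a household size that includes them could be derived roughly as follows. This is only a sketch on toy data under that assumption; DWNUM, N_EMIGRANTS, and PERSONS_WITH_EMIGRANTS are hypothetical names introduced for illustration.

```python
import pandas as pd

# Toy household-level table (one row per household).
households = pd.DataFrame({
    "MW2008A_DWNUM": [501, 502],
    "PERSONS": [2, 1],
})

# Toy emigrant records, one row per reported emigrant.
# "DWNUM" is a hypothetical column name for the dwelling identifier.
emigrants = pd.DataFrame({"DWNUM": [501, 501]})

# Count emigrants reported by each dwelling.
emig_counts = emigrants.groupby("DWNUM").size().reset_index(name="N_EMIGRANTS")

# Attach the counts; households with no emigrants get zero.
sizes = households.merge(
    emig_counts, left_on="MW2008A_DWNUM", right_on="DWNUM", how="left"
).drop(columns="DWNUM")
sizes["N_EMIGRANTS"] = sizes["N_EMIGRANTS"].fillna(0).astype(int)

# Assumes emigrants are NOT already included in PERSONS.
sizes["PERSONS_WITH_EMIGRANTS"] = sizes["PERSONS"] + sizes["N_EMIGRANTS"]
```

Keeping the emigrant count in its own column, rather than overwriting PERSONS, leaves the original census value available if the assumption turns out to be wrong.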

Dear Jeff,

Thank you very much for your kind reply. Yes, I actually downloaded the data in rectangular form, so I used the term “hierarchical” incorrectly. What I meant is that each record carries two levels of information: household-level and person-level.

On the third point, considering the documentation, I think it makes sense to treat emigrants as additional household members who are not counted in the household size variable. I have already read the documentation, but I was probably distracted by the additional questionnaire form for international migrants, and I could have missed something! I will double-check the documentation to remove any doubt.