What are "unharmonized variables" in IPUMS-International?

All IPUMS variables are processed to some degree: they are documented in English, linked to the relevant sections of the original census instructions, and analyzed and often recoded for technical and other reasons. But not all IPUMS variables are “integrated” (harmonized) for comparability across countries and over time.

The regular IPUMS variables – the ones on the main variable availability screen – are integrated: the same codes and labels apply across all the samples that contain the variable. Unharmonized variables, in contrast, are unique to each sample. They generally correspond to the variables in the original datasets that the various countries submitted to the IPUMS project. The codes and labels of unharmonized variables are not consistent across samples, but the variables have been processed to make them more regular: stray values are recoded; all data are converted to numeric values; data universes are empirically determined; unknown and not-in-universe (NIU) categories are coded consistently; and other edits may be made to address confidentiality concerns. In addition, each unharmonized variable is assigned a unique name in the IPUMS database, and the value labels and other variable documentation are written in English.

Many unharmonized variables serve as inputs for the integrated IPUMS variables. For example, underlying the integrated variable for marital status, MARST, are numerous unharmonized variables, typically one per sample – CL70A_MARST for Chile 1970, UG91A_MARST for Uganda 1991, and so forth. Each integrated IPUMS variable description has a link to the unharmonized variables that served as the inputs for it. The unharmonized variables are also accessible in a comprehensive list using the menu buttons on the variables page. The variable description for each unharmonized variable lists the integrated variables for which it provides the source data.
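
The example names above suggest a naming pattern that can be used to pick unharmonized variables out of a list of column names. The following is a minimal sketch, assuming the pattern is a two-letter country code, a two-digit year, a one-letter sample identifier, an underscore, and a mnemonic. That pattern is inferred only from the names quoted in this answer and may not cover every unharmonized variable; the function and class names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# ASSUMPTION: pattern inferred from the names quoted in this FAQ
# (CL70A_MARST, UG91A_MARST, KH98A_EMPSTAT); it is not an official rule.
_UNHARMONIZED = re.compile(r"^([A-Z]{2})(\d{2})([A-Z])_([A-Z0-9_]+)$")


class UnharmonizedName(NamedTuple):
    country: str   # two-letter country code, e.g. "CL"
    year2: str     # two-digit census year, e.g. "70"
    sample: str    # sample letter, e.g. "A"
    mnemonic: str  # remainder of the name, e.g. "MARST"


def parse_unharmonized(name: str) -> Optional[UnharmonizedName]:
    """Split a name into its parts, or return None if it does not match."""
    match = _UNHARMONIZED.match(name)
    return UnharmonizedName(*match.groups()) if match else None


columns = ["MARST", "CL70A_MARST", "UG91A_MARST", "EMPSTAT"]
# Keep only the columns that look like unharmonized variables.
unharmonized = [c for c in columns if parse_unharmonized(c) is not None]
```

Here `unharmonized` keeps the two sample-specific names and drops the integrated names MARST and EMPSTAT, which have no sample prefix.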

The unharmonized variables can be included in data extracts. Thus researchers can get both the integrated and unharmonized forms of specific variables (for example, the internationally comparable employment status variable, EMPSTAT, and the employment status variable specific to 1998 Cambodia, KH98A_EMPSTAT). Perhaps more importantly, the unharmonized variables give researchers access to data that IPUMS has not been able to incorporate in an internationally comparable manner.
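
When an extract contains both forms of a variable, one way to see how the sample-specific codes were mapped is to tabulate one against the other. The sketch below assumes the extract has already been read into a list of dictionaries keyed by variable name; the `crosswalk` helper is hypothetical, and the code values in `rows` are invented for illustration rather than taken from the actual Cambodia 1998 codebook.

```python
from collections import defaultdict


def crosswalk(records, unharmonized, harmonized):
    """Map each unharmonized code to the harmonized codes it appears with."""
    mapping = defaultdict(set)
    for row in records:
        mapping[row[unharmonized]].add(row[harmonized])
    # Sort keys and values so the result is stable and easy to read.
    return {code: sorted(targets) for code, targets in sorted(mapping.items())}


# INVENTED example rows: the numeric codes are placeholders, not real IPUMS codes.
rows = [
    {"EMPSTAT": 1, "KH98A_EMPSTAT": 1},
    {"EMPSTAT": 1, "KH98A_EMPSTAT": 2},
    {"EMPSTAT": 2, "KH98A_EMPSTAT": 3},
    {"EMPSTAT": 3, "KH98A_EMPSTAT": 4},
    {"EMPSTAT": 3, "KH98A_EMPSTAT": 5},
]

table = crosswalk(rows, "KH98A_EMPSTAT", "EMPSTAT")
```

In this made-up data, several sample-specific codes collapse into a single integrated code, which is the kind of detail the unharmonized form preserves.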

Some unharmonized variables are not available to researchers because of confidentiality concerns or other reasons. Even variables that serve as inputs to the regular integrated IPUMS variables may be hidden.