What are “unharmonized variables” in the NAPP data?
All variables in NAPP are processed to varying degrees. They are documented in English and associated with the relevant sections of the original census instructions. The data are analyzed and often recoded for technical and other considerations. However, not all NAPP variables are “integrated” for international and inter-temporal comparability.
The regular NAPP variables – the ones on the main variable availability screen – are integrated: the same codes and labels apply across all the samples that contain the variable. Unharmonized variables, in contrast, are unique to each sample. They generally correspond to the variables in the original datasets submitted to the NAPP project by the various countries. The unharmonized variable codes and labels are not consistent across samples, but the variables have been processed to make them more regularized. Stray values are recoded; all data are converted to numeric values; data universes are empirically determined; unknown and NIU categories are coded consistently. In addition, each unharmonized variable is assigned a unique name in the NAPP database, and the value labels and other variable documentation are written in English.
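The regularization steps described above (numeric conversion, recoding of stray values, consistent unknown and NIU codes) can be illustrated with a minimal Python sketch. This is not NAPP's actual processing code: the function name regularize, the codes 98 and 99, and the inputs are all assumptions made up for illustration.

```python
# Hypothetical codes chosen for illustration; not NAPP's actual coding scheme.
UNKNOWN_CODE = 98  # unknown or illegible response
NIU_CODE = 99      # "not in universe"


def regularize(raw_values, valid_codes, in_universe):
    """Return numeric codes for one sample-specific (unharmonized) variable.

    raw_values  : raw strings as they appear in the source file
    valid_codes : set of integer codes documented for this variable
    in_universe : booleans, True if the person is in the variable's universe
    """
    cleaned = []
    for raw, in_univ in zip(raw_values, in_universe):
        if not in_univ:
            # Persons outside the universe always get the same NIU code.
            cleaned.append(NIU_CODE)
            continue
        text = raw.strip()
        # Convert to a numeric value; non-numeric text has no usable code.
        code = int(text) if text.isdecimal() else None
        if code is None or code not in valid_codes:
            # Stray values are recoded to the unknown category.
            cleaned.append(UNKNOWN_CODE)
        else:
            cleaned.append(code)
    return cleaned


result = regularize([" 1", "2", "x", "7", "3"], {1, 2, 3},
                    [True, True, True, True, False])
```

In this sketch the stray values "x" and "7" fall into the unknown category, and the last record is coded NIU regardless of its raw value.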
Many unharmonized variables serve as inputs for the integrated NAPP variables. For example, underlying the integrated variable for marital status, MARST, are numerous unharmonized variables, typically one per sample – CA81A428 for Canada 1881, GB81B409 for Scotland 1881, and so forth. Each integrated NAPP variable description has a link to the unharmonized variables that served as its input. The unharmonized variables are also accessible in a comprehensive list using the button near the top of the variables page. The variable description for each unharmonized variable lists the integrated variables for which it provides the source data.
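The relationship between sample-specific inputs and an integrated variable can be sketched as a set of per-sample recode tables. The variable names CA81A428, GB81B409 and MARST come from the text above; every code value in the tables, the unknown code 9, and the function name integrate are hypothetical and do not reflect NAPP's real coding.

```python
# Hypothetical recode tables: unharmonized code -> integrated MARST code.
# The numeric codes are invented for illustration only.
RECODES = {
    "CA81A428": {1: 1, 2: 2, 3: 3},
    "GB81B409": {10: 1, 20: 2, 30: 3, 40: 4},
}
MARST_UNKNOWN = 9  # hypothetical "unknown" code for the integrated variable


def integrate(source_var, value):
    """Map one unharmonized value to the integrated MARST coding."""
    return RECODES[source_var].get(value, MARST_UNKNOWN)


canada = integrate("CA81A428", 2)
scotland = integrate("GB81B409", 20)
unmapped = integrate("CA81A428", 5)
```

Under these assumed tables, different source codes from the two samples land on the same integrated code, which is what makes the integrated variable comparable across samples.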
The unharmonized variables can be included in data extracts. Thus, researchers can get both the integrated and unharmonized forms of specific variables (for example, the internationally comparable employment status variable, OCC, and the employment status variable specific to 1900 Norway, NO00A434). Perhaps more importantly, the unharmonized variables give researchers access to data that NAPP has not been able to incorporate in an internationally comparable manner.