Using HISTID for a longitudinal study

Hello! I am a student at Vanderbilt University conducting a longitudinal study of how WPA expenditure by county affected the social mobility of African Americans. I am currently trying to clean 1940 and 1950 census data in R, and I see that every row has a unique HISTID in the single sample containing both decades of data. Am I doing something wrong? I expected many repeated HISTIDs between 1940 and 1950, since a longitudinal study depends on following the same people. Instead I see zero repeated HISTIDs, which would imply that no one in the 1940 census appears in the 1950 census, and that doesn't make sense. Also, it seems that no datasets can be linked to the 1950 census. Any advice?

Hi Evan,

HISTID does not identify the same individual across decennial census samples; it identifies the same record across different release versions of the IPUMS historical full count data. New versions of the full count data may change values for some variables (e.g., occupation) or even eliminate records (e.g., we may combine households or drop what appears to be a duplicate record); HISTID allows researchers who have downloaded the full count data to track how the individual records have changed after a new version is released. I think what you’re looking for is the Historical Identification Key (HIK), which links the HISTIDs of people identified using the IPUMS Multigenerational Longitudinal Panel (MLP) algorithm across the full count decennial files.
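To illustrate what HISTID is for, here is a minimal sketch, assuming you have kept two downloaded release versions of the same full count sample as lists of records. The field names `histid` and `occ` and the placeholder ID values are hypothetical, not the actual IPUMS extract layout. Because HISTID keys a record within one sample's release history, a comparison like this is meaningful across versions of a single census, whereas matching HISTIDs between the 1940 and 1950 files is not expected to find people.

```python
# Hypothetical sketch: compare two release versions of the same full count
# sample, keyed by HISTID. Field names "histid" and "occ" are assumptions.
def compare_versions(old_rows, new_rows, field="occ"):
    """Return (dropped, changed) HISTIDs between two release versions.

    dropped: HISTIDs present in the old version but absent from the new one.
    changed: HISTIDs present in both versions whose `field` value differs.
    """
    old = {r["histid"]: r for r in old_rows}
    new = {r["histid"]: r for r in new_rows}
    dropped = sorted(set(old) - set(new))
    changed = sorted(
        h for h in set(old) & set(new) if old[h].get(field) != new[h].get(field)
    )
    return dropped, changed


# Toy data with placeholder IDs (real HISTID values look different).
old_version = [
    {"histid": "A1", "occ": "farmer"},
    {"histid": "A2", "occ": "laborer"},
    {"histid": "A3", "occ": "teacher"},
]
new_version = [
    {"histid": "A1", "occ": "farmer"},
    {"histid": "A2", "occ": "farm laborer"},  # occupation recoded in the new release
]  # A3 is missing, e.g. dropped as a suspected duplicate

dropped, changed = compare_versions(old_version, new_version)
print(dropped, changed)  # ['A3'] ['A2']
```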

There are currently two ways to use the MLP to link people across decennial censuses. The first is the "link census data" option available when adding full count samples to your data cart. This option generates a long data file in which each person has one observation per census sample, restricted to people who are linkable with our algorithm. HIK is automatically included in these extracts. However, this method does not currently support linking to the 1950 full count (we hope to add these links to the data access system sometime in 2025).
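In the long file produced by this option, each linked person has one row per census year, and those rows share the same HIK. Below is a hedged sketch of reshaping such a file into per-person panels and computing a simple change between two years. The field names `hik`, `year`, and `occscore`, and all values, are hypothetical toy data for two linked censuses that the option does support (1930 and 1940 here).

```python
from collections import defaultdict


# Hypothetical sketch: field names "hik", "year", "occscore" are assumptions.
def to_panels(long_rows):
    """Group long-format rows into {hik: {year: row}} per-person panels."""
    panels = defaultdict(dict)
    for row in long_rows:
        panels[row["hik"]][row["year"]] = row
    return dict(panels)


long_extract = [
    {"hik": "P1", "year": 1930, "occscore": 14},
    {"hik": "P1", "year": 1940, "occscore": 20},
    {"hik": "P2", "year": 1930, "occscore": 9},
    {"hik": "P2", "year": 1940, "occscore": 9},
]

panels = to_panels(long_extract)
# Change in the (hypothetical) occupational score between the two censuses.
change = {hik: p[1940]["occscore"] - p[1930]["occscore"] for hik, p in panels.items()}
print(change)  # {'P1': 6, 'P2': 0}
```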

The second method is to use our linking crosswalks, which match HISTID values for each person across the full count decennial samples. This is currently the only way to link individuals to the 1950 census, and it includes the latest version of the IPUMS MLP links. It sounds like you have already downloaded the 1940 and 1950 full counts separately; the next step is to download the 1940-1950 10-year crosswalk and use it to merge records between the two data files (see the user guide for sample code). Hope this helps clarify the data!
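The crosswalk merge amounts to two joins: 1940 records to the crosswalk on the 1940 HISTID, then the result to 1950 records on the 1950 HISTID. Here is a minimal sketch on toy data. The crosswalk column names `histid_1940` and `histid_1950`, the `county` field, and the IDs are assumptions, so check the user guide for the actual names and the official sample code.

```python
# Hypothetical sketch: column names "histid_1940"/"histid_1950" are assumptions.
def link_via_crosswalk(rows_1940, rows_1950, crosswalk):
    """Join 1940 and 1950 person records through a HISTID crosswalk.

    Returns (row_1940, row_1950) pairs. Crosswalk pairs whose HISTID is
    missing from either census file are skipped (an inner join).
    """
    by_id_1940 = {r["histid"]: r for r in rows_1940}
    by_id_1950 = {r["histid"]: r for r in rows_1950}
    linked = []
    for pair in crosswalk:
        r40 = by_id_1940.get(pair["histid_1940"])
        r50 = by_id_1950.get(pair["histid_1950"])
        if r40 is not None and r50 is not None:
            linked.append((r40, r50))
    return linked


# Toy records; note the 1940 and 1950 HISTIDs never overlap.
rows_1940 = [
    {"histid": "a", "county": "C01"},
    {"histid": "b", "county": "C02"},
    {"histid": "c", "county": "C03"},  # not in the crosswalk: unlinked
]
rows_1950 = [
    {"histid": "x", "county": "C01"},
    {"histid": "y", "county": "C04"},
]
crosswalk = [
    {"histid_1940": "a", "histid_1950": "x"},
    {"histid_1940": "b", "histid_1950": "y"},
]

linked = link_via_crosswalk(rows_1940, rows_1950, crosswalk)
print([(r40["county"], r50["county"]) for r40, r50 in linked])
# [('C01', 'C01'), ('C02', 'C04')]
```

In R, the same logic is typically two inner joins (for example with dplyr's `inner_join()`), using the `by` argument to map each crosswalk HISTID column to the HISTID column of the corresponding census file.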