Matching CPS March supplement respondents longitudinally including monthly basic files

Hi everyone,

For my master's thesis research, I want to match individual respondents in the March supplement to their responses in the basic monthly questionnaires over the years 2000-2013.

I have read pretty much all the relevant Q&As for this issue, and it seems to me the procedure I should implement is the following:

  1. Download the NBER March Supplement data for each year of interest and append (?) the files to generate one dataset, which should share the same sort order as the IPUMS data extract (I am thinking of downloading from the NBER CPS supplements page)
  2. Extract the IPUMS-CPS March Supplement data without restricting sample to specific cases (of age, state, employment status, etc.)
  3. Perform a sequential merge between the two datasets while retaining the IPUMS data and the necessary identifiers from the NBER data to perform the match
  4. Extract the IPUMS-CPS basic files for other months
  5. Append these to the merged IPUMS-NBER file
  6. Use the procedure outlined by the NBER to match individuals
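Steps 1 and 3 can be sketched as follows. This is a minimal illustration in plain Python, assuming each file has already been read into a list of dicts in its original row order. The helper functions and the field names `age` and `a_age` are hypothetical. The sequential merge simply pairs rows by position. The optional `check_pairs` argument compares a variable present in both files, so a drift in sort order raises an error instead of silently attaching the wrong identifiers.

```python
from itertools import chain


def append_by_year(files_by_year):
    """Step 1: append yearly NBER March files in ascending year order.

    files_by_year maps a year to a list of person records (dicts) kept in
    the file's original row order.
    """
    return list(chain.from_iterable(files_by_year[y] for y in sorted(files_by_year)))


def sequential_merge(ipums_rows, nber_rows,
                     id_vars=("h_idnum1", "h_idnum2", "a_lineno"),
                     check_pairs=()):
    """Step 3: copy NBER identifiers onto IPUMS rows by row position.

    check_pairs lists (ipums_field, nber_field) pairs that should agree on
    every row; a disagreement means the two sort orders have drifted apart.
    """
    if len(ipums_rows) != len(nber_rows):
        raise ValueError(f"row counts differ: {len(ipums_rows)} vs {len(nber_rows)}")
    merged = []
    for i, (ip, nb) in enumerate(zip(ipums_rows, nber_rows)):
        for ip_field, nb_field in check_pairs:
            if ip[ip_field] != nb[nb_field]:
                raise ValueError(f"row {i}: {ip_field} != {nb_field}")
        row = dict(ip)                            # keep every IPUMS variable
        row.update({v: nb[v] for v in id_vars})   # add only the NBER identifiers
        merged.append(row)
    return merged


# Hypothetical example: the 2006 file is listed first but is appended second.
nber = append_by_year({
    2006: [{"h_idnum1": "B", "h_idnum2": "1", "a_lineno": 1, "a_age": 41}],
    2005: [{"h_idnum1": "A", "h_idnum2": "1", "a_lineno": 1, "a_age": 30}],
})
ipums = [{"year": 2005, "age": 30}, {"year": 2006, "age": 41}]
merged = sequential_merge(ipums, nber, check_pairs=[("age", "a_age")])
```

Sorting the year keys inside `append_by_year` enforces the "append in order by year" requirement regardless of the order in which the files were read.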

Is this correct?

Thank you.

Your outlined method should be successful (as long as you append the March NBER files in order by year). However, it is important to understand that the March supplement and March basic monthly samples cannot be merged because of the March oversample (as mentioned in the first and second answers to this question). Because the NBER method for linking individuals was developed for matching persons across years (rather than months), you will encounter a number of new technical issues not addressed in their methods. I highly recommend reading through some of the articles that have been written on longitudinal matching in the CPS.

I hope this helps.

Thank you for your answer, Joe.

Just a quick follow-up on this. I have not yet attempted to link the basic monthly data to the March supplement data, but I have been able to link consecutive March supplement respondents using the procedure outlined by Lefgren and Madrian in their paper. I tried using the program available on the NBER website, but it seems to be outdated for more recent samples of the CPS (I’m currently using a 2005-2013 sample).

Given this, I wrote my own program, which essentially follows the Lefgren and Madrian method, using the variable names from the IPUMS data extract after performing the sequential merge with the NBER identifiers (I used h_idnum1, h_idnum2, and a_lineno to identify the person). To link individuals between two consecutive March supplements, I use the month-in-sample variable (mish in IPUMS) together with the NBER identifiers. This is what Lefgren and Madrian call naive merging. Afterwards, I adapt the code from the NBER program to eliminate individuals with implausible differences in sex, race, and age between the two March supplements.
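A minimal sketch of a naive merge followed by a plausibility filter, assuming the merged IPUMS-NBER records are dicts with hypothetical lowercase fields `h_idnum1`, `h_idnum2`, `a_lineno`, `mish`, `sex`, `race`, and `age`. This is not the NBER program itself. The -1 to +3 year age window is an assumed tolerance chosen for illustration, and race is compared for exact equality.

```python
def person_key(row):
    """NBER identifiers carried over by the sequential merge."""
    return (row["h_idnum1"], row["h_idnum2"], row["a_lineno"])


def naive_merge(rows_t, rows_t1):
    """Pair persons in MIS 1-4 in year t with the year t+1 record that has
    the same identifiers and a month-in-sample value exactly 4 higher."""
    # Assumes (identifiers, mish) is unique within a year; a duplicate key
    # would overwrite the earlier record in this index.
    index = {(person_key(r), r["mish"]): r for r in rows_t1}
    pairs = []
    for r in rows_t:
        if 1 <= r["mish"] <= 4:
            match = index.get((person_key(r), r["mish"] + 4))
            if match is not None:
                pairs.append((r, match))
    return pairs


def plausible(r_t, r_t1, age_low=-1, age_high=3):
    """Reject pairs with a sex or race change or an implausible age change.
    The default age window is an assumption, not a documented NBER value."""
    age_change = r_t1["age"] - r_t["age"]
    return (r_t["sex"] == r_t1["sex"]
            and r_t["race"] == r_t1["race"]
            and age_low <= age_change <= age_high)


def link_consecutive_marches(rows_t, rows_t1):
    """Naive merge, then keep only demographically plausible pairs."""
    return [(a, b) for a, b in naive_merge(rows_t, rows_t1) if plausible(a, b)]
```

Keeping `naive_merge` and `plausible` separate makes it easy to report both the naive and the validated linkage counts, which is useful when comparing against published rates.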

I do find it strange, however, that after performing a naive merge my sample retains only about 47% of possible mergers between 2005 and 2006, for example, while Lefgren and Madrian are able to merge 71% of the 1980-1998 data (see Table 2 in their paper). Could there be something wrong with the procedure I followed, or has there been some significant change in the CPS since Lefgren and Madrian’s paper?
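Whether a rate like 47% is low depends on the denominator. Under the CPS 4-8-4 rotation, only persons in month-in-sample 1-4 in one March can reappear (in MIS 5-8) the following March, so the same linked count gives very different rates against all persons versus eligible persons. A small sketch, assuming records are dicts with a hypothetical `mish` field:

```python
def linkage_rates(rows_t, n_linked):
    """Express a linked count as a share of all persons in year t and as a
    share of persons in MIS 1-4 in year t (the only ones who can reappear
    in the next March under the 4-8-4 rotation)."""
    n_all = len(rows_t)
    n_eligible = sum(1 for r in rows_t if 1 <= r["mish"] <= 4)
    if n_eligible == 0:
        raise ValueError("no MIS 1-4 persons in year t")
    return {"share_of_all": n_linked / n_all,
            "share_of_eligible": n_linked / n_eligible}


# Four hypothetical persons, two of them eligible, one linked.
rates = linkage_rates([{"mish": 1}, {"mish": 4}, {"mish": 5}, {"mish": 8}], 1)
print(rates)  # {'share_of_all': 0.25, 'share_of_eligible': 0.5}
```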


I assume that by “possible mergers” you are referring to persons in MIS 1-4 in 2005 and people who were in MIS 5-8 in 2006. If this is not the case, then 47% is a reasonable linkage rate, as fewer than half of all persons in a sample will appear in the following year’s sample.

If this is the case, however, 47% does seem a bit low. You may try linking the raw 2005 and 2006 ASEC samples (from NBER) to see whether some part of the merging process is interfering with the linking process.

In response to your question about changes to the CPS since Lefgren and Madrian’s paper, the most significant change to the linking process would be the increase in the number of race response options from 4 to 21 in the January 2003 sample. This mostly affects linkages that span the transition (i.e., linking respondents from 2002 to 2003), but the additional race categories may also increase the likelihood of respondents changing their race identification between samples. With these more detailed race response options, you may need to allow for looser race code matching. Instead of checking that race in 2005 equals race in 2006, you could check that race in 2005 matches one of the similar race codes in 2006 (e.g., Black in 2005 can match any combination including Black in 2006).

I would also recommend reading through the working draft of the paper “Making Full Use of the Longitudinal Design of the Current Population Survey: Methods for Linking Records Across 16 Months”, which describes many more aspects of the linking process used by IPUMS-CPS to generate the forthcoming linking key variables.
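A minimal sketch of the looser race check described above, assuming race has already been mapped to text labels in which combinations are joined by hyphens. The labels and that mapping are hypothetical, not IPUMS codes. The rule used here, that one set of reported races must contain the other, is only one possible reading of “matches one of the possible similar race codes”. A looser rule, such as requiring any overlap, is equally defensible.

```python
def race_components(label):
    """Split a hypothetical combined race label such as 'White-Black'
    into the set of races it contains."""
    return frozenset(part.strip() for part in label.split("-"))


def races_compatible(label_t, label_t1):
    """Treat two race reports as consistent when one set of reported races
    contains the other, so 'Black' matches 'White-Black' in either order."""
    a, b = race_components(label_t), race_components(label_t1)
    return a <= b or b <= a


print(races_compatible("Black", "White-Black"))        # True
print(races_compatible("White-Asian", "White-Black"))  # False
```

A function like this can replace the exact `race` equality test in a plausibility filter.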

I hope this helps.

How can I link person data from the 2019 ASEC supplement to the 2020 March basic CPS person data?

You can link the CPS to leverage the panel component of these data using the variable CPSIDP. As noted in the CPSIDP documentation, it is important to validate these linkages using demographic characteristics. You may also be interested in these additional resources on linking CPS data as well as training materials from a workshop the IPUMS CPS team hosted in 2018 on linking the CPS.
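As a rough illustration of linking on CPSIDP and then validating the links with demographic characteristics, the sketch below joins records on a hypothetical lowercase `cpsidp` field. It keeps only pairs with the same sex and race and an age change of 0 to 2 years. The field names, the age window, and the treatment of a zero or missing CPSIDP as non-linkable are assumptions for illustration. Which characteristics to check, and with what tolerance, is a choice to make after reading the CPSIDP documentation.

```python
def link_on_cpsidp(asec_rows, basic_rows, age_low=0, age_high=2):
    """Link 2019 ASEC person records to 2020 March basic records by CPSIDP,
    then keep only pairs whose sex, race, and age change are consistent."""
    # Records with a missing or zero CPSIDP are treated as non-linkable here
    # (an assumption; confirm how such cases are coded in the documentation).
    basic_by_id = {r["cpsidp"]: r for r in basic_rows if r.get("cpsidp")}
    validated = []
    for a in asec_rows:
        pid = a.get("cpsidp")
        b = basic_by_id.get(pid) if pid else None
        if b is None:
            continue
        age_change = b["age"] - a["age"]
        if (a["sex"] == b["sex"] and a["race"] == b["race"]
                and age_low <= age_change <= age_high):
            validated.append((a, b))
    return validated
```

Comparing the number of CPSIDP matches with the number of validated pairs gives a quick sense of how many links fail the demographic checks.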