Errors in matching longitudinal data

I use the matched ASEC sample for forensic economic forecasting. After tediously performing my own matches (which I still do for monthly interviews), I was delighted with your Longitudinal protocol. Alas, when I downloaded the matched sample I found two problems, at least for my work. First, all unmatched observations are deleted, which precludes my usual Heckman sample selection correction (via Stata). Second, out of a sample of 1,635,854 I found 60,139 obvious mismatches: age_2 < age_1 or age_2 > age_1 + 2. I coded a variable “matched” = 0 for those 60,139 mismatches and set “matched” = 1 for the 1,575,715 successes. I then used year, month (March only, to exclude oversamples) and cpsidp to match the first interview (mish = 1, 2, 3 or 4) with the “_1” variables. I was able to identify 482,717 apparent household address changes where the replacement household was not at the same address. My Heckman corrections appear to behave as they did in the past.

Has anyone else encountered this problem? Is my “fix” appropriate? Is there a way something like this procedure could be added to your protocol?

Thank you,

Thomas Carroll, Ph.D.
Senior economist
Optimal Analytics, Ltd.
Las Vegas, NV

I’m glad to hear that you found the longitudinal ASEC extracts helpful for your research. The longitudinal files currently use the variable CPSIDP to link respondents across their appearances in the CPS panel. As the variable description for CPSIDP notes, “it is important to verify CPSIDP linkages with AGE, SEX, and RACE. In some cases CPSIDP will result in erroneous links, which are due to errors in the source data. Cases with the same CPSIDP value may also have inconsistent responses across samples due to errors on the part of the respondent or in recording the response.” These are the cases you are finding with inconsistent age values. Note, however, that AGE in the CPS is top-coded: for respondents from April 2004 onward, those who report being 80-84 are coded as 80 and those 85+ are coded as 85. It is therefore possible for someone’s coded age to increase by 5 years during their time in the panel if they began the panel at age 83 or 84. IPUMS has also recently begun providing the linking variable CPSIDV, which only links records whose SEX and RACE values do not change and whose AGE values change in expected ways over time (see Drew et al., 2014 for more details on CPSIDP and Rogers & Flood, 2023 for more details on CPSIDV).
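The verification described above could be sketched roughly as follows. This is only an illustration in Python, not the procedure IPUMS uses to build CPSIDV: the function names, the dictionary record layout, and the `max_gap` tolerance of 2 years (borrowed from the age_2 > age_1 + 2 rule in the question) are all assumptions, and the top-code handling covers only the April 2004 onward scheme mentioned above.

```python
def true_age_range(coded_age):
    """Return the (lowest, highest) true age consistent with a coded AGE value.

    Assumes the April 2004 onward top-coding: 80 means 80-84, 85 means 85+.
    """
    if coded_age == 80:
        return 80, 84
    if coded_age >= 85:
        return 85, float("inf")
    return coded_age, coded_age


def age_change_plausible(age_1, age_2, max_gap=2):
    """True if some pair of true ages behind the two coded values differs
    by between 0 and max_gap years (max_gap is an assumed tolerance)."""
    lo_1, hi_1 = true_age_range(age_1)
    lo_2, hi_2 = true_age_range(age_2)
    smallest_change = lo_2 - hi_1
    largest_change = hi_2 - lo_1
    return smallest_change <= max_gap and largest_change >= 0


def link_is_consistent(rec_1, rec_2, max_gap=2):
    """Check one CPSIDP link: SEX and RACE unchanged, AGE change plausible.

    rec_1 and rec_2 are hypothetical dicts with keys "AGE", "SEX", "RACE".
    """
    return (
        rec_1["SEX"] == rec_2["SEX"]
        and rec_1["RACE"] == rec_2["RACE"]
        and age_change_plausible(rec_1["AGE"], rec_2["AGE"], max_gap)
    )
```

Under these assumptions, a link from coded age 80 to coded age 85 is kept rather than flagged, whereas a simple age_2 > age_1 + 2 rule would count it as a mismatch.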

I’m not exactly sure what you mean when you say that you found “482,717 apparent household address changes where the replacement household was not at the same address.” Respondent household addresses are not provided in the CPS PUMS file. IPUMS does offer the variable MIGRATE1, which indicates whether the respondent changed residence in the past year. There will be a few cases where respondents moved into the housing unit within the past year and have taken the place of respondents who had previously been in the panel. However, when I linked 2022 and 2023 ASEC respondents, I found that 98.6% of the 35,919 cases linked with CPSIDP reported MIGRATE1 = 1 (same house) in 2023. Seeing roughly 1/3 of the sample with a household address change sounds very strange to me, and I would be curious to learn which variables and samples you used to arrive at this value.
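For anyone wanting to repeat that kind of check on their own extract, a minimal Python sketch follows. It assumes each linked case is represented as a dictionary holding the second-year MIGRATE1 value; the function name and the toy records are made up for illustration and are not real CPS data.

```python
def share_same_house(linked_records):
    """Share of linked cases whose second-year MIGRATE1 equals 1 (same house).

    linked_records is a hypothetical list of dicts with a "MIGRATE1" key.
    Returns None for an empty list.
    """
    if not linked_records:
        return None
    same_house = sum(1 for rec in linked_records if rec["MIGRATE1"] == 1)
    return same_house / len(linked_records)


# Toy example with invented values, only to show the call.
toy_links = [
    {"MIGRATE1": 1},
    {"MIGRATE1": 1},
    {"MIGRATE1": 1},
    {"MIGRATE1": 3},
]
print(share_same_house(toy_links))
```

A share far below the 98.6% reported above would suggest that the links, rather than actual moves, are producing the apparent address changes.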