Current best method for linking individuals across ASEC samples from consecutive years?

I’m trying to link individuals’ responses across consecutive ASEC samples for every 2-year stretch from 1989 to the present. Some forum answers suggest using CPSIDP as the primary linking variable for consecutive ASEC samples, while others suggest using HRHHID and HRHHID2. Is one of these methods preferable to the other at this time? I’ve tried using CPSIDP, and this generally results in a match rate of about 30-35% between years, after confirming using race, sex, and age (this is after dropping individuals who were part of the ASEC oversample; otherwise the rate would be much lower). Might any other method result in a higher match rate?

A few sub-questions:

  1. Is it possible at this point to link individuals’ ASEC responses across years before 1989? This post from last year suggested this capability might be on the way.

  2. Is there any way to link individuals across consecutive ASEC years if they are part of the oversample (and thus have no CPSIDP identifier)?

Thanks much.


We have a relatively new resource page that provides very helpful information for users interested in linking samples in IPUMS CPS. In general, the process you discuss above is correct. First, drop respondents from the ASEC oversample. Then, to create an individual panel dataset, use the variables CPSIDP and YEAR as the person ID and time variables, respectively. Note that even without any survey attrition or household migration, at most 50% of the households in an ASEC sample can be linked across two consecutive years.
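For readers working in Python, the linking step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the extract is loaded into a pandas DataFrame with the IPUMS columns YEAR and CPSIDP, oversample records carry CPSIDP equal to 0 (as this thread describes), and the function name `link_consecutive_asec` and the toy data are made up for the example.

```python
import pandas as pd


def link_consecutive_asec(df: pd.DataFrame, year: int) -> pd.DataFrame:
    """Link persons in ASEC `year` to ASEC `year + 1` on CPSIDP (hypothetical helper)."""
    # Assumption: CPSIDP == 0 marks ASEC oversample records, which cannot be linked.
    linkable = df[df["CPSIDP"] != 0]
    first = linkable[linkable["YEAR"] == year]
    second = linkable[linkable["YEAR"] == year + 1]
    # Inner merge keeps only persons observed in both years.
    return first.merge(second, on="CPSIDP", how="inner", suffixes=("_t", "_t1"))


# Toy data, invented purely for illustration.
toy = pd.DataFrame({
    "YEAR": [2001, 2001, 2001, 2002, 2002, 2002],
    "CPSIDP": [101, 102, 0, 101, 103, 0],
    "AGE": [30, 45, 12, 31, 50, 13],
})

linked = link_consecutive_asec(toy, 2001)

# Share of linkable first-year persons who were found again in the second year.
n_first = ((toy["YEAR"] == 2001) & (toy["CPSIDP"] != 0)).sum()
match_rate = len(linked) / n_first
```

Non-ID columns in the result carry the suffixes `_t` and `_t1` for the first and second year, which makes it straightforward to compare a person's characteristics across the two interviews.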

Regarding your sub-questions: (1) At the present time CPSIDP is only available in ASEC samples back to 1989. We are working on making this variable available in pre-1989 samples, but this work is not ready for release yet. You can try to replicate the creation of MARBASECIDP and CPSIDP based on the documentation found in this working paper. Based on previous work on this task, we know that you’ll need to prune the ASEC samples for duplicates prior to merging across years and that it is probably a good idea to verify links based on AGE, SEX, and RACE. (2) Unfortunately, there is not a way at the moment to link members of the ASEC oversample across ASEC samples. This is because, to preserve confidentiality, additional precautions were taken that make it difficult to link these individuals.
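The verification on AGE, SEX, and RACE mentioned above might look something like the sketch below. It assumes a DataFrame of already-linked person pairs with hypothetical `_t` and `_t1` suffixes for the first and second year, and it assumes an age tolerance of 0 to 2 years between interviews; that tolerance is a choice made for this example, not an IPUMS rule, and you may want to loosen or tighten it.

```python
import pandas as pd


def validate_links(linked: pd.DataFrame, max_age_gain: int = 2) -> pd.DataFrame:
    """Keep linked pairs whose SEX and RACE agree and whose AGE change is plausible."""
    age_diff = linked["AGE_t1"] - linked["AGE_t"]
    ok = (
        (linked["SEX_t"] == linked["SEX_t1"])
        & (linked["RACE_t"] == linked["RACE_t1"])
        # Assumed tolerance: age should not fall, and should rise by at most max_age_gain.
        & age_diff.between(0, max_age_gain)
    )
    return linked[ok].reset_index(drop=True)


# Toy linked pairs, invented purely for illustration.
pairs = pd.DataFrame({
    "CPSIDP": [101, 102, 103],
    "AGE_t": [30, 45, 60],
    "AGE_t1": [31, 46, 25],      # third pair: implausible drop in age
    "SEX_t": [1, 2, 1],
    "SEX_t1": [1, 1, 1],         # second pair: SEX disagrees
    "RACE_t": [100, 100, 200],
    "RACE_t1": [100, 100, 200],
})

validated = validate_links(pairs)
```

Pairs that fail the check are most likely different people who happen to share an identifier, for example a new family that moved into the same housing unit.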

Hello, this is Chiara - and I am also interested in this topic! Your answer was already super useful, Jeff, but I have a few more questions.

Let me start by saying: I have retrieved ASEC data from the 70s to 2023 (I downloaded them all together). I want to build a cross-section of individuals. That is: I want to build a dataset in which I have one answer per ASEC respondent. If the respondent is in 2 consecutive years, I want to keep the earliest response (e.g., if in ASEC 2001 and ASEC 2002, I want to keep the ASEC 2001).

My questions are:

  1. Am I following the right procedure? I have downloaded all the data together, as mentioned. From 1976 I can identify individuals based on CPSID and CPSIDV, and I use the latter to identify duplicates in my sample as I deem it to be more reliable. First, I drop all observations for which CPSIDV is 0 (the ASEC oversamples). Then, I identify observations that have duplicate values of CPSIDV, and drop the one with the highest value of YEAR. I am asking just because it’s slightly different from what is suggested above, so I started having doubts (I believe merge is recommended in some other posts too).
  2. Is there now a way to link members of the ASEC oversample across ASEC samples? In point (1) I am dropping those who are part of the oversample because I cannot identify the “oversample individuals” who are interviewed in two consecutive ASEC surveys. Relatedly, would you happen to know what percentage of oversampled individuals are interviewed in two consecutive years? My guess is that the probability of being oversampled twice is very low.

While it is not common to drop repeated observations in analyses of CPS data, if your goal is to create a cross-sectional CPS ASEC sample then the procedure you outline makes sense to me. The IPUMS CPS team is exploring linking ASEC oversample respondents, but this capability is currently not available.
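As a rough Python illustration of the procedure outlined in the question (drop oversample records, then keep one record per CPSIDV, preferring the earliest YEAR), something like the following could work. It assumes a pandas DataFrame with the IPUMS columns YEAR and CPSIDV and that oversample records have CPSIDV equal to 0, as described in the question; the function name `one_record_per_person` and the toy data are hypothetical.

```python
import pandas as pd


def one_record_per_person(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the earliest-YEAR record for each CPSIDV (hypothetical helper)."""
    # Assumption: CPSIDV == 0 marks ASEC oversample records, which are dropped here.
    linkable = df[df["CPSIDV"] != 0]
    return (
        linkable.sort_values(["CPSIDV", "YEAR"])
        # After sorting, the first row within each CPSIDV is the earliest year.
        .drop_duplicates(subset="CPSIDV", keep="first")
        .reset_index(drop=True)
    )


# Toy data, invented purely for illustration.
toy = pd.DataFrame({
    "YEAR": [2002, 2001, 2002, 2002],
    "CPSIDV": [11, 11, 12, 0],
})

cross_section = one_record_per_person(toy)
```

Person 11 appears in both years and keeps only the 2001 record, person 12 appears once and is retained, and the oversample record is removed.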

In terms of your analysis, the oversample group that you should be most aware of is the Hispanic oversample. As detailed in the technical write-up by Flood and Pacas, the CPS ASEC includes two oversample groups: the Hispanic oversample (1976 onwards) and the State Children’s Health Insurance Program (SCHIP) oversample (2002 onwards). The two methods of selecting SCHIP oversample respondents, as outlined in the linked paper, both ensure that these respondents are only included in a single ASEC sample. However, no such restriction exists for Hispanic oversample participants. These participants are drawn from additional interviews with November households (from the previous year) that contain one or more persons of Hispanic origin. The rotation pattern ensures that a household that was interviewed in November of one year will be interviewed again in November of the following year. As a result, Hispanic households interviewed in November can appear in two different ASEC oversamples. While the Hispanic oversample is much smaller than the SCHIP oversample, contributing about 5,500 households to the ASEC sample compared with about 16,000 for SCHIP, identifying whether a Hispanic oversample respondent is in their first or second ASEC is challenging. I recommend reviewing this paper to better understand these challenges and potential solutions for linking oversample respondents.

There are a couple more things that you should be aware of:

  • The CPS sample includes housing units (i.e., addresses) rather than households composed of specific people. This means that if a family moves while they are in the panel, they will not be interviewed further in the survey. If a new family moves into the unit, they will take the place of the old family. CPSIDV will not link these new respondents since they will fail the validation check, but you may want to retain both sets of families. To do so, you will want to retain all unique CPSIDV values in your dataset.
  • As noted in the comparability tab for CPSIDP, linking between 1976 and 1977 is limited. The same restrictions noted on the tab apply to CPSIDV as well.

Thank you so much for your answer; it is really helpful for deciding how to go about my analysis.

Regarding your point about families being replaced within a housing unit, I totally agree, and thanks for pointing this out. This is why I simply wanted to look at duplicates of CPSIDV. In the case you described, the respondents of the new family will have a different CPSIDV. Hence there are no duplicates and no “old” observations are dropped, so I will keep this important (additional) information.

Thanks! I hope I won’t have to bother you again. (And apologies for the late answer.)
