Questions about using Pre-linked Longitudinal ASEC


I am learning how to use the pre-linked longitudinal ASEC extracts (2017-2022). I understand that there are some uncertainties regarding how individuals are identified if users link cross-sectional ASEC samples on their own.

However, I noticed that in the pre-linked extracts there is a sizable number of individuals whose sex_1 and sex_2 do not correspond. Does this mean that we have to validate the individuals in the pre-linked ASEC extracts? Is dropping the cases that are discordant on sex_1/sex_2 or race_1/race_2 adequate? If not, what is the best approach to ensure data quality?

I am also working on linking spouses or unmarried partners in the data. Is the approach similar to that for other data extracts? I’ve tried to create a spouse dataset with the sample Stata code below and merge it to the full data, but this still left many spouses unmatched. Please let me know what I did wrong in this attempt if you can, or whether there is a better way to identify partners. I can provide more details if this is not enough information. Thanks!

keep if sploc_1 > 0
drop pernum_1
rename sploc_1 pernum_1
save "spouse_data.dta"

use "full_data.dta"
merge 1:1 year_1 serial_1 pernum_1 using "spouse_data.dta"

I took a look at the longitudinal ASEC data from 2017 to 2022 and see that fewer than 1% of individuals have SEX_1 not equal to SEX_2. The links in these longitudinal files are created using CPSIDP, a person identifier that can be used to link individuals across multiple CPS samples. Links created by CPSIDP are not validated; that is, there are instances in which CPSIDP identifies different individuals as the same person. This happens because individuals who move during the CPS observation period are not followed and are no longer surveyed. The individual(s) then residing in the sampled dwelling are interviewed in their place and take on the previous residents' values of CPSID (the household identifier) and CPSIDP (the person identifier).

IPUMS has created a validated longitudinal person identifier for the CPS, CPSIDV, which we validate using race, sex, and age. You can confirm that these links are validated by counting the respondents in your longitudinal extract with SEX_1!=SEX_2 & CPSIDV_1==CPSIDV_2 (the count is zero). You could also validate the links made with CPSIDP yourself; if so, I would recommend validating them using race (should be the same), sex (should be the same), and age (should not change more than expected given the time between the linked samples). Alternatively, you can manually link individuals in a cross-sectional data extract using CPSIDV.
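As a rough illustration of the checks described above, the following Stata sketch counts discordant links and then builds a do-it-yourself validity flag. It assumes the pre-linked extract is already in memory and contains sex_1/sex_2, race_1/race_2, age_1/age_2, and cpsidv_1/cpsidv_2. The age window (0 to 2 years) and the name valid_link are illustrative assumptions, not an IPUMS rule; adjust the window to the gap between the samples you link.

```stata
* Sketch only: assumes a pre-linked longitudinal ASEC extract is in memory.

* How many CPSIDP-based links disagree on sex?
count if sex_1 != sex_2

* Among links that share the same validated identifier, none should disagree.
count if sex_1 != sex_2 & cpsidv_1 == cpsidv_2

* Do-it-yourself validation: same sex, same race, plausible change in age.
* The 0-2 year window is a hypothetical tolerance chosen for illustration.
local min_age_change = 0
local max_age_change = 2
gen byte valid_link = (sex_1 == sex_2) & (race_1 == race_2) & ///
    inrange(age_2 - age_1, `min_age_change', `max_age_change')

* Inspect how many links pass before deciding whether to drop the rest.
tab valid_link
keep if valid_link
```

Tabulating valid_link before the final keep shows how many links each rule would discard, which helps in judging whether the chosen tolerance is too strict for your analysis.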

To answer your question about data on spouses, the CPS samples entire households, and CPS data include records for each member of each sampled household. There is nothing additional you need to do to include the person records of the spouses or unmarried partners of individuals in your data extract. However, spouses and unmarried partners of CPS respondents who do not live in the household with the CPS respondent will not be linkable in the CPS. In other words, if individual A is observed in the CPS and their spouse, individual B, does not live in their household, you will not be able to obtain data on individual B. The variable SPLOC reports the PERNUM of each individual’s spouse within their household. If there is no spouse present in the household, then SPLOC=0. You can use the SPLOC variable to identify each respondent’s spouse and the spouse attributes you are interested in. Alternatively, you can use the Attach Characteristics feature of IPUMS CPS to attach information about each person’s spouse to their own person record. If you can share more about what you are hoping to do with data on linked spouses, I may be able to provide more helpful or targeted guidance.
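One way to use SPLOC for this in Stata is sketched below. It builds a lookup file keyed by each person's PERNUM, renames the spouse variables so they do not collide with the person's own values, and merges on the spouse pointer. The file names follow the question; the choice of age_1 and sex_1 and the sp_ prefix are illustrative assumptions. The sketch also assumes that year_1, serial_1, and pernum_1 uniquely identify persons in the extract, which the isid line checks.

```stata
* Sketch only: file names and the selected variables are illustrative.
use "full_data.dta", clear

* Build a lookup file of potential spouses: one row per person.
keep year_1 serial_1 pernum_1 age_1 sex_1
rename pernum_1 sploc_1        // key now matches other persons' spouse pointer
rename age_1 sp_age_1          // renamed so spouse values are kept separately
rename sex_1 sp_sex_1
isid year_1 serial_1 sploc_1   // stops with an error if the key is not unique
save "spouse_data.dta", replace

* Attach spouse characteristics to each person's own record.
use "full_data.dta", clear
merge m:1 year_1 serial_1 sploc_1 using "spouse_data.dta", keep(master match)

* Records with sploc_1 == 0 have no spouse in the household and stay unmatched,
* so check the match rate only among persons who report a spouse.
tab _merge if sploc_1 > 0
```

Unmatched records among persons with sploc_1 == 0 are expected. If records with sploc_1 > 0 also remain unmatched, one possibility worth checking is that the spouse's own record is not present in the linked file.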