CPSIDV Construction

Hi everyone,

My understanding, based on this site, is that CPSIDV was constructed by appending one character to the end of CPSIDP, which shares its first 12 characters with CPSID. I’m particularly interested in the period a household enters the CPS (MISH==1) to define a CPS cohort.

I thought it would be straightforward to do this by extracting the first 6 digits of CPSIDV and using them to construct a year-month. But when I tried to validate this against another notion of cohort, which is just the year-month of the observation minus MISH plus 1, I found that for a sizeable share of the observations (15% in my sample of outgoing-rotation-group households with nonmissing sex/race/age and wage information) the two measures disagree, and a sizeable share of these disagree by 12 months or more (about 10% in my sample). These disagreements occur throughout the period from 1982 to 2024, with particularly high rates in 1985-1986, 1995, and 2001-2002, suggesting this has something to do with the difficulties in linking individuals over time across these periods.

Here is an example: consider the individual CPSIDV == 199109063712011 in September 1991. This individual is recorded as having minsamp == 8, indicating that they should have entered the CPS in June 1990. But then, shouldn’t their CPSIDV start with 199006?
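
To make the comparison concrete, here is a minimal sketch of the check in Python. It assumes the standard CPS 4-8-4 rotation (four months in sample, eight months out, four months in), so a unit at MISH 5 through 8 first entered 8 months earlier than a simple "minus MISH plus 1" would suggest. The function names are hypothetical and only for illustration; this is not IPUMS code.

```python
def implied_entry_month(year, month, mish):
    """Return the YYYYMM in which the housing unit had MISH == 1.

    Assumes the 4-8-4 rotation: MISH 1-4 fall in consecutive months,
    followed by an 8-month gap, then MISH 5-8 in consecutive months.
    """
    if not 1 <= mish <= 8:
        raise ValueError("MISH must be between 1 and 8")
    months_since_entry = (mish - 1) if mish <= 4 else (mish - 1) + 8
    total = year * 12 + (month - 1) - months_since_entry
    return (total // 12) * 100 + (total % 12) + 1


def cpsidv_prefix(cpsidv):
    """Return the first six digits of CPSIDV as an integer YYYYMM."""
    return int(str(cpsidv)[:6])


# The example above: observed in September 1991 with month-in-sample 8.
unit_entry = implied_entry_month(1991, 9, 8)
person_prefix = cpsidv_prefix(199109063712011)
print(unit_entry, person_prefix, unit_entry == person_prefix)
```

For this record the two values differ, which is the kind of disagreement I am asking about.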

Apologies if I’m doing something stupid here!

CPSIDV is a variable created by IPUMS CPS that uniquely identifies individuals across CPS samples, using consistent values of AGE, SEX, and RACE to validate links. CPSIDV will not link records that do not appear to belong to the same person across samples. Because the CPS samples housing units (i.e., dwellings) rather than following individual persons, this validation (which CPSIDP lacks) limits false links when people move or otherwise leave the CPS sample.

Could you share how you created the variable minsamp, since IPUMS CPS does not have such a variable? This might help me provide a more targeted response. I assume that minsamp is derived from the variable MISH, which indicates the number of times occupants of a housing unit (i.e., dwelling) have been interviewed for the CPS up to that survey month. However, this is not the same as the number of times a particular individual is identified using CPSIDV, since the individuals living in the sampled dwelling may change during the CPS observation period.

For example, a person who moves into the unit between interviews is assigned the MISH value for their housing unit in the following interview round. Someone who moves into the housing unit between the unit’s seventh and eighth month-in-sample will be assigned MISH = 8 in the following interview, reflecting that this is the unit’s eighth interview. Similarly, an entire family might move out of their house between the unit’s seventh and eighth month-in-sample while another family moves in. Since the CPS samples housing units rather than following individual persons, the new family will replace the old family for the housing unit’s remaining CPS panel rounds. In both cases, CPSIDV should not link the person to any prior records, since they only appear in the CPS for their housing unit’s eighth month-in-sample. There are additional cases where it is not possible or practical to link respondents across the panel due to a lack of documentation (see Extending Current Population Survey Linkages: Obstacles and Solutions for Linking Monthly Data from 1976 to 1988 by Flood et al.).

The IPUMS working paper that describes CPSIDV, A Holistic Approach to Validating Current Population Survey Panel Data (Rodgers & Flood, 2023), states that “the first time an individual is observed in the CPS (MISH=1 or the first instance of CPSIDP) a unique CPSIDV is assigned to them.” The individual identified by the CPSIDV value that you shared is observed in the CPS panel only once (in September 1991). Their CPSIDV therefore begins with 199109, denoting the year and month in which they were first observed in the panel, not the month in which their housing unit entered the CPS.
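
One way to check this against an extract is to compare the first six digits of each CPSIDV with the earliest year-month in which that CPSIDV appears, rather than with a date derived from MISH. The sketch below is only illustrative: the function names are hypothetical, the records are a toy list rather than a real extract, and the second person's CPSIDV is made up to show someone who is present from the unit's first month-in-sample.

```python
def first_observed(records):
    """Map each CPSIDV to the earliest YYYYMM in which it appears."""
    first = {}
    for rec in records:
        ym = rec["YEAR"] * 100 + rec["MONTH"]
        cpsidv = rec["CPSIDV"]
        if cpsidv not in first or ym < first[cpsidv]:
            first[cpsidv] = ym
    return first


def prefix_mismatches(records):
    """Return CPSIDV values whose first six digits differ from first_observed."""
    return sorted(
        cpsidv
        for cpsidv, ym in first_observed(records).items()
        if int(str(cpsidv)[:6]) != ym
    )


records = [
    # The person from the question: seen once, at the unit's eighth interview.
    {"CPSIDV": 199109063712011, "YEAR": 1991, "MONTH": 9, "MISH": 8},
    # Hypothetical person (made-up ID) present at both MISH 1 and MISH 8.
    {"CPSIDV": 199006000000011, "YEAR": 1990, "MONTH": 6, "MISH": 1},
    {"CPSIDV": 199006000000011, "YEAR": 1991, "MONTH": 9, "MISH": 8},
]

first = first_observed(records)
mismatches = prefix_mismatches(records)
print(first)
print(mismatches)
```

If the goal is a cohort defined by when the housing unit entered the CPS, a date derived from YEAR, MONTH, and MISH is likely the better basis; the CPSIDV prefix instead describes when the person was first observed.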