Unique household/person identifiers for matching basic monthly CPS data longitudinally

I am trying to create household identifiers in order to match basic monthly CPS data longitudinally from January 1994 to December 2010.

As a first step, I’m creating ID numbers that uniquely identify each household and person within each month. From what I understand, concatenating HRHHID, HRHHID2, and STATEFIP is sufficient to identify a household uniquely within a month, and adding PERNUM should then identify each person uniquely. However, when I do this, I get a huge number of duplicates within each month. By contrast, when I concatenate SERIAL and PERNUM, I get no duplicates within a given sample month, but SERIAL won’t let me take the next step, which is identifying households across months.
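
For anyone reproducing the duplicate check, here is a minimal sketch, assuming each record is a Python dict keyed by the IPUMS variable names used in this thread plus YEAR and MONTH (an assumption about how the extract is loaded). The helper names person_key and duplicates_by_month are hypothetical. It counts each (month, key) combination and reports the keys that occur more than once.

```python
from collections import Counter


def person_key(rec):
    # Within-month person key: household identifiers plus PERNUM.
    return (rec["HRHHID"], rec["HRHHID2"], rec["STATEFIP"], rec["PERNUM"])


def duplicates_by_month(records, key_fn):
    """Return {(YEAR, MONTH): {key: count}} for keys seen more than once in a month."""
    counts = Counter((rec["YEAR"], rec["MONTH"], key_fn(rec)) for rec in records)
    dups = {}
    for (year, month, key), n in counts.items():
        if n > 1:
            dups.setdefault((year, month), {})[key] = n
    return dups
```

Passing a different key function (for example one built from SERIAL and PERNUM) makes it easy to compare the two approaches month by month. Building the key as a tuple rather than a concatenated string also avoids false collisions that unpadded concatenation can produce (e.g. "12" + "3" and "1" + "23" both give "123").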

Is there something I’m missing about the identification of households within months? Why are HRHHID, HRHHID2, and STATEFIP apparently insufficient to identify households uniquely within each sample month?

I was also under the impression that, to identify a household across months, it’s sufficient to concatenate the within-month household identifier (HRHHID, HRHHID2, and STATEFIP) with a variable for the date when the household entered the CPS. Of course, matching across months is secondary to the problem of using HRHHID and HRHHID2 (and potentially other variables) to identify households uniquely within each month.
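
One way to get the entry date described above is to derive it from the month-in-sample variable. The sketch below assumes a MISH field (IPUMS-CPS month in sample) and the standard CPS 4-8-4 rotation: a household is interviewed for 4 consecutive months, is out for 8, then returns for 4 more. The function names are hypothetical, and this is an illustration rather than the Drew, Flood, and Warren procedure.

```python
def month_index(year, month):
    # Convert a calendar month to a running integer so months can be subtracted.
    return year * 12 + (month - 1)


def entry_month_index(year, month, mish):
    """Month the household entered the CPS, assuming the 4-8-4 rotation:
    MISH 1-4 fall in consecutive months, then 8 months out, then MISH 5-8."""
    if not 1 <= mish <= 8:
        raise ValueError("MISH must be between 1 and 8")
    # MISH 5 is 12 months after MISH 1, so the offset jumps by 8 after MISH 4.
    offset = mish - 1 if mish <= 4 else mish + 7
    return month_index(year, month) - offset


def household_panel_key(rec):
    # Within-month household key plus the derived entry month.
    entry = entry_month_index(rec["YEAR"], rec["MONTH"], rec["MISH"])
    return (rec["HRHHID"], rec["HRHHID2"], rec["STATEFIP"], entry)
```

Under these assumptions, a household first interviewed in January 2005 gets the same key in January 2005 (MISH 1), April 2005 (MISH 4), January 2006 (MISH 5), and April 2006 (MISH 8). The entry index can be turned back into a calendar month with `year, m0 = divmod(idx, 12)` and `month = m0 + 1`.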

My approach basically follows the steps described in Drew, Flood, and Warren, although I’m only using data from 1994 onward. Any advice would be much appreciated.


Hi Ross,

I’m doing something similar, and I uniquely identify individuals using HRHHID, HRHHID2, and LINENO, which is the individual’s line number on the original survey form. I do not get duplicates when I follow this procedure. It seems PERNUM does not identify individuals across IPUMS-CPS samples, only within a sample (see the PERNUM variable description). But I’m pretty sure LINENO does.
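
The linking step described above could be sketched like this, assuming two monthly files loaded as lists of dicts with HRHHID, HRHHID2, and LINENO fields. The function name link_persons is hypothetical. Keys that are not unique within a month are dropped rather than guessed at, so a bad key shows up as fewer links instead of wrong ones.

```python
def link_persons(month_a, month_b):
    """Pair records from two monthly files that share (HRHHID, HRHHID2, LINENO)."""

    def index(records):
        idx, dup = {}, set()
        for rec in records:
            key = (rec["HRHHID"], rec["HRHHID2"], rec["LINENO"])
            if key in idx:
                dup.add(key)
            idx[key] = rec
        # Drop keys that appear more than once within the month.
        for key in dup:
            del idx[key]
        return idx

    a, b = index(month_a), index(month_b)
    # Sorted so the output order is deterministic.
    return [(a[k], b[k]) for k in sorted(a.keys() & b.keys())]
```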

Note, however, that HRHHID, HRHHID2, and LINENO are imperfect identifiers: they are reused when, for example, a new family moves into a household. The usual approach is to use demographic information such as AGE, SEX, and RACE to confirm that the matches are correct. The Drew, Flood, and Warren paper is a good read, and I also recommend Madrian and Lefgren (1999).
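
A demographic check of this kind might look like the sketch below, assuming linked records carry AGE, SEX, and RACE. The function name plausible_match and the age_slack tolerance are hypothetical; the rule itself is only one simple choice, not the criterion used in either paper. After m months a person’s age should rise by either floor(m/12) or ceil(m/12), and age_slack widens that window to allow for misreported ages.

```python
def plausible_match(rec_a, rec_b, months_apart, age_slack=0):
    """Return True if rec_b could be the same person as rec_a months_apart later."""
    if rec_a["SEX"] != rec_b["SEX"] or rec_a["RACE"] != rec_b["RACE"]:
        return False
    low = months_apart // 12
    high = -(-months_apart // 12)  # ceiling division
    age_diff = rec_b["AGE"] - rec_a["AGE"]
    return low - age_slack <= age_diff <= high + age_slack
```

Keeping the tolerance as a parameter makes it easy to see how many links survive under stricter or looser rules.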

Good luck,