I am trying to create household identifiers in order to match basic monthly CPS data longitudinally from January 1994 to December 2010.
As a first step, I’m creating ID numbers that uniquely identify each household and person within each month. From what I understand, to identify a household uniquely within a month, it’s sufficient to concatenate HRHHID, HRHHID2, and STATEFIP; concatenating PERNUM should then identify each person uniquely. However, when I do this, it seems to generate a huge number of duplicates within each month. By contrast, when I concatenate SERIAL and PERNUM, it doesn’t generate any duplicates within a given sample month, but SERIAL won’t allow me to undertake the next step of what I want to do, which is identifying households across months.
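To make the within-month check concrete, here is a minimal Python sketch of what I am doing. The records are made-up toy values, not real CPS data, and the helper names `person_key` and `duplicate_keys` are hypothetical. The key is built as a tuple rather than a concatenated string, so that fields of varying width cannot run together and create spurious matches.

```python
from collections import Counter

# Hypothetical toy records for a single sample month (values are invented).
records = [
    {"HRHHID": "000123456789012", "HRHHID2": "01011", "STATEFIP": 1, "PERNUM": 1},
    {"HRHHID": "000123456789012", "HRHHID2": "01011", "STATEFIP": 1, "PERNUM": 2},
    {"HRHHID": "000123456789012", "HRHHID2": "01011", "STATEFIP": 1, "PERNUM": 1},
    {"HRHHID": "000987654321098", "HRHHID2": "02011", "STATEFIP": 6, "PERNUM": 1},
]


def person_key(rec):
    """Candidate within-month person identifier, kept as a tuple of fields."""
    return (rec["HRHHID"], rec["HRHHID2"], rec["STATEFIP"], rec["PERNUM"])


def duplicate_keys(recs, key_func):
    """Return every key that occurs more than once, with its count."""
    counts = Counter(key_func(r) for r in recs)
    return {key: n for key, n in counts.items() if n > 1}


dupes = duplicate_keys(records, person_key)
print(len(dupes), "duplicated person keys")
```

Running the same check with `SERIAL` and `PERNUM` as the key is what gives me zero duplicates within a month.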
Is there something I’m missing about the identification of households within months? Why are HRHHID, HRHHID2, and STATEFIP apparently insufficient to identify households uniquely within each sample month?
I was also under the impression that, to identify a household across months, it’s sufficient to concatenate the within-month household identifier (HRHHID, HRHHID2, and STATEFIP) with a variable for the date when the household entered the CPS. Of course, the concern of matching across months is subordinate to the problem of using HRHHID and HRHHID2 (and potentially other variables) to identify households uniquely within each month.
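A rough sketch of the cross-month key I have in mind follows. It rests on assumptions that are not settled above: that the entry date can be backed out from YEAR, MONTH, and the month-in-sample variable (MISH in the IPUMS extract), and that the usual 4-8-4 rotation applies (four months in, eight months out, four months in). The function names and the toy records are hypothetical.

```python
def months_since_entry(mish):
    """Months elapsed since the first interview, assuming a 4-8-4 rotation."""
    if 1 <= mish <= 4:
        return mish - 1          # first four consecutive months
    if 5 <= mish <= 8:
        return mish + 7          # resumes 12 months after the first interview
    raise ValueError(f"unexpected month-in-sample value: {mish}")


def entry_ym(year, month, mish):
    """(year, month) in which the household would have had its first interview."""
    index = year * 12 + (month - 1) - months_since_entry(mish)
    entry_year, month_zero_based = divmod(index, 12)
    return (entry_year, month_zero_based + 1)


def cross_month_key(rec):
    """Within-month household fields plus the derived entry date."""
    return (
        rec["HRHHID"],
        rec["HRHHID2"],
        rec["STATEFIP"],
        entry_ym(rec["YEAR"], rec["MONTH"], rec["MISH"]),
    )


# Invented example: the same household seen in its 1st and 5th month in sample.
first = {"HRHHID": "000123456789012", "HRHHID2": "01011", "STATEFIP": 1,
         "YEAR": 1994, "MONTH": 1, "MISH": 1}
fifth = {"HRHHID": "000123456789012", "HRHHID2": "01011", "STATEFIP": 1,
         "YEAR": 1995, "MONTH": 1, "MISH": 5}
same_household = cross_month_key(first) == cross_month_key(fifth)
```

If the within-month fields are not unique to begin with, this key inherits the same duplicates, which is why the within-month problem comes first.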
My approach is basically trying to undertake the steps described in Drew, Flood, and Warren, although I’m only using data from 1994 onward. Any advice would be much appreciated.