unique person id variable?


I created a person id variable by concatenating year, datanum, serial, and pernum. I am using the 2010 3-year sample for the health insurance coverage variable. However, the person id variable does not uniquely identify the observations: of the 9 million observations, there are 1,268 person ids with duplicate values.

I also cannot just arbitrarily drop the duplicates, because some of them appear to be located in different states.

Please advise.



When generating a unique person id by concatenating variables, it is important to include leading zeros on the variables being concatenated. Without them, different combinations can collapse into the same string: for example, serial 1 with pernum 12 and serial 11 with pernum 2 would both contribute "112" to the id. One method for ensuring the inclusion of leading zeros is to generate new variables that are formatted to include them. Strings are easy to concatenate in Stata, so I also convert the variables to strings. The Stata code that I use looks like this:

gen str4 stryear = string(year, "%04.0f")
gen str2 strdatanum = string(datanum, "%02.0f")
gen str8 strserial = string(serial, "%08.0f")
gen str4 strpernum = string(pernum, "%04.0f")

gen uniqueid = stryear + strdatanum + strserial + strpernum

Using this code I was able to uniquely identify all persons in the 2010 3yr file.
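
If you want to confirm this on your own extract, a quick check along these lines may help. This is only a sketch, assuming uniqueid was created as shown above; the variable name dup is a placeholder chosen for illustration.

```stata
* Assumes uniqueid already exists (zero-padded string id built above).
* isid exits with an error if uniqueid does not uniquely identify observations.
isid uniqueid

* Alternative that does not stop on failure: report surplus copies, if any.
duplicates report uniqueid

* If duplicates remain, tag them and list the component variables to inspect.
* "dup" is a hypothetical variable name used only for this example.
duplicates tag uniqueid, generate(dup)
list year datanum serial pernum if dup > 0
```

If isid completes without an error, the id is unique. If it fails, the listing from the last two commands should show which components the colliding records share.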

I hope this helps.



Great, thank you.