Unique person ID variable?

I created a person ID variable by concatenating YEAR, DATANUM, SERIAL, and PERNUM. I am using the 2010 3-year sample for the health insurance coverage variable. However, the person ID variable does not uniquely identify the observations: of the 9 million observations, 1,268 person IDs have duplicate values.

I also cannot just arbitrarily drop the duplicates, because some of them appear to be located in different states.

Please advise.

When generating a unique person ID by concatenating variables, it is important to include leading zeros on the variables being concatenated. Without them, different combinations of values can produce the same string: for example, SERIAL 1 with PERNUM 12 and SERIAL 11 with PERNUM 2 both end in "112". One method for ensuring the inclusion of leading zeros is to generate new variables that are formatted to include leading zeros. Because it is easy to concatenate strings in Stata, I also convert the variables to strings. The Stata code that I use looks like this:

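* Convert each component to a zero-padded string of fixed width
gen str4 stryear = string(year, "%04.0f")
gen str2 strdatanum = string(datanum, "%02.0f")
gen str8 strserial = string(serial, "%08.0f")
gen str4 strpernum = string(pernum, "%04.0f")

* Concatenate the padded strings into a single person ID
gen uniqueid = stryear + strdatanum + strserial + strpernum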

Using this code, I was able to uniquely identify all persons in the 2010 3-year file.
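To see why the leading zeros matter, here is a small illustration in Python (not Stata, and not part of the original answer). The function names `naive_id` and `make_person_id` are hypothetical, and the field widths (4, 2, 8, 4) simply mirror the Stata formats above; the sketch assumes no component needs more digits than its field allows.

```python
def naive_id(year, datanum, serial, pernum):
    """Concatenate the raw digits with no padding (collision-prone)."""
    return f"{year}{datanum}{serial}{pernum}"


def make_person_id(year, datanum, serial, pernum):
    """Concatenate zero-padded, fixed-width components into one string ID."""
    widths = {"year": 4, "datanum": 2, "serial": 8, "pernum": 4}
    values = {"year": year, "datanum": datanum, "serial": serial, "pernum": pernum}
    for name, width in widths.items():
        # A value wider than its field would break the fixed-width layout.
        if not 0 <= values[name] < 10 ** width:
            raise ValueError(f"{name}={values[name]} does not fit in {width} digits")
    return f"{year:04d}{datanum:02d}{serial:08d}{pernum:04d}"


# Two different people: SERIAL 1 / PERNUM 12 and SERIAL 11 / PERNUM 2.
a = (2010, 1, 1, 12)
b = (2010, 1, 11, 2)
print(naive_id(*a), naive_id(*b))              # 20101112 20101112 (collision)
print(make_person_id(*a), make_person_id(*b))  # 201001000000010012 201001000000110002
```

Because every field always occupies the same number of characters, two IDs can only match if every component matches.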

I hope this helps.

Great, thank you.

Is there a more up-to-date procedure for the 2021 ACS 5-year sample? It looks like SAMPLE has replaced DATANUM.

YEAR and SAMPLE (previously known as DATANUM) are no longer needed to create a unique ID for multiyear ACS samples, because in the multiyear samples SERIAL is unique across years. Note that if you are appending single-year ACS samples together to create a multiyear file (e.g., 2019 ACS 1-year + 2020 ACS 1-year + 2021 ACS 1-year), you will need to include YEAR or SAMPLE along with SERIAL and PERNUM to create a unique ID.
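A minimal Python sketch of the appended single-year case. The records below are made up, and they assume for illustration that the same SERIAL/PERNUM pair can occur in two different single-year samples; `person_id` and `duplicate_ids` are hypothetical helpers, not IPUMS tools.

```python
from collections import Counter


def person_id(serial, pernum, year=None):
    """Zero-padded SERIAL + PERNUM, optionally prefixed by a 4-digit YEAR."""
    core = f"{serial:08d}{pernum:04d}"
    return core if year is None else f"{year:04d}{core}"


def duplicate_ids(ids):
    """Return the sorted list of IDs that occur more than once."""
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)


# Hypothetical (year, serial, pernum) records from two appended 1-year files.
records_2019 = [(2019, 1, 1), (2019, 1, 2), (2019, 2, 1)]
records_2020 = [(2020, 1, 1), (2020, 3, 1)]
pooled = records_2019 + records_2020

without_year = [person_id(s, p) for _, s, p in pooled]
with_year = [person_id(s, p, y) for y, s, p in pooled]
print(duplicate_ids(without_year))  # ['000000010001']
print(duplicate_ids(with_year))     # []
```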

Here is the updated code to produce a unique ID in Stata:

* Zero-pad SERIAL and PERNUM, then concatenate them into a single ID
gen str8 strserial = string(serial, "%08.0f")
gen str4 strpernum = string(pernum, "%04.0f")
gen uniqueid = strserial + strpernum

And here is the duplicates report for uniqueid in the 2021 ACS 5-year sample:

duplicates report uniqueid

Duplicates in terms of uniqueid

   Copies | Observations       Surplus
----------+---------------------------
        1 |     15537785             0
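For readers working outside Stata, here is a rough Python analogue of the tally that `duplicates report` prints. It is a sketch, not Stata's implementation: `duplicates_report` is a hypothetical helper, and it assumes "surplus" means the observations beyond the first copy of each ID.

```python
from collections import Counter


def duplicates_report(ids):
    """Return {copies: (observations, surplus)}, one row per copy count."""
    per_id = Counter(ids)              # id -> how many times it appears
    groups = Counter(per_id.values())  # copies -> number of distinct ids with that count
    report = {}
    for copies, n_ids in sorted(groups.items()):
        observations = copies * n_ids
        surplus = observations - n_ids  # every copy after the first one
        report[copies] = (observations, surplus)
    return report


print(duplicates_report(["a", "b", "b", "c", "c", "c"]))
# {1: (1, 0), 2: (2, 1), 3: (3, 2)}
print(duplicates_report(["x", "y", "z"]))
# {1: (3, 0)}
```

An ID is unique exactly when the report has a single row with 1 copy and 0 surplus, as in the 2021 ACS 5-year output above.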