Why is the concatenation of YEAR, SERIAL, and PERNUM not unique?

bgall · July 6, 2017, 8:03am

It is my understanding that these three variables should produce unique observations for each person. I combined them and found that 98 people have identical values on these three variables as someone else. Why does this arise? I am looking at all observations from all Voter Supplement data.

JeffBloem · July 6, 2017, 3:56pm

It seems the duplicates are caused by the software dropping the leading zeros on SERIAL and PERNUM. If you add these back in (SERIAL should be 5 digits and PERNUM should be 2) and redo the concatenation, you should generate a unique ID across your samples. Additionally, CPSIDP is a ready-made variable that, when concatenated with YEAR and MONTH, uniquely identifies people across IPUMS CPS samples. In your data set of all Voter Supplements, however, CPSIDP by itself uniquely identifies all people because you are only including one month per year.
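
To illustrate why the dropped zeros matter, here is a minimal Python sketch. The function names `naive_id` and `padded_id` and the two sample records are made up for illustration; only the widths (5 for SERIAL, 2 for PERNUM) come from the explanation above. Without padding, two different SERIAL/PERNUM pairs can collapse into the same string; with padding, they stay distinct.

```python
def naive_id(year, serial, pernum):
    # Plain concatenation: leading zeros are already lost, so digits can "slide"
    # between SERIAL and PERNUM.
    return f"{year}{serial}{pernum}"


def padded_id(year, serial, pernum, serial_width=5, pernum_width=2):
    # Zero-pad SERIAL and PERNUM to fixed widths before concatenating.
    # The default widths follow the reply above; adjust if your extract differs.
    return f"{year}{int(serial):0{serial_width}d}{int(pernum):0{pernum_width}d}"


# Hypothetical records: (YEAR, SERIAL, PERNUM)
rows = [(2016, 1, 23), (2016, 12, 3)]

naive = [naive_id(*r) for r in rows]
padded = [padded_id(*r) for r in rows]

print(naive)   # both records yield the same string
print(padded)  # the padded IDs differ
```

The same idea applies in any statistical package: convert SERIAL and PERNUM to fixed-width, zero-padded strings before joining them with YEAR.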

bgall · July 6, 2017, 6:13pm

That was correct, good catch. Thank you.
