Why is the concatenation of YEAR, SERIAL, and PERNUM not unique?


It is my understanding that these three variables should produce a unique identifier for each person. I concatenated them and found that 98 people have the same values on all three variables as someone else. Why does this happen? I am looking at all observations from all Voter Supplement data.



The duplicates appear to come from your software dropping the leading zeros on SERIAL and PERNUM. If you add these back in (SERIAL should be 5 digits and PERNUM should be 2) and redo the concatenation, you should get a unique ID across your samples. Additionally, CPSIDP is a ready-made variable that, when concatenated with YEAR and MONTH, uniquely identifies people across IPUMS CPS samples. In your data set of all Voter Supplements, however, CPSIDP alone uniquely identifies all people because you are only including one month per year.
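To illustrate how dropped leading zeros can produce collisions, here is a minimal Python sketch. The function names and the two sample records are made up for illustration and are not taken from IPUMS data; the widths of 5 and 2 follow the digits mentioned above. Your own software will have its own way to zero-pad, so treat this as a sketch of the idea rather than the exact fix for your setup.

```python
def naive_id(year, serial, pernum):
    """Concatenate the values as-is, the way software does after dropping leading zeros."""
    return f"{year}{serial}{pernum}"


def padded_id(year, serial, pernum, serial_width=5, pernum_width=2):
    """Zero-pad SERIAL and PERNUM to fixed widths before concatenating."""
    return f"{year}{int(serial):0{serial_width}d}{int(pernum):0{pernum_width}d}"


# Two hypothetical people in the same year (illustrative values only):
# household 1, person 11 and household 11, person 1.
records = [(2008, 1, 11), (2008, 11, 1)]

naive = [naive_id(*r) for r in records]
padded = [padded_id(*r) for r in records]

print(naive)   # both records collapse to the same string
print(padded)  # fixed-width fields keep the two records distinct
```

Without padding, the boundary between SERIAL and PERNUM is ambiguous, so different people can end up with the same string. With fixed-width fields, each position in the ID always belongs to the same variable.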



That was correct, good catch. Thank you.