Duplicate individual ids of the Vietnam 2009 census?

sy2229 · September 22, 2015, 2:55pm

I’m working with the 1989, 1999, and 2009 Vietnam census data.

I’ve created household IDs and person IDs.

gen double hhid=sample*10^8+serial

gen double pid=hhid*100+pernum

However, more than 1 million observations, all from 2009, do not have a unique pid.

. bys pid: gen obs=_N

. tab obs

        obs |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 | 17,376,097       90.63       90.63
          2 |          2        0.00       90.63
          3 |  1,796,643        9.37      100.00
         43 |         43        0.00      100.00
------------+-----------------------------------
      Total | 19,172,785      100.00

. tab year if obs==3

       Year |      Freq.     Percent        Cum.
------------+-----------------------------------
       2009 |  1,796,643      100.00      100.00
------------+-----------------------------------
      Total |  1,796,643      100.00

Do you have an idea of what may be happening?

Thank you!

Tim_Moreland · September 22, 2015, 9:38pm

Your line “gen double hhid=sample*10^8+serial” does not leave enough room to hold both sample and serial in one variable without overlap: multiplying sample by 10^8 reserves only eight digits for serial, so any longer serial value runs into the digits of sample. Multiply sample by 10^10 instead. When I make this change to your code, I get 19,172,742 unique values for pid across the three Vietnam census samples. In other words, there are zero duplicate individual IDs.
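To see the overlap in isolation, here is a minimal sketch on toy data. The three records below are made up for illustration and are not real IPUMS values, and hhid_bad is just a name I picked. It also shows one possible alternative, assuming you only need the IDs to be unique rather than human-readable: let egen's group() function number the combinations of sample, serial, and pernum, so no multiplier has to be chosen at all.

```stata
* Toy data: hypothetical sample/serial/pernum values, not real IPUMS records.
clear
input double sample double serial byte pernum
1 200000001 1
2 100000001 1
2 100000001 2
end

* With a 10^8 multiplier, a 9-digit serial spills into the digits of sample,
* so two different households end up with the same hhid (300000001 here).
gen double hhid_bad = sample*10^8 + serial
duplicates report hhid_bad pernum

* Grouping on the component variables avoids the arithmetic altogether.
egen long hhid = group(sample serial)
egen long pid  = group(sample serial pernum)

* isid exits with an error if pid does not uniquely identify observations.
isid pid
```

On your real extract, running “isid sample serial pernum” before building any IDs is a quick way to confirm that the three variables identify persons uniquely.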

Hope this helps.
