I'm working with the 1989, 1999, and 2009 Vietnam samples.
I've created household IDs and person IDs as follows:
gen double hhid=sample*10^8+serial
gen double pid=hhid*100+pernum
However, pid is not unique: more than 1 million observations in 2009 share their pid with other observations.
. bys pid: gen obs=_N
. tab obs
        obs |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 | 17,376,097       90.63       90.63
          2 |          2        0.00       90.63
          3 |  1,796,643        9.37      100.00
         43 |         43        0.00      100.00
------------+-----------------------------------
      Total | 19,172,785      100.00
. tab year if obs==3
       Year |      Freq.     Percent        Cum.
------------+-----------------------------------
       2009 |  1,796,643      100.00      100.00
------------+-----------------------------------
      Total |  1,796,643      100.00
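One way to narrow this down is to check whether the duplicates are already present in the raw identifiers or only appear after the arithmetic encoding. The following is a rough sketch, not a confirmed diagnosis. It assumes that sample, serial, and pernum are together meant to identify a person; pid2 and obs2 are hypothetical names introduced here for illustration.

```stata
* Sketch only: pid2 and obs2 are hypothetical names, not variables in the extract.
* Assumes sample, serial, and pernum jointly identify a person.

* 1. Are the raw identifiers themselves duplicated?
duplicates report sample serial pernum

* 2. Do serial or pernum need more digits than the formula reserves
*    (8 digits for serial via 10^8, 2 digits for pernum via 100)?
summarize serial pernum if year == 2009, format

* 3. Build an ID without arithmetic and count observations per ID,
*    to compare with the tabulation of obs above.
egen long pid2 = group(sample serial pernum)
bysort pid2: gen obs2 = _N
tab obs2
```

If step 1 reports no surplus copies but pid still has duplicates, the encoding formula would be the likely culprit. If step 1 does report duplicates, the issue would lie in the 2009 records themselves.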
Do you have an idea of what may be happening?
Thank you!