I am attempting to merge variables on voting and volunteering from the NBER November CPS Supplement with an IPUMS extract from November 2012. After searching through similar questions on this page, I decided to use the sequential merge option. I first deleted all of the observations in the NBER file that were entirely non-responses to ensure that the IPUMS and NBER files were the same size. Then, I used the 1:1 _n merge command in Stata to merge the two files (note: I did not sort any variables in either the IPUMS or NBER file). To test whether the merge was correct, I created two duplicate variables (based on age and hrhhid2). Unfortunately, the overwhelming majority of matches seemed to be incorrect (the duplicate age and hrhhid2 variables that I created in the NBER file did not match the age and hrhhid2 variables in the IPUMS file that they were matched to). Do you know how I can correctly merge the NBER file with my November 2012 IPUMS extract? If I can’t use sequential merge, do you know the variables that I can match on? It appears that I cannot link based on hrhhid and hrhhid2 because hrhhid is not available for 2012 and without it I cannot uniquely identify observations.
Unfortunately, the November CPS supplement and the corresponding November IPUMS extract do not share the same sort order. As a result, merging these datasets will require an intermediate step.
First, you must sequentially merge your November IPUMS extract with the November Basic CPS file (data and command files). Your IPUMS extract does share the same sort order as the Basic CPS file; however, you will need to remove non-respondents (observations with prtage=-1) before merging. I was able to match 100% of cases (based on age) with this sequential merge.
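A minimal Stata sketch of this first step, in case it is useful. The file names (nov12_basic.dta, ipums_nov12.dta, and the saved outputs) are placeholders I made up, and I am assuming the Basic file has already been read into Stata with the NBER command file and that the IPUMS extract stores age in a variable named age.

```stata
* Step 1: sequential merge of the IPUMS extract with the Basic CPS file.
* File names below are hypothetical placeholders.

* Drop non-respondents from the Basic file WITHOUT re-sorting it.
use nov12_basic.dta, clear
drop if prtage == -1
save basic_resp.dta, replace

* Merge row by row; neither file should be sorted beforehand.
use ipums_nov12.dta, clear
merge 1:1 _n using basic_resp.dta

* Every row should have a partner if the two files are the same length.
tab _merge

* Sanity check: count rows where IPUMS age and Basic prtage disagree.
count if age != prtage

drop _merge
save ipums_basic.dta, replace
```

If the count of disagreeing rows is not zero, or _merge shows unmatched rows, the row order or the number of observations differs between the two files and the result should not be used.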
Second, you need to merge your IPUMS/Basic merged file to the November supplement file (data and command files), using the shared identifiers that were merged onto your extract in the previous step. Before merging, you will need to remove non-respondents from the Supplement file (again, observations with prtage=-1). You can find the full list of shared identifiers here, but the following variables will suffice for this merge: gestcen hrhhid hrhhid2 prfamnum pulineno. Using those 5 identifiers, I was able to uniquely identify all observations in both NBER datasets and thus match all 133,427 observations from the November IPUMS extract to the November CPS supplement.
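A sketch of the second step in Stata, under similar assumptions: the file names are hypothetical, ipums_basic.dta stands for the merged IPUMS/Basic file saved after the first step, and the Supplement file has been read in with its NBER command file.

```stata
* Step 2: merge on the shared identifiers.
* File names below are hypothetical placeholders.

* Drop non-respondents from the Supplement file and confirm that the
* five identifiers uniquely identify its observations.
use nov12_supp.dta, clear
drop if prtage == -1
isid gestcen hrhhid hrhhid2 prfamnum pulineno
save supp_resp.dta, replace

* Same uniqueness check on the IPUMS/Basic file, then a keyed 1:1 merge.
use ipums_basic.dta, clear
isid gestcen hrhhid hrhhid2 prfamnum pulineno
merge 1:1 gestcen hrhhid hrhhid2 prfamnum pulineno using supp_resp.dta

* All observations should show up as matched (_merge == 3).
tab _merge
```

isid stops with an error if the listed variables do not uniquely identify observations, so it guards against a bad match before the merge runs. For variables that exist in both files, merge keeps the values from the file in memory by default; the keepusing() option can limit what is brought over to just the supplement variables you need.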
Hope this helps.
Thank you, this is extremely helpful.