Q for Tim Moreland Re: Trouble merging NBER and IPUMS

Hi Tim-

This is a follow-up question to my original one about merging the NBER and IPUMS Jan/Feb Job Tenure data with the corresponding March supplements. I have been following the links that you shared, and they have worked well for the most part. However, for 2000 and 2002, h_idnum, h_hhnum, a_lineno, a_sex, and a_age do not uniquely identify observations in the NBER March dataset. So even though I can successfully merge the March NBER dataset sequentially with the IPUMS March dataset, I am unable to merge that master March IPUMS/NBER dataset with its January counterpart, because the variables listed above do not uniquely identify persons. Even if I add statefip, I am not able to get anywhere. To move things forward, I have just been dropping the duplicate records, which number around 400-500 for each year. But this makes me wonder whether something is going amiss, and ideally I would retain as many potential matches as possible.

Thank you again for any help you can provide.

All the best,

Caroline

Using the raw NBER March supplement dataset for 2000, I am unable to replicate your issue. I see 133,710 observations in the raw dataset, with a corresponding 133,710 unique combinations of h_idnum, h_hhnum, a_lineno, a_sex, and a_age. I would recommend double-checking that your raw dataset is beginning with the correct number of observations to make sure there were no problems downloading the NBER dataset (see this sample size table). If you are still seeing duplicates in your file, please email ipums@umn.edu with the identifier values for a couple examples of duplicates.
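The check described above (comparing the number of observations with the number of unique identifier combinations) can be sketched roughly as follows. This is only an illustration in plain Python, not the code actually used for the check: the function name `key_summary` is hypothetical, and the sample records are made up rather than taken from any CPS file.

```python
from collections import Counter

# Identifier variables named in this thread.
KEYS = ("h_idnum", "h_hhnum", "a_lineno", "a_sex", "a_age")

def key_summary(rows, keys=KEYS):
    """Return (number of rows, number of distinct key combinations)."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return len(rows), len(counts)

# Made-up records for illustration only; not real CPS data.
rows = [
    {"h_idnum": "A1", "h_hhnum": 1, "a_lineno": 1, "a_sex": 1, "a_age": 40},
    {"h_idnum": "A1", "h_hhnum": 1, "a_lineno": 2, "a_sex": 2, "a_age": 38},
    {"h_idnum": "B2", "h_hhnum": 1, "a_lineno": 1, "a_sex": 2, "a_age": 9},
    {"h_idnum": "B2", "h_hhnum": 1, "a_lineno": 1, "a_sex": 2, "a_age": 9},
]

n_rows, n_unique = key_summary(rows)
# If the two numbers are equal, the keys uniquely identify every record.
print(n_rows, n_unique)
```

If the two counts differ on the freshly downloaded file, the difference is the number of surplus records sharing a key with another record.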

Hope this helps.

Hi Tim–

Thank you for getting back to me. I have sent an email to the IPUMS email address as you directed, but thought that I would share this more specific information with you in case it helps. A specific example of where this is happening is NBER’s March 2002 supplement. The variables that are supposed to uniquely identify observations (h_idnum, h_hhnum, a_lineno, a_sex, a_age) do not do so. The IPUMS March 2002 dataset has the same total number of observations (217,219) as the NBER March 2002 dataset. I checked online and confirmed that this is the correct March 2002 sample size. Because of the common sample size, I am able to sequentially merge the March 2002 NBER dataset with the March 2002 IPUMS dataset. But this isn’t very useful, because without unique identifiers I am unable to match this newly created master March IPUMS/NBER dataset with its Jan/Feb counterpart.
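A sequential (row-order) merge like the one described here only works if both files list the same persons in the same order, so a quick sanity check may be worthwhile. The sketch below is a rough illustration under stated assumptions: the function name `misaligned_rows` is hypothetical, the IPUMS-side variable names `AGE` and `SEX` are assumed for the example, and the records are made up. Values can also be coded differently across the two sources, so a mismatch is a prompt to look closer, not proof of misalignment.

```python
def misaligned_rows(nber_rows, ipums_rows, pairs):
    """Return the row indices where position-matched records disagree
    on any of the (nber_variable, ipums_variable) pairs."""
    if len(nber_rows) != len(ipums_rows):
        raise ValueError("files differ in length; a sequential merge is not safe")
    bad = []
    for i, (n, p) in enumerate(zip(nber_rows, ipums_rows)):
        if any(n[a] != p[b] for a, b in pairs):
            bad.append(i)
    return bad

# Assumed variable pairing, for illustration only.
PAIRS = [("a_age", "AGE"), ("a_sex", "SEX")]

# Made-up records; the second and third rows are swapped in the IPUMS list.
nber = [{"a_age": 40, "a_sex": 1}, {"a_age": 9, "a_sex": 1}, {"a_age": 38, "a_sex": 2}]
ipums = [{"AGE": 40, "SEX": 1}, {"AGE": 38, "SEX": 2}, {"AGE": 9, "SEX": 1}]

mismatches = misaligned_rows(nber, ipums, PAIRS)
print(mismatches)
```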

An example of duplicate identifiers is the following from the March 2002 NBER dataset: h_idnum (200068101345321) h_hhnum (1) age (9) sex (male) lineno (3).
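To pull out every such case rather than one example, something along these lines might work. It is a minimal sketch in plain Python: the function name `duplicate_groups` is hypothetical and the records (including the h_seq values) are made up for illustration. It groups records by the identifier variables and keeps only the groups with more than one member, so the full records can be inspected or shared instead of dropped.

```python
from collections import defaultdict

# Identifier variables named in this thread.
KEYS = ("h_idnum", "h_hhnum", "a_lineno", "a_sex", "a_age")

def duplicate_groups(rows, keys=KEYS):
    """Map each key combination that occurs more than once to its records."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return {key: members for key, members in groups.items() if len(members) > 1}

# Made-up records for illustration only; not real CPS data.
rows = [
    {"h_seq": 101, "h_idnum": "X1", "h_hhnum": 1, "a_lineno": 3, "a_sex": 1, "a_age": 9},
    {"h_seq": 205, "h_idnum": "X1", "h_hhnum": 1, "a_lineno": 3, "a_sex": 1, "a_age": 9},
    {"h_seq": 101, "h_idnum": "X1", "h_hhnum": 1, "a_lineno": 1, "a_sex": 2, "a_age": 35},
]

dups = duplicate_groups(rows)
for key, members in dups.items():
    # Show which h_seq values share the same identifier combination.
    print(key, [m["h_seq"] for m in members])
```

Comparing the other variables within each group can show whether the records are exact copies or different persons who happen to share the same identifier values.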

I have found variables such as “h_seq” that allow me to uniquely identify observations within the March 2002 Supplement, but this is not very useful either as h_seq is not a variable that is shared by the Jan/Feb 2002 Supplement.

Thank you again for your response and support in this process.

All the best,

Caroline