Q for Tim Moreland Re: Trouble merging NBER and IPUMS

ccrawford511 · December 18, 2014, 5:48pm

Hi Tim-

This is a follow-up to my original question about merging the NBER and IPUMS Jan/Feb Job Tenure data with the corresponding March supplements. I have been following the links that you shared, and they have worked well for the most part. However, for 2000 and 2002, h_idnum, h_hhnum, a_lineno, a_sex, and a_age do not uniquely identify observations in the NBER March dataset. So even though I can successfully merge the March NBER dataset sequentially with the IPUMS March dataset, I am unable to merge that master March IPUMS/NBER dataset with its January counterpart, because the variables listed above do not uniquely identify persons. Adding statefip does not resolve the duplicates either. To move things forward I have just been dropping the duplicate records, which come to around 400-500 for each year. But this makes me wonder whether something is going amiss, and ideally I would retain as many potential matches as possible.
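Before dropping anything, it can help to list every row that shares its identifier values with another row, so the duplicates can be inspected rather than only counted. The following is a minimal pandas sketch, not the method used in this thread: the variable names come from the post, while the toy data and the helper name `duplicate_report` are hypothetical.

```python
import pandas as pd

# Identifier variables named in the post.
KEYS = ["h_idnum", "h_hhnum", "a_lineno", "a_sex", "a_age"]

def duplicate_report(df, keys=KEYS):
    """Return every row whose key combination occurs more than once."""
    # keep=False flags all members of a duplicated group, not just the extras.
    mask = df.duplicated(subset=keys, keep=False)
    return df.loc[mask].sort_values(keys)

# Hypothetical toy data standing in for the NBER March file.
march = pd.DataFrame({
    "h_idnum": ["A1", "A1", "B2", "B2"],
    "h_hhnum": [1, 1, 1, 1],
    "a_lineno": [1, 1, 1, 2],
    "a_sex": [1, 1, 2, 1],
    "a_age": [30, 30, 41, 9],
})

dups = duplicate_report(march)
print(dups)
```

Looking at the non-key columns of `dups` side by side may show whether the repeated rows are true copies or different people who happen to share the same identifier values.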

Thank you again for any help you can provide.

All the best,

Caroline

Tim_Moreland · December 18, 2014, 8:15pm

Using the raw NBER March supplement dataset for 2000, I am unable to replicate your issue. I see 133,710 observations in the raw dataset, with a corresponding 133,710 unique combinations of h_idnum, h_hhnum, a_lineno, a_sex, and a_age. I would recommend double-checking that your raw dataset starts with the correct number of observations, to rule out a problem downloading the NBER dataset (see this sample size table). If you are still seeing duplicates in your file, please email ipums@umn.edu with the identifier values for a couple of examples of duplicates.
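The check described above, comparing the number of observations with the number of unique identifier combinations, could be sketched in pandas as follows. This is an illustration under assumptions: the toy data and the helper name `uniqueness_summary` are hypothetical, and the real file would be loaded from the NBER download instead.

```python
import pandas as pd

# Identifier variables named in the thread.
KEYS = ["h_idnum", "h_hhnum", "a_lineno", "a_sex", "a_age"]

def uniqueness_summary(df, keys=KEYS):
    """Return (row count, unique key combinations, whether keys are unique)."""
    n_rows = len(df)
    n_unique = len(df.drop_duplicates(subset=keys))
    return n_rows, n_unique, n_rows == n_unique

# Hypothetical toy data: the first two rows share all five key values.
raw = pd.DataFrame({
    "h_idnum": ["A1", "A1", "B2", "C3"],
    "h_hhnum": [1, 1, 1, 1],
    "a_lineno": [3, 3, 1, 1],
    "a_sex": [1, 1, 2, 2],
    "a_age": [9, 9, 41, 35],
})

n_rows, n_unique, is_unique = uniqueness_summary(raw)
print(n_rows, n_unique, is_unique)
```

If the two counts match on the freshly downloaded file but not on the working file, the duplicates were probably introduced somewhere in the processing steps rather than in the raw data.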

Hope this helps.

ccrawford511 · December 18, 2014, 10:03pm

Hi Tim–

Thank you for getting back to me. I have sent an email to the IPUMS address as you directed, but I thought I would share this more specific information with you in case it helps. A specific example of where this is happening is NBER's March 2002 supplement. The variables that are supposed to uniquely identify observations (h_idnum, h_hhnum, a_lineno, a_sex, a_age) have not been doing so. The IPUMS March 2002 dataset has the same total number of observations (217,219) as the NBER March 2002 dataset. I checked online and confirmed that this is the correct March 2002 sample size. Because the sample sizes match, I am able to sequentially merge the March 2002 NBER dataset with the March 2002 IPUMS dataset. But this isn't very useful, because without unique identifiers I cannot match the newly created master March IPUMS/NBER dataset with its Jan/Feb counterpart.
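A sequential (row-by-row) merge relies on both files being in the same order, so it may be worth guarding it with a couple of checks. The sketch below is one hypothetical way to do that in pandas: the helper name `sequential_merge`, the toy data, and the IPUMS-side column names `age` and `sex` are assumptions for illustration, not taken from the thread.

```python
import pandas as pd

def sequential_merge(nber, ipums, checks):
    """Join two frames by row position after basic safety checks.

    checks: list of (nber_col, ipums_col) pairs expected to agree row by row.
    """
    if len(nber) != len(ipums):
        raise ValueError("row counts differ; a positional merge is not safe")
    nber = nber.reset_index(drop=True)
    ipums = ipums.reset_index(drop=True)
    for n_col, i_col in checks:
        mismatches = int((nber[n_col] != ipums[i_col]).sum())
        if mismatches:
            raise ValueError(f"{mismatches} rows disagree on {n_col} vs {i_col}")
    # Columns from both files end up side by side, one row per person.
    return pd.concat([nber, ipums], axis=1)

# Hypothetical toy data in matching order.
nber = pd.DataFrame({"h_seq": [1, 1, 2], "a_age": [30, 9, 41], "a_sex": [1, 1, 2]})
ipums = pd.DataFrame({"age": [30, 9, 41], "sex": [1, 1, 2], "wt": [1.0, 1.2, 0.9]})

merged = sequential_merge(nber, ipums, [("a_age", "age"), ("a_sex", "sex")])
print(merged.shape)
```

If the age and sex columns disagree on any row, the two files are not in the same order and the positional merge should not be trusted.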

An example of duplicate identifiers from the March 2002 NBER dataset is the following: h_idnum = 200068101345321, h_hhnum = 1, a_age = 9, a_sex = male, a_lineno = 3.

I have found variables such as “h_seq” that allow me to uniquely identify observations within the March 2002 Supplement, but this is not very useful either as h_seq is not a variable that is shared by the Jan/Feb 2002 Supplement.

Thank you again for your response and support in this process.

All the best,

Caroline
