Merging NBER and IPUMS CPS for multiyear datasets


I have read on this User Forum that IPUMS CPS does not have household or person unique identifiers that can be used to merge IPUMS CPS data to NBER CPS data. However, because IPUMS and NBER data are sorted the same way upon download, I can do a sequential merge as described here.

However, the sequential merge will only work if I’m merging a single year of March ASEC data from IPUMS with the same single year from NBER, correct? For example, I cannot download 5 years of March data (say, 2009-2013) from IPUMS and expect to sequentially merge it with NBER data for the same period. If I want to do this, would the process be something like this?

  1. Download and sequentially merge single year ASEC data from NBER and IPUMS CPS for each of the five years.

  2. Create unique IDs for each person record by concatenating YEAR and some other variables (do you have a recommendation for these variables?)

  3. Append the five merged IPUMS/NBER datasets together to get a final 5-year dataset with both NBER and IPUMS variables.
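Steps 2 and 3 can be sketched roughly as follows. This is a minimal Python illustration, not IPUMS or NBER tooling. Each year's already-merged file is represented as a list of dicts. The person ID assumes that the IPUMS variables SERIAL (household number) and PERNUM (person number within household), together with YEAR, identify a record. The code checks that assumption rather than trusting it. The toy records are made up.

```python
def person_id(row):
    # Assumed key: YEAR + SERIAL (household) + PERNUM (person within household).
    return f"{row['YEAR']}_{row['SERIAL']}_{row['PERNUM']}"


def append_years(merged_by_year):
    """Stack per-year merged files in year order and tag each record with an ID."""
    pooled = []
    for year in sorted(merged_by_year):  # 2009, 2010, ... (step 3)
        for row in merged_by_year[year]:
            pooled.append({**row, "PERSON_ID": person_id(row)})  # step 2
    ids = [r["PERSON_ID"] for r in pooled]
    if len(ids) != len(set(ids)):
        raise ValueError("constructed person IDs are not unique")
    return pooled


# Toy, made-up records standing in for already-merged IPUMS/NBER rows.
merged_by_year = {
    2010: [{"YEAR": 2010, "SERIAL": 1, "PERNUM": 1}],
    2009: [{"YEAR": 2009, "SERIAL": 1, "PERNUM": 1},
           {"YEAR": 2009, "SERIAL": 1, "PERNUM": 2}],
}
pooled = append_years(merged_by_year)
print([r["PERSON_ID"] for r in pooled])  # ['2009_1_1', '2009_1_2', '2010_1_1']
```

Because YEAR is part of the key, a household that appears in two consecutive years gets two different IDs. The result is unique per person-record, not per person.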

Would this be a good way to obtain a five-year dataset with unique person-records and both IPUMS and NBER variables/data?

Thanks for any advice

If you download five years of March data from IPUMS, the resulting data extract is the same as if you downloaded each year separately and appended them one beneath the other. As long as you append the original NBER datasets in the same manner (2009 followed by 2010 followed by 2011, etc.), there should be no issue with sequentially merging five years at the same time. As always, you should verify your merge by comparing the IPUMS and NBER values of sex, age, and race after the merge.
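The append-then-merge procedure can be sketched in Python under some simplifying assumptions:

* Both sources are lists of dicts keyed by year.
* Rows within each year are already in download order.
* The check variables have been recoded to a common coding beforehand. For example, the sketch does not assume that IPUMS and NBER race codes match.

The NBER-side names a_sex and a_age are hypothetical placeholders.

```python
def append_in_order(files_by_year):
    """Stack yearly files one beneath the other: 2009, then 2010, and so on."""
    rows = []
    for year in sorted(files_by_year):
        rows.extend(files_by_year[year])
    return rows


def sequential_merge(ipums_rows, nber_rows, checks):
    """Pair row i of IPUMS with row i of NBER and report rows that disagree.

    checks: (ipums_var, nber_var) pairs assumed to share one coding.
    """
    if len(ipums_rows) != len(nber_rows):
        raise ValueError("row counts differ; the files cannot be merged by position")
    merged, mismatched_rows = [], []
    for i, (ip, nb) in enumerate(zip(ipums_rows, nber_rows)):
        if any(ip[iv] != nb[nv] for iv, nv in checks):
            mismatched_rows.append(i)
        merged.append({**nb, **ip})  # IPUMS values win if a name collides
    return merged, mismatched_rows


# Toy, made-up records; a_sex and a_age are hypothetical NBER names.
ipums = {2009: [{"YEAR": 2009, "SEX": 1, "AGE": 34}],
         2010: [{"YEAR": 2010, "SEX": 2, "AGE": 51}]}
nber = {2010: [{"a_sex": 2, "a_age": 51}],
        2009: [{"a_sex": 1, "a_age": 34}]}
merged, bad = sequential_merge(append_in_order(ipums), append_in_order(nber),
                               checks=[("SEX", "a_sex"), ("AGE", "a_age")])
print(len(merged), bad)  # 2 []
```

A non-empty mismatch list means the two stacked files are out of step. This is the situation that comparing sex, age, and race after the merge is meant to catch.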

This final merged dataset will have multiple observations per household/person, since households can appear in consecutive years. If you are interested in linking households/persons across years, please see this document for the variables necessary to link ASEC files across time.

Hope this helps.

Thank you Tim, I will follow that procedure then.

We want to pool ASEC data across years to increase power and sample size for state-level estimates of subgroups. However, as you mention, the pooled dataset will contain multiple observations per household. Since we are not linking these records across years, we want to account for the repeated observations in our estimates and standard errors. Does the sample Stata code for replicate weights provided by IPUMS CPS (below) account for households that appear in consecutive years of pooled ASEC data?

svyset [iw=wtsupp], jkrweight(repwtp1-repwtp160, multiplier(.025)) ///
    vce(jackknife) mse

If not, is there any guidance on Stata code that does this?
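For reference, the svyset line above asks Stata to compute each standard error from the 160 replicate estimates as SE = sqrt(0.025 * sum over r of (theta_r - theta_0)^2). The multiplier 0.025 equals 4/160, the successive-difference replication factor used with the CPS ASEC replicate weights. Because of mse, each deviation is taken from the full-sample estimate theta_0 rather than from the mean of the replicates.

A minimal Python sketch of that arithmetic, with made-up toy values, follows. It only reproduces the formula. It does not by itself settle how repeated households across pooled years are handled.

```python
import math


def replicate_se(full_estimate, replicate_estimates, multiplier=0.025, mse=True):
    """Standard error from replicate estimates.

    mse=True centers deviations on the full-sample estimate (Stata's mse option);
    otherwise they are centered on the mean of the replicate estimates.
    """
    if mse:
        center = full_estimate
    else:
        center = sum(replicate_estimates) / len(replicate_estimates)
    ss = sum((t - center) ** 2 for t in replicate_estimates)
    return math.sqrt(multiplier * ss)


# Toy values: 160 replicate estimates, each 0.5 away from the full-sample estimate.
replicates = [10.5, 9.5] * 80
print(replicate_se(10.0, replicates))  # approximately 1.0
```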

Thanks for your advice,