I have read a couple of questions about merging IPUMS-CPS data with NBER-CPS data, motivated by additional variables being available in the NBER data sets. I was unaware that the IPUMS data did not include all the CPS data. Might you be able to characterize the variables present in the NBER data sets but not in the IPUMS-CPS data? I’d like to get some sort of handle on whether there are variables there that I might need or want.
There isn’t a specific type of variable or subject that encapsulates the set of variables that are currently unavailable via IPUMS CPS, so it is difficult to summarize. IPUMS CPS started by integrating all of the ASEC files, so these have the greatest representation overall. Supplements also have most variables integrated, and in general the IPUMS CPS team tries to integrate variables that are in multiple years of data. That being said, IPUMS CPS is adding new variables all the time and hopes to offer the full collection of public CPS variables eventually.
I’m sorry I couldn’t give a more specific answer, but I hope this helps.
Do you have, e.g., counts of variables you have imported and those you haven’t? Or, I presume there is some IPUMS internal document that maintains a list of non-imported variables, that you use to prioritize which variables will be added next. Could that be posted?
We do not currently have a count or list of the variables we have or have not imported from the CPS, nor do we maintain a comparison of what is in IPUMS CPS versus NBER (others have also been curious about this). One of the challenges is the sheer number of variables that exist and the fact that there have been a lot of changes over time. A number of variables in the public CPS files represent intermediate recodes or ‘Anything else’ type information that is then captured in a summary variable, and IPUMS CPS provides that summary variable in harmonized form. So, although IPUMS CPS does not directly provide a harmonized version of each such variable, its information is represented in the IPUMS CPS collection. In terms of numbers of variables, I would estimate that IPUMS CPS provides harmonized versions of about 75% of all public CPS variables. However, as I mentioned, there is a project underway to provide all CPS variables in unharmonized form, even when harmonized counterparts are unavailable.
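For anyone who wants a rough inventory of their own in the meantime, here is a minimal sketch of how such a tally could be kept. It assumes you have built your own crosswalk from original CPS variable names to IPUMS CPS names, by hand from the two sets of documentation; no such crosswalk is published, and every name below (`coverage`, `orig_var_a`, `HARMONIZED_A`, and so on) is a hypothetical placeholder rather than a real variable. Because IPUMS renames variables when harmonizing them, comparing raw column names between the two sources would not be reliable, which is why the sketch works from a mapping instead.

```python
def coverage(crosswalk):
    """Tally which original variables have a harmonized counterpart.

    crosswalk maps an original variable name to the harmonized name that
    carries its information, or to None when no counterpart was found.
    """
    covered = sorted(name for name, target in crosswalk.items() if target)
    missing = sorted(name for name, target in crosswalk.items() if not target)
    share = len(covered) / len(crosswalk) if crosswalk else 0.0
    return covered, missing, share


# Hypothetical hand-built crosswalk; all names are placeholders.
example = {
    "orig_var_a": "HARMONIZED_A",
    "orig_var_b": "HARMONIZED_B",
    "orig_var_c": None,            # no harmonized counterpart found
    "orig_var_d": "HARMONIZED_A",  # intermediate recode folded into a summary variable
}

covered, missing, share = coverage(example)
print(f"{len(covered)} covered, {len(missing)} missing, share = {share:.0%}")
```

Note that two original variables may point at the same harmonized name, as with the recodes described above, so the tally counts original variables rather than harmonized ones.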