Related to my question yesterday, I wondered if there is any way (in the March CPS data provided by IPUMS) to identify individuals whose wage data have been imputed, from 1962 onwards. This is related to the following question, the answer to which implies the data are available but not in IPUMS:
The following paper suggests it is possible to drop from your sample individuals who did not respond to the wage question and had their wage imputed instead (see first paragraph of page 30):
Unfortunately, the flag variables discussed in the papers you linked to are not currently available through IPUMS-CPS. I have passed along your interest in these variables to the IPUMS-CPS team. I can’t say for sure when these variables will become available, but I know that IPUMS-CPS is always looking for input from researchers on what variables to integrate so I have no doubt that these will become available in the future.
I am unaware of any public code that achieves this, but the sequential merge is simple in Stata. Assuming you currently have your IPUMS-CPS extract loaded into Stata and have saved the corresponding NBER file in Stata format as something like “cpsYEAR_ASEC.dta”, you can perform a sequential merge with the following command:
merge 1:1 _n using cpsYEAR_ASEC.dta
The “_n” in the above command is an internal Stata variable that holds each record’s position in the dataset, so when merging on _n you are merging record 1 from the master file with record 1 from the using file, record 2 with record 2, and so on.
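Because a sequential merge pairs records purely by position, it is worth confirming that every record found a partner. The following is only a sketch: “ipums_extract_YEAR.dta” is a hypothetical name for a single-year IPUMS-CPS extract, and it assumes neither file has been sorted or subset since download.

```stata
* Sketch only: file names are placeholders for one survey year.
* ipums_extract_YEAR.dta is a hypothetical single-year IPUMS-CPS extract.
use ipums_extract_YEAR.dta, clear

* Pair row i of the master file with row i of the using file.
merge 1:1 _n using cpsYEAR_ASEC.dta

* If the two files differ in length, the extra rows are left unmatched.
tabulate _merge

* Stop with an error unless every record was matched.
assert _merge == 3
drop _merge
```

A passing check only shows that the two files have the same number of records. It does not prove that they are in the same order, so comparing a variable present in both files (age or sex, for example) is a sensible additional step.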
(1) If I have to download the NBER data anyway, is there any advantage to still using IPUMS data? It seems like I may as well switch entirely to NBER once I go down this road (sorry, I know this is a tricky question!).
(2) If I did want to use and merge both data sets, do I need the IPUMS data in separate dta files for each year? I currently have one IPUMS dta file with multiple years and only a portion of the total variables available.
(3) Would there be any issue with variable names being different in IPUMS and NBER? For example, the IPUMS dta file I have now was produced by running the default Stata code that IPUMS provides with the raw data, which includes some renaming commands.
(1) It really depends on your analysis. IPUMS data provides some additional variables including variables that are “harmonized” across samples. NBER data includes many of the raw variables that are included in IPUMS-CPS.
(2) It may be advisable to use a separate data file for each year, because the sequential merge matches records in order within each sample year. I’m not sure it would be impossible with a single data file containing multiple years, but it would make me nervous.
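If you would rather keep a single multi-year IPUMS file, one possible approach is to load one year at a time, merge it with that year’s NBER file, and append the results. This is a rough sketch rather than tested code: “ipums_multi.dta” and the year range are hypothetical, and it assumes the IPUMS variable year matches the year in the NBER file name and that the within-year record order is unchanged.

```stata
* Sketch only: ipums_multi.dta and the 1990-1992 range are placeholders.
tempfile combined
local first = 1

forvalues y = 1990/1992 {
    * Load only this survey year from the multi-year IPUMS extract.
    use if year == `y' using ipums_multi.dta, clear

    * Sequential merge with the NBER file for the same year.
    merge 1:1 _n using cps`y'_ASEC.dta
    assert _merge == 3
    drop _merge

    * Stack the years merged so far underneath the current one.
    if `first' == 0 {
        append using `combined'
    }
    save `combined', replace
    local first = 0
}

use `combined', clear
```

NBER variable names and storage types can change between years, so the append step may need adjusting for a real set of files.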
(3) There shouldn’t be a problem with different variable names between the two files for the sequential merge. This is because none of these variable names are directly involved in the merge. However, after you’ve merged the files you may want to rename your variables to help yourself keep things straight.
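One caveat: if a variable name happens to appear in both files, Stata’s merge keeps the master file’s values by default and the using file’s copy is not brought in. A possible way around this, sketched below with hypothetical file names, is to give every NBER variable a prefix before merging rather than renaming afterwards.

```stata
* Sketch only: file names are placeholders; nber_ is an arbitrary prefix.
use cpsYEAR_ASEC.dta, clear

* Prefix every variable in the NBER file so no name can clash with IPUMS.
rename * nber_*
save nber_prefixed_YEAR.dta, replace

* ipums_extract_YEAR.dta is a hypothetical single-year IPUMS-CPS extract.
use ipums_extract_YEAR.dta, clear
merge 1:1 _n using nber_prefixed_YEAR.dta
```

The group form of rename used here requires Stata 12 or later.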
Thanks Jeff. After looking at the NBER data, I realise that this is not a small undertaking. The IPUMS data is certainly very convenient in that it comes already stitched together by year, and I don’t have to run separate do files for each year. Joe mentions that IPUMS would consider adding some of the variables I mentioned, namely data quality flags that indicate whether earnings data had been imputed. Do you have an idea of how long that would take to implement? I imagine it wouldn’t be anytime soon but thought I would check.
Unfortunately, I’m not able to say with certainty when these variables will be available. The IPUMS-CPS project releases data many times per year, so (without any promises) I’d hope these variables could be available before the end of the year.
Just to say that I’ve discussed this with the people at NBER. It doesn’t look too promising, as they are missing data files for 1965 and 1970 and there is no Stata code for reading the data files before 1987.
So any help you can eventually provide on this would be hugely appreciated!
Another follow-up question on this. It appears that the UNICON CPS utilities (now discontinued, of course) did include the flag for imputed wages. Is it still possible to get the UNICON data anywhere? Given that it extends until 2014, this would be useful for me (PhD student in a hurry!), but a quick Google search suggests it is no longer available. Grateful for your thoughts on this.
Unfortunately most of the UNICON utilities that are currently available have been incorporated into the IPUMS CPS system. So, I’m afraid these are not presently available.
Thanks. Unfortunately, one of the variables that has not yet been transferred (the flag for imputed wages) is one I need. Do you know if it is possible to get hold of the CPS utilities data that was available until 2014? For example, are there still CDs/DVDs with the data available?
It sounds like you are asking about data quality flags for the INCWAGE variable in either IPUMS USA or IPUMS CPS. If you are using IPUMS USA see here and if you are using IPUMS CPS see here, for corresponding links to the QINCWAGE variables.
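Once QINCWAGE is included in an extract, dropping imputed wage records could look something like the sketch below. It assumes the default lowercase variable names from the IPUMS Stata setup (year, qincwage) and, importantly, that a value of 0 means the wage was not allocated. That coding is an assumption here and can differ across sample years, so please check the QINCWAGE codes page before relying on it.

```stata
* Sketch only: assumes qincwage == 0 means "not allocated".
* Check the QINCWAGE codes for your sample years before relying on this.

* See which flag values actually occur in each survey year.
tabulate year qincwage, missing

* Flag records whose wage income appears to have been imputed.
generate byte wage_imputed = (qincwage != 0) if !missing(qincwage)

* Keep reported wages only; records with a missing flag are left in place.
drop if wage_imputed == 1
```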