Hi IPUMS Team,
I read the post about merging CPS and ATUS, but I lost a large share of my sample when linking the two files, and I cannot work out why.
My research needs the teleworking variables from CPS (i.e., TELWRKDIFFCVD, TELWRKPAY, TELWRKHR, and TELWRKBFCVD, available since October 2022), which are not available in the ATUS person files. To start, I created two data extracts: cross-sectional CPS 2020-2024 and rectangular ATUS 2021-2023. In the CPS file, I kept only the latest record for each individual who appears more than once. I chose CPS 2020-2024 because the “year_cps8” values in my ATUS file range from 2020 to 2023.
As the previous post suggested, I merged the ATUS file 1:1 with the CPS file on CPSIDP, but 8,548 of the 25,771 ATUS respondents were not matched at all. This is weird, because every one of my ATUS records has a PRESENCE value of “both atus and cps” and should therefore have a corresponding CPS record. What’s worse, every respondent whose CPS interview took place in October 2022 or later was unmatched, and all matched respondents had CPS interviews between September 2020 and September 2022. Since the teleworking variables are only available from October 2022 onward, none of the matched records are usable for my analysis.
I have pasted my Stata code below in case it helps. Can you tell me what is going wrong?
**================ IPUMS CPS file 2020-2024
use "D:\Data\American Time Use Study\Linkable CPS 2020-2024\cps_00016.dta", clear
order cpsidp mish
format cpsidp %20.2g
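* Order each person's records from latest to earliest month-in-sample.
* Listing negmish in parentheses keeps this order fixed within cpsidp;
* "by cpsidp, sort:" alone would re-sort on cpsidp only and could shuffle rows within a person.
gen negmish = -mish
bysort cpsidp (negmish): gen keep_flag = (_n == 1)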
forvalues i=1(1)3{
by cpsidp: replace telwrkbfcvd = telwrkbfcvd[_n+`i'] if missing(telwrkbfcvd)
by cpsidp: replace telwrkdiffcvd = telwrkdiffcvd[_n+`i'] if missing(telwrkdiffcvd)
}
keep if keep_flag == 1
ren telwrkpay telework
la var telework "telework 0/1 cps"
ren telwrkhr twhour
la var twhour "telework hours cps"
drop keep_flag
ren year year_cps8
ren month month_cps8
contract cpsidp mish year_cps8 month_cps8 telework twhour telwrkbfcvd telwrkdiffcvd
save "D:\PhD\Research\Teleworking Lose Community\Data\linkable_cps 2020-2024.dta", replace
**================ link files
use "D:\Data\American Time Use Study\IPUMS ATUS 2021-2023\atus_00006_person.dta", clear
format cpsidp caseid %20.2g
merge 1:1 cpsidp using "D:\PhD\Research\Teleworking Lose Community\Data\linkable_cps 2020-2024.dta", keep(match master)
tab month_cps8 year_cps8 if _merge==1
tab month_cps8 year_cps8 if _merge==3
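
In case it helps with the diagnosis, here is a minimal sketch of the checks I can run on the saved CPS file before merging. It reuses the file path and variable names from my code above and only standard Stata commands (describe, isid, tabulate). The idea is to confirm that CPSIDP is stored at full precision, that it uniquely identifies rows, and which CPS months and months-in-sample remain after keeping the latest record per person. As I understand it, ATUS respondents are drawn from households that have completed their eighth CPS month, so I would expect the linked records to be the MISH 8 ones. I have not confirmed that this is the cause.

```stata
* Diagnostic sketch: inspect the CPS file that the merge uses
use "D:\PhD\Research\Teleworking Lose Community\Data\linkable_cps 2020-2024.dta", clear

* CPSIDP is a long numeric ID, so it should be stored as double, not float
describe cpsidp

* A 1:1 merge needs exactly one row per CPSIDP; isid stops with an error otherwise
isid cpsidp

* Which CPS survey months are present after keeping one record per person?
tab month_cps8 year_cps8

* How many of the retained records are from month-in-sample 8?
tab mish
```

If the first tabulation shows no records from October 2022 onward, the gap would be in the CPS extract or in my filtering step rather than in the merge itself.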