Hello, I am working on a project examining the causes of bottlenecks in the trucking sector. To start, I am trying to compile summary evidence on which sectors individuals come from when they switch to a trucking occupation (occ==9130 or occ==9600), which sectors they move into when they leave a trucking occupation, and what proportion are stayers (i.e., people who remain in a trucking occupation). So far, I have classified non-trucking individuals into broad sectoral groups based on occ code groups, similar to Phares and Balthrop (2021).
Long story short: when I compare the cross-sectional ASEC samples to the longitudinal ASEC samples, conditional on individuals being movers (i.e., either entering or exiting trucking), I get similar results in terms of the absolute number of respondents and the share of movers entering and exiting trucking from each sector. However, I get big differences in the number and share of stayers, and I am trying to figure out why, as this could dramatically affect my empirical results.
With the longitudinal samples, it is relatively straightforward to identify the stayers: I simply classify them as having a trucking occupation code in both periods using the occ_1 and occ_2 variables. With the cross-sectional samples, my approach was to drop all individuals with a cpsidp of 0 (so that I drop the individuals in the ASEC oversample and keep only those who can be matched to the basic March sample and matched year over year) and then declare the data a panel, using cpsidp as the panel identifier and mish as the time variable. I then classify individuals as stayers if they have the same trucking occ code in mish==t and mish==t+4, or in mish==t-4 and mish==t, as that should identify individuals who are sampled exactly one year apart, right?
I can see why using the cross-sectional sample would inflate the total number of respondents, as it includes additional people year over year because individuals are being compared in years t-1, t, and t+1. But it is unclear to me why I get roughly the same number of movers using both samples while the number of stayers is roughly double when I use the cross-sectional samples. Can somebody please explain why this is happening? I can provide my Stata code if that would help. Thank you for your time.
I just calculated these rates using the 2018-2019 ASEC samples, both as cross-sectional and as longitudinal extracts, and I found exactly the same results using both methods. See the table below for what I calculated (note that this is unweighted): trucker_1=1 or trucker_2=1 indicates that an individual had a trucking occupation in period 1 or 2, respectively.
I think the important thing to consider is that the 2018-2019 longitudinal sample (for example) only contains individuals who were in both the 2018 and 2019 ASEC. To replicate this using the cross-sectional data, you need to drop everyone with CPSIDP=0 and also everyone who has data for only one of the years. I've pasted an example of my code making this calculation with the cross-sectional data.
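* Drop ASEC oversample records, which cannot be linked across years
drop if cpsidp==0
* Number each person's records in year order and count them
sort cpsidp year
by cpsidp (year): gen obs=_n
by cpsidp (year): egen numobs=max(obs)
* Keep only people observed in both years
drop if numobs==1
* Trucking indicators for the first-year and second-year records
gen trucker_1 = obs==1 & (occ==9130 | occ==9600)
gen trucker_2 = obs==2 & (occ==9130 | occ==9600)
* Copy the second-year indicator onto each person's first-year record
by cpsidp (year): replace trucker_2 = trucker_2[2] if _n==1
* One row per person: cross-tabulate trucking status in the two years
tab trucker_1 trucker_2 if obs==1, m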
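One possible source of the doubling, and this is only a guess since the original cross-sectional code was not posted, is that a stayer flag built from both a lead and a lag comparison is set on both of a person's rows, while an entry or exit flag is set on only one row. Counting rows instead of people would then double the stayers but leave the movers unchanged. The sketch below illustrates this with made-up records. The person IDs, occ values other than 9130 and 9600, and the names stay_row, enter_row, exit_row, and first are all hypothetical.

```stata
* Toy data only: hypothetical people, not real CPS records.
* (Real cpsidp values are 14 digits and need double storage.)
clear
input long cpsidp int year int occ
1 2018 9130
1 2019 9130
2 2018 9130
2 2019 4700
3 2018 4700
3 2019 9600
4 2018 9130
5 2018 4700
5 2019 4700
end

gen byte trucker = inlist(occ, 9130, 9600)

* Keep only people observed in both years (person 4 is dropped)
bysort cpsidp (year): gen byte nobs = _N
keep if nobs == 2

* Stayer flag from comparing both rows: set on BOTH rows of a stayer
by cpsidp (year): gen byte stay_row = (trucker[1] == 1) & (trucker[2] == 1)
* Mover flags: each is set on only ONE row per person
by cpsidp (year): gen byte enter_row = (_n == 2) & (trucker[1] == 0) & (trucker[2] == 1)
by cpsidp (year): gen byte exit_row  = (_n == 1) & (trucker[1] == 1) & (trucker[2] == 0)
* Marker for one row per person
by cpsidp (year): gen byte first = (_n == 1)

count if stay_row == 1
local stay_rows = r(N)
count if stay_row == 1 & first == 1
local stay_persons = r(N)
count if enter_row == 1 | exit_row == 1
local movers = r(N)

display "stayer rows: `stay_rows'  stayer persons: `stay_persons'  movers: `movers'"
```

If this matches what your code does, restricting the stayer count to one row per person (as with the obs==1 condition in the tab command above) should bring the cross-sectional and longitudinal numbers into line.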