I am working with the CPS basic monthly data, limiting my sample to respondents in mish 4 and mish 8 because I am interested in the hourly earnings data. I am currently using 2003 through 2019. I keep only respondents between 25 and 65 years old for whom eligorg equals 1 (so that I have the earnings data). I also drop those with a CPI-adjusted wage below 1 dollar or above 100 dollars.
After all of this, exactly a third of my sample consists of respondents with only one observation (using cpsid cpsidp to identify unique respondents), and two thirds consists of respondents with two observations. I have copied some tabulations below so you can see the pattern. I am wondering whether this is due to the study design (which would be great) or whether there is some unusual attrition that I need to be aware of because of unobserved selection.
Furthermore, I am wondering which weights I should use when I analyze this data (I am planning to run some OLS models that ignore the panel design, and some fixed-effects models).
Thank you for your help!
I looked at a smaller sample and found the same pattern. Unfortunately this is just due to the attrition patterns in the data. About 1/4 of those ever appearing in the ORG have data only for MISH 4, 1/4 only for MISH 8, and half have both. Since people with two ORG observations appear twice in the data, you get about a 2:1 ratio of observations for those with two ORG observations versus those with one ORG observation.
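To make the arithmetic behind that 2:1 ratio explicit, here is a minimal sketch in Python. It takes the rough person-level shares quoted above (1/4, 1/4, 1/2) as given; they are approximations from a smaller sample, not exact CPS figures, and the variable names are only illustrative.

```python
from fractions import Fraction

# Approximate shares of PERSONS by ORG pattern, as quoted above.
share_mish4_only = Fraction(1, 4)
share_mish8_only = Fraction(1, 4)
share_both = Fraction(1, 2)

# Each person contributes one row per ORG interview they completed.
rows_from_singles = (share_mish4_only + share_mish8_only) * 1
rows_from_pairs = share_both * 2
total_rows = rows_from_singles + rows_from_pairs

# Shares of OBSERVATIONS (rows), which is what a tabulation of the file shows.
obs_share_singles = rows_from_singles / total_rows
obs_share_pairs = rows_from_pairs / total_rows
print(obs_share_singles, obs_share_pairs)  # 1/3 2/3
```

So half of the people having a single ORG record is consistent with a third of the rows belonging to single-record respondents.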
Regarding weights, most person-level analyses using ORG data should use EARNWT.
Thank you for your help!
Would you use earnwt even if I am using the longitudinal panel?
The weights included in the original CPS microdata are not designed to be used in panel analysis, which is why IPUMS developed the longitudinal weights, such as LNKFW1YWT. These weights adjust the base weight (WTFINL) to account for attrition between two waves of the CPS survey. Which weights to use depends on the specific analysis, but it seems to me that for a fixed-effects model (keeping only individuals with two ORG observations) you might want to use EARNWT (from the first year) as the base weight and adjust it for attrition, as is done for the IPUMS longitudinal weights. There is example Stata code for replicating the creation of the IPUMS longitudinal weights at this page, which could be modified to use EARNWT as the base.
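As a rough illustration of what "adjusting a base weight for attrition" can mean, here is a minimal Python sketch of a simple weighting-class adjustment. This is not the IPUMS procedure or a translation of their Stata code. It assumes that each first-interview respondent has a base weight (for example EARNWT at MISH 4), a flag for whether a second ORG record was linked, and an adjustment cell (for example age group by sex). The field names `id`, `cell`, `base_wt`, and `linked`, and the toy numbers, are all hypothetical.

```python
from collections import defaultdict

def attrition_adjusted_weights(people):
    """Scale the base weights of linked respondents so that, within each
    cell, they sum to the base-weight total of everyone seen at the first
    interview (linked or not)."""
    total = defaultdict(float)   # base-weight total per cell, all respondents
    linked = defaultdict(float)  # base-weight total per cell, linked only
    for p in people:
        total[p["cell"]] += p["base_wt"]
        if p["linked"]:
            linked[p["cell"]] += p["base_wt"]

    adjusted = {}
    for p in people:
        # Only linked respondents stay in the two-observation panel.
        if p["linked"] and linked[p["cell"]] > 0:
            factor = total[p["cell"]] / linked[p["cell"]]
            adjusted[p["id"]] = p["base_wt"] * factor
    return adjusted

# Toy data, made up purely for illustration.
people = [
    {"id": 1, "cell": "A", "base_wt": 100.0, "linked": True},
    {"id": 2, "cell": "A", "base_wt": 300.0, "linked": True},
    {"id": 3, "cell": "A", "base_wt": 200.0, "linked": False},
    {"id": 4, "cell": "B", "base_wt": 500.0, "linked": True},
    {"id": 5, "cell": "B", "base_wt": 500.0, "linked": False},
]
adjusted = attrition_adjusted_weights(people)
print(adjusted)
```

In this toy example the linked respondents in cell A are scaled up by 1.5 and those in cell B by 2, so the adjusted weights again sum to the full first-interview total. The choice of cells drives the result, so it would need to reflect whatever characteristics you believe predict attrition in your sample.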
This presentation (see pages 11-12 especially) has a nice discussion of the issues involved in weighting longitudinal data from complex sample surveys.