For the MEPS data in the years where the day-of-month variable “DATEDD” is available, approximately 30% of the days are missing, while only 3% of the months (“DATEMM”) are missing.
Could you point me to well-known strategies for handling missing dates?
The only two strategies that I can see are:
- Drop records / rows with the missing dates.
- Assign valid, uniformly random values that are consistent with the sequence of records, bounded by the previous and next records for the same “MEPSID”.
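
To make the second strategy concrete, here is a minimal Python sketch rather than an established method. It assumes the events for one “MEPSID” are already in event order, that the month is known for every event, and that only the day may be missing. The function name `impute_missing_days` and the keys `year`, `month`, `day`, and `day_imputed` are hypothetical and are not MEPS or IPUMS variable names.

```python
import calendar
import random


def impute_missing_days(events, rng=None):
    """Fill missing days for ONE person's events, given in event order.

    Each event is a dict with hypothetical keys 'year', 'month', 'day',
    where 'day' is None when missing. A missing day is drawn uniformly
    from the days of its month, narrowed by the previous event and the
    next event with a known day when they fall in the same month.
    """
    rng = rng or random.Random()
    filled = []
    for i, ev in enumerate(events):
        day = ev["day"]
        if day is None:
            ym = (ev["year"], ev["month"])
            lo = 1
            hi = calendar.monthrange(ev["year"], ev["month"])[1]  # last day of month
            # Lower bound: previous event (observed or already imputed), same month.
            if filled and (filled[-1]["year"], filled[-1]["month"]) == ym:
                lo = filled[-1]["day"]
            # Upper bound: next event in the same month with an observed day.
            for nxt in events[i + 1:]:
                if (nxt["year"], nxt["month"]) != ym:
                    break
                if nxt["day"] is not None:
                    hi = nxt["day"]
                    break
            hi = max(lo, hi)  # guard against out-of-order input
            day = rng.randint(lo, hi)  # inclusive on both ends
        # Flag imputed values so they can be excluded in sensitivity checks.
        filled.append({**ev, "day": day, "day_imputed": ev["day"] is None})
    return filled


example = [
    {"year": 2019, "month": 3, "day": 5},
    {"year": 2019, "month": 3, "day": None},
    {"year": 2019, "month": 3, "day": 12},
    {"year": 2019, "month": 4, "day": None},
]
result = impute_missing_days(example, rng=random.Random(0))
```

Keeping the `day_imputed` flag makes it possible to rerun an analysis with and without the imputed days. Because a single random draw understates uncertainty, repeating the imputation with different seeds and comparing results may be worth considering.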
I think you are referencing the IPUMS MEPS variables EVENTMM and EVENTDD, which correspond to *DATEMM and *DATEDD, respectively, in the original AHRQ data files; please correct me if I am wrong.
This strikes me as a function of human memory rather than a data issue (e.g., it is much easier to remember the month of a visit to the doctor’s office than the specific day; it can also be difficult to find the specific visit date on the insurance materials you receive afterwards). How to handle variables with high rates of missingness is something we leave to the discretion of individual researchers. However, I will note that dropping all cases with missing dates may bias your sample (i.e., persons who cannot report specific event dates might be systematically different from those who can). I am not aware of any guidance from AHRQ on this specific subject. I encourage you to review the literature in your field for how to handle lack of specificity in dates.