I am working with CPS data from multiple years (2017-2021) and am trying to validate the sex, race, and age of respondents month-to-month. A respondent does not need to be in the sample for all 8 months in sample (MISH); I just want to make sure I am comparing the same person from one month to the next (e.g., January to February). As a result, my data include individuals with different numbers of completed MISH: some appear in the panel only twice, while others appear all 8 times.
I found the Validation Code for Long Files in the CPS Summer Workshop material and tried to use it for my validation. However, I noticed that the code only really works when the data cover a fixed number of months. For example, if I run the validation check for sex on my data, I end up with a sex_total_match variable with values ranging from 1 to 8. Because the number of MISH I observe varies by person, any value from 2 to 7 could be valid or invalid, so I cannot use the all_match code at the end of the do-file. I found that if I use egen sex_total_match=min(sex_match), by(cpsidp) instead of “total”, I end up with the correct variable. Is there a better way to validate my month-to-month links when I do not observe a set number of MISH?
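For reference, here is a minimal sketch of the min-based check described above. It assumes a long-format extract containing cpsidp, year, month, and sex, and it assumes sex_match is a 0/1 flag built by comparing each record to the person's first observed record; the workshop do-file may construct sex_match differently. The names sex_n_match and sex_all_match are hypothetical and only illustrate an equivalent count-based check.

```stata
* Sketch only: assumes long-format data with cpsidp, year, month, sex.
* sex_match here is an assumed 0/1 flag (1 = same sex as the person's
* first observed record); the workshop code may define it differently.
bysort cpsidp (year month): gen byte sex_match = (sex == sex[1])

* min() equals 1 only if every observed record matches, regardless of
* how many months the person appears in the data.
egen sex_total_match = min(sex_match), by(cpsidp)

* Equivalent check (hypothetical names): count the matches and compare
* the count to the person's own number of records instead of to 8.
egen sex_n_match = total(sex_match), by(cpsidp)
bysort cpsidp: gen byte sex_all_match = (sex_n_match == _N)

* The two flags should agree for every record.
assert sex_total_match == sex_all_match
```

Comparing the match count to each person's own record count (_N within cpsidp) is what removes the dependence on a fixed number of MISH; taking the minimum reaches the same result in one step.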