How do I drop repeated records if I combine 9 months of CPS Basic Monthly Data?



I am examining unemployment in the District of Columbia across subgroups and in order to increase the sample size, I have appended CPS Basic Monthly data for January 2016 through September 2016. (I am also going to do this for Jan - Sept of 2007 to make a comparison pre-recession and post-recession).

Because I have 9 months of observations combined into one dataset for 2016 (and later plan to do the same with 2007), how can I drop repeated records/respondents who have been surveyed more than once during this period? I saw in a previous question that if you drop all cases where MISH >=5, this will drop repeated records. But does this hold true if I have 9 months of CPS data appended into one dataset? Or is there a better technique and/or a frequency weight I can use that will account for repeated records?

Also, what is the proper weighting variable to use if I am attempting to measure unemployment among subgroups (e.g. unemployment by race, family type, sex, age, etc.)?

Thank you very much for your time!


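The CPS has a 4-8-4 rotating panel design: a household is interviewed for four consecutive months, is out of the sample for eight months, and then returns for four more months. This guarantees that in any given month, about 1/8 of the sample is in its first month of enumeration, 1/8 is in its second month, and so on. Dropping cases where MISH >= 5 will not remove repeated records over the 9 month time span you are using, because a household that enters in January with MISH = 1 is also interviewed in February, March, and April with MISH = 2, 3, and 4. A better approach is to use the linking identifiers: the CPSID variable is designed to identify the same households over time across CPS basic monthly samples, and the companion variable CPSIDP identifies the same persons.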
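As a minimal sketch of identifier-based de-duplication, the Python example below keeps one record per person from a pooled extract. It assumes the extract is loaded as a pandas DataFrame containing the IPUMS variables CPSIDP, YEAR, and MONTH; the function name `keep_first_interview` and the toy data are hypothetical, and keeping the earliest record is only one possible rule.

```python
import pandas as pd


def keep_first_interview(df, id_col="CPSIDP", time_cols=("YEAR", "MONTH")):
    """Keep each person's earliest record in a pooled monthly extract.

    Assumes id_col uniquely identifies a person across months
    (CPSIDP in an IPUMS CPS extract) and time_cols order the samples.
    """
    ordered = df.sort_values(list(time_cols), kind="stable")
    return ordered.drop_duplicates(subset=id_col, keep="first")


# Toy data (made up for illustration, not real CPS records).
# Person 101 appears three times with MISH below 5, so a MISH filter
# alone would not remove the repeats.
pooled = pd.DataFrame({
    "CPSIDP": [101, 101, 101, 202, 303, 303],
    "YEAR":   [2016, 2016, 2016, 2016, 2016, 2016],
    "MONTH":  [1, 2, 3, 1, 8, 9],
    "MISH":   [1, 2, 3, 8, 3, 4],
})

deduped = keep_first_interview(pooled)
print(deduped)
```

Whether to keep the first, last, or a randomly chosen record per person is a design choice; the sketch simply picks the first. Substituting CPSID for CPSIDP would instead keep one record per household.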

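Regarding weighting, HWTFINL is the household-level final weight, so it is the one to use for household-level analysis. For person-level outcomes such as unemployment by race, sex, or age, use the person-level final weight, WTFINL, which is the standard weight for analysis using basic monthly CPS samples.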
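To illustrate how a weight enters a subgroup estimate, here is a rough sketch of a weighted unemployment rate by group, using the IPUMS person-level weight WTFINL since unemployment is a person-level outcome. The function name and the boolean columns `unemployed` and `in_labor_force` are hypothetical; in a real extract they would have to be derived from the labor force status variables, and the toy numbers are made up.

```python
import pandas as pd


def weighted_unemployment_rate(df, group_col, weight_col="WTFINL",
                               unemployed_col="unemployed",
                               lf_col="in_labor_force"):
    """Weighted unemployed / weighted labor force, per subgroup.

    unemployed_col and lf_col are hypothetical boolean flags that the
    analyst must construct from the extract's labor force variables.
    """
    w = df[weight_col]
    unemployed = (w * df[unemployed_col]).groupby(df[group_col]).sum()
    labor_force = (w * df[lf_col]).groupby(df[group_col]).sum()
    return unemployed / labor_force


# Toy data (illustrative only, not real CPS records).
people = pd.DataFrame({
    "SEX":            [1, 1, 1, 2, 2, 2],
    "WTFINL":         [100.0, 200.0, 100.0, 150.0, 150.0, 300.0],
    "unemployed":     [True, False, False, False, True, False],
    "in_labor_force": [True, True, False, True, True, True],
})

rates = weighted_unemployment_rate(people, "SEX")
print(rates)
```

Because the rate is a ratio of weighted sums, pooling several months does not require rescaling the weights for this particular statistic; weighted population counts from pooled months would need to be divided by the number of months.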