I want to estimate voting rates by county using the CPS Voter Supplement. Is it possible to pool different Voter Supplement years to increase the N? For example, can I pool the 2018 and 2014 Voter Supplements? If so, would I use the weights in the same way? Quite apart from that, is doing this reasonable? I’m wary of pooling presidential and non-presidential election years since turnout rates vary so much, so the best thing I could think of is to pool two midterm or two presidential year supplements. Any advice would be much appreciated!
It is reasonable to pool together multiple basic monthly samples from the CPS to gain the benefits of a larger number of observations. If you intend to calculate representative population statistics with this pooled sample you’ll need to adjust the sampling weight appropriately. Basically, if you pool two samples together and do not adjust the sampling weight you’ll calculate population counts that are approximately twice the size of the population. So, in general, you’ll want to divide the sampling weight by the number of samples you are pooling together.
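As a rough illustration of the weight adjustment described above, here is a minimal Python sketch on made-up records. The weight field is called `VOSUPPWT` here on the assumption that you are using the Voter Supplement weight from IPUMS CPS; the helper `pool_samples`, the `voted` field, and the `pooled_wt` field are hypothetical names used only for this example.

```python
def pool_samples(samples, weight_key="VOSUPPWT"):
    """Stack several samples and divide each weight by the number of samples pooled."""
    n_samples = len(samples)
    pooled = []
    for sample in samples:
        for record in sample:
            adjusted = dict(record)  # copy so the original records are untouched
            adjusted["pooled_wt"] = record[weight_key] / n_samples
            pooled.append(adjusted)
    return pooled


# Toy data: each "sample" has weights that sum to a population of 4,000.
sample_2014 = [
    {"YEAR": 2014, "VOSUPPWT": 1500.0, "voted": 1},
    {"YEAR": 2014, "VOSUPPWT": 2500.0, "voted": 0},
]
sample_2018 = [
    {"YEAR": 2018, "VOSUPPWT": 1800.0, "voted": 1},
    {"YEAR": 2018, "VOSUPPWT": 2200.0, "voted": 1},
]

pooled = pool_samples([sample_2014, sample_2018])

# Without the adjustment the pooled weights sum to about twice the population.
unadjusted_total = sum(r["VOSUPPWT"] for r in pooled)
# With the adjustment they sum to roughly one population again.
adjusted_total = sum(r["pooled_wt"] for r in pooled)

print(unadjusted_total, adjusted_total)
```

In this toy case the unadjusted weights sum to 8,000 and the adjusted weights to 4,000, which matches the size of each individual sample's population.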
Regarding counties specifically, you will likely run into some limitations when it comes to identifying specific counties with the public use CPS data. This is because only about 45% of households reside in an identifiable county in the public use CPS data available on IPUMS CPS. Some alternatives are to (a) perform your intended analysis at the state level (using STATEFIP) or (b) apply to use the restricted use CPS data in one of the Federal Statistical Research Data Centers.
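Before committing to a county-level analysis, it may help to check how much of your extract actually has an identified county. The sketch below assumes that unidentified counties are coded `0` in a `COUNTY` variable and that the weight is `VOSUPPWT`; both are assumptions to verify against the codebook for your extract, and the records and the helper `identified_share` are invented for illustration.

```python
def identified_share(records, county_key="COUNTY", weight_key="VOSUPPWT",
                     unidentified_code=0):
    """Weighted share of records whose county is identified.

    Assumes `unidentified_code` marks records with no identifiable county;
    check this against the codebook for your own extract.
    """
    total = sum(r[weight_key] for r in records)
    identified = sum(
        r[weight_key] for r in records if r[county_key] != unidentified_code
    )
    return identified / total


# Toy records: one with an identified county, two without.
records = [
    {"COUNTY": 27053, "VOSUPPWT": 1000.0},
    {"COUNTY": 0, "VOSUPPWT": 2000.0},
    {"COUNTY": 0, "VOSUPPWT": 1000.0},
]

share = identified_share(records)
print(f"{share:.0%} of the weighted sample has an identified county")
```

If the share is low for the areas you care about, the state-level or restricted-use options are probably the safer route.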
Thanks! A point of clarification, I’m interested in pooling together the Voter supplement, not one of the basic monthly samples. Do you think this is still reasonable?
This is a good point of clarification. Each of the CPS supplements (e.g., the Voter Supplement) is an additional survey instrument added onto one of the basic monthly samples. Therefore, with the exception of the ASEC samples, it is helpful to think of the CPS data primarily as a series of monthly sample surveys with additional supplements added onto these samples. So, the previous point about pooling samples together also applies to CPS supplements.
Hi Jeff!
Thanks! This is extremely helpful! Much appreciated!
When pooling samples like this, what is the best way to calculate standard errors using generalized variance parameters? Would you calculate the standard error on all the pooled years (i.e., before dividing by the number of samples) and use the parameters from the most recent year?
Thank you!
The answer likely depends on the sort of analysis you are aiming to perform with these data. In general, I’d suggest adjusting the sampling weight as described above (i.e., dividing by the number of samples you are pooling together) and applying this adjusted sampling weight in all analyses using your pooled sample. In a regression context, I’d suggest clustering your standard errors by year to account for correlated errors within each sample. That said, this question is likely better answered by someone in your specific field of study.
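To make "applying the adjusted weight" concrete, here is a minimal sketch of a weighted turnout rate by state on made-up pooled records. The names `weighted_turnout`, `pooled_wt`, and `voted` are hypothetical; `STATEFIP` is used as the grouping variable. This covers only the point estimate and says nothing about standard errors.

```python
from collections import defaultdict


def weighted_turnout(records, group_key, weight_key, voted_key="voted"):
    """Weighted share of voters within each group."""
    voters = defaultdict(float)   # weighted count of people who voted
    persons = defaultdict(float)  # weighted count of all people
    for r in records:
        w = r[weight_key]
        persons[r[group_key]] += w
        voters[r[group_key]] += w * r[voted_key]
    return {g: voters[g] / persons[g] for g in persons}


# Toy pooled records whose weights were already divided by the number of samples.
pooled = [
    {"YEAR": 2014, "STATEFIP": 27, "pooled_wt": 750.0, "voted": 1},
    {"YEAR": 2014, "STATEFIP": 27, "pooled_wt": 1250.0, "voted": 0},
    {"YEAR": 2018, "STATEFIP": 27, "pooled_wt": 900.0, "voted": 1},
    {"YEAR": 2018, "STATEFIP": 55, "pooled_wt": 1100.0, "voted": 1},
]

rates = weighted_turnout(pooled, group_key="STATEFIP", weight_key="pooled_wt")
for state, rate in sorted(rates.items()):
    print(state, round(rate, 3))
```

Note that a rate like this is a ratio of weighted sums, so dividing every weight by the same constant leaves it unchanged; the adjustment matters for weighted counts and totals.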