Can I pool monthly CPS data into years to look at share of population by educational attainment level?

I’m trying to compute simple % shares of each occupation (at as detailed a level as possible) by five educational attainment categories for the entire nation, for each year going back several decades. E.g. I’d like to know that 94% of Financial managers had a BA or more in 1989, 3% had an associate’s degree, 1% had a high school diploma, etc. (I made up these numbers.) Then repeat for 1990, 1991, up till 2013.

I’ve seen this analysis done before using the March Supplement at a broad occupational classification level (22 occupations), but I am wondering if I could do this for more detailed occupations (100+ occupations). My question is how I might be able to increase the sample size to make these calculations. Would it be methodologically sound to use the monthly CPS data and pool cases in each month into their respective years?

If this is a plausible general approach, there are a few considerations I’d like to ask about:

  • Will I need to account for rotation groups, e.g. do I need to run the analysis on households where MISH = 4 and/or MISH = 8? This would reduce the case count, so if it’s not necessary for this purpose it would be great to use all rotation groups, but I’m unsure whether this would lead to large distortions.

  • Can I still use the WTFINL weighting variable to calculate these % shares if I pool monthly data (and if I select only one or two rotation groups)?

  • I understand the records in each month of the monthly CPS data sum to the entire population of the US (after weighting). If I were to pool the monthly cases, I’d then get 12 times the population of the US in each year. But if my aim is simply to calculate the share of workers in a particular occupation that has, say, a Bachelor’s degree, do I need to be concerned about the underlying population count in this case? (i.e. 50 out of 100 gives the same proportion as 100 out of 200, or are there methodological issues that I’m missing?)

Thanks very much for your help!

Generally speaking, that is a sound method; however, you will need to consider the issue of repeated households. CPS households are interviewed for four consecutive months, leave the sample for eight months, and then return for four more, so the same people appear in several months of a pooled year. I suggest reading this discussion for more information on pooling CPS data. First, I recommend running your analysis with and without repeated households (e.g. limiting to MISH=1). Then, you can compare your results to see if the repeated households are biasing your estimates.
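The with-and-without comparison can be sketched roughly as follows. This is only an illustration on made-up records, not output from real CPS data: MISH and WTFINL are the IPUMS variable names mentioned in this thread, while EDUC_CAT and the helper weighted_share are hypothetical names I introduced for the example.

```python
def weighted_share(records, category, keep=lambda r: True):
    """Weighted share of `category` among the records that pass `keep`."""
    num = 0.0
    den = 0.0
    for r in records:
        if not keep(r):
            continue
        den += r["WTFINL"]
        if r["EDUC_CAT"] == category:  # EDUC_CAT is a hypothetical recode
            num += r["WTFINL"]
    return num / den if den else float("nan")


# Made-up person records for one occupation in one pooled year.
records = [
    {"MISH": 1, "EDUC_CAT": "BA+", "WTFINL": 100.0},
    {"MISH": 1, "EDUC_CAT": "HS", "WTFINL": 100.0},
    {"MISH": 2, "EDUC_CAT": "BA+", "WTFINL": 300.0},
    {"MISH": 2, "EDUC_CAT": "HS", "WTFINL": 100.0},
]

# Share using every rotation group: 400 / 600.
all_groups = weighted_share(records, "BA+")

# Share using only households in their first month in sample: 100 / 200.
first_only = weighted_share(records, "BA+", keep=lambda r: r["MISH"] == 1)

# A large gap would suggest the repeated households are moving the estimate.
gap = all_groups - first_only
```

If the gap is small for most occupation-by-year cells, using all rotation groups is probably fine for point estimates; the repeated observations would still matter if you later compute standard errors.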

As for the correct weight to use, you will still want to use WTFINL for non-supplement data and WTSUPP for supplement data (e.g. ASEC samples).

The different population estimates do matter within years. Since you are calculating one percentage per year, you should sum the weighted counts across months and compute the annual percentage from those sums, rather than averaging the monthly percentages. Based on your example, a month in 2012 with 50 doctors out of 100 total people should receive less weight in the 2012 annual percentage of doctors than a month with 100 doctors out of 200 total people. However, the differing weighted counts do not matter across years (e.g. 2013 vs. 2012), since you are comparing percentages.
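As a minimal sketch of the pooling step, assuming one record per person-month with IPUMS-style fields YEAR, OCC and WTFINL plus a hypothetical five-category education recode I am calling EDUC_CAT (the function name annual_shares and the toy numbers are also made up for illustration):

```python
from collections import defaultdict


def annual_shares(records, weight_key="WTFINL"):
    """Return {(year, occ): {educ_cat: share}} from pooled monthly records.

    Weights are summed over all months of a year before dividing, so months
    with larger weighted counts contribute more to the annual share.
    """
    cell = defaultdict(float)   # weighted count per (year, occ, educ_cat)
    total = defaultdict(float)  # weighted count per (year, occ)
    for r in records:
        w = r[weight_key]
        cell[(r["YEAR"], r["OCC"], r["EDUC_CAT"])] += w
        total[(r["YEAR"], r["OCC"])] += w

    shares = defaultdict(dict)
    for (year, occ, educ), w in cell.items():
        shares[(year, occ)][educ] = w / total[(year, occ)]
    return dict(shares)


# Toy data: two months of 2012 with different weighted totals.
records = [
    {"YEAR": 2012, "MONTH": 1, "OCC": "doctor", "EDUC_CAT": "BA+", "WTFINL": 50.0},
    {"YEAR": 2012, "MONTH": 1, "OCC": "doctor", "EDUC_CAT": "HS", "WTFINL": 50.0},
    {"YEAR": 2012, "MONTH": 2, "OCC": "doctor", "EDUC_CAT": "BA+", "WTFINL": 150.0},
    {"YEAR": 2012, "MONTH": 2, "OCC": "doctor", "EDUC_CAT": "HS", "WTFINL": 50.0},
]

shares = annual_shares(records)
ba_share = shares[(2012, "doctor")]["BA+"]  # 200 / 300
```

In this toy case the pooled share is 200/300, whereas a simple average of the two monthly shares (50% and 75%) would give 62.5%. The pooled sum being 12 times the population in real data cancels out of the ratio, so no rescaling of the weights is needed for the shares themselves.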

Hope this helps.