Subpopulation estimates, college enrollment

I’m studying trends in college student employment over 1995-2018 using CPS data and STATA. To do so, I have used svyset and the subpop() option in STATA with the weight edsuppwt. The resulting subpopulation estimates of college enrollment are smaller than I would expect given other data sources from the U.S. Dept. of Education. For example, my CPS estimate of the number of undergraduate students (full- and part-time combined) is much closer to the number of full-time undergraduates alone reported by the DOE.

My questions:

  1. Have others also found that CPS underestimates college enrollment? I seem to recall this being possible because of the CPS focus on non-institutionalized populations, but don’t want to discount the possibility of an error on my end.

  2. Would reweighting be a viable (or necessary) option? I am new to survey analysis, and I am particularly concerned about interpreting employment trends, etc., if I’m unable to recover relatively accurate population estimates. Thanks!

Can you post a comparison of the CPS and specific other numbers that you’re trying to replicate? Then I can try to replicate your findings and may be able to help you resolve the discrepancy. If you prefer, you can correspond with the IPUMS User Support Team by email at

Matthew, thank you! Just sent an email over.

College students are an exceptionally hard-to-reach population. Their living arrangements (group quarters) are difficult for field interviewers to penetrate and list properly. Students are in and out of their places of residence at weird times. There is little way to mail stuff to them, and they don’t pick up their phones. They are hard to recruit. Long government surveys are too boring for them; they are hard to engage. So I am not surprised that CPS is missing them. (On the positive side, they should not have any cognitive or language difficulties.) That generic problem is difficult to overcome; CPS weights the data to the population, but I doubt that detailed projections are available for the narrow age range 18-25-ish.

If you have detailed enough data from DoE that you trust better, there may be scope for reweighting the CPS subpopulation of college students to these numbers, with an understanding that this is an artisan data product for your internal consumption only, given the exceptional circumstances. (As a weighting statistician, I don’t say this lightly.) These weights should not be considered better than the CPS weights, but rather custom weights for your analysis.
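To make the reweighting idea concrete, here is a minimal sketch of one simple way it could be done: a cell-by-cell ratio adjustment that scales the existing weights so that weighted counts match external control totals. It is written in Python only to show the arithmetic; the same steps can be carried out in Stata. Everything here other than the weight name edsuppwt is an assumption made for illustration: the function name ratio_adjust, the cell labels, the custom_wt field, and all of the numbers are made up and are not real CPS or Dept. of Education figures.

```python
from collections import defaultdict


def ratio_adjust(records, control_totals, cell_key="cell", weight_key="edsuppwt"):
    """Scale weights within each cell so weighted counts match control totals.

    records        -- list of dicts, one per respondent in the subpopulation
    control_totals -- dict mapping each cell label to its external total
    Returns new dicts with an added "custom_wt" field; inputs are not modified.
    """
    # Step 1: sum the existing survey weights within each adjustment cell.
    weighted_sums = defaultdict(float)
    for rec in records:
        weighted_sums[rec[cell_key]] += rec[weight_key]

    # Step 2: multiply each weight by (control total / current weighted total).
    adjusted = []
    for rec in records:
        cell = rec[cell_key]
        if cell not in control_totals:
            raise KeyError(f"no control total supplied for cell {cell!r}")
        factor = control_totals[cell] / weighted_sums[cell]
        adjusted.append({**rec, "custom_wt": rec[weight_key] * factor})
    return adjusted


# Hypothetical toy data: three respondents in two enrollment-status cells.
records = [
    {"cell": "full_time", "edsuppwt": 1000.0},
    {"cell": "full_time", "edsuppwt": 1500.0},
    {"cell": "part_time", "edsuppwt": 800.0},
]
# Hypothetical external totals (placeholders, not published figures).
control_totals = {"full_time": 3000.0, "part_time": 1200.0}

adjusted = ratio_adjust(records, control_totals)

# Check: the custom weights now sum to the control total in every cell.
new_sums = defaultdict(float)
for rec in adjusted:
    new_sums[rec["cell"]] += rec["custom_wt"]
```

In practice the cells would need to be defined the same way in both sources (for example, enrollment status by year), and the adjustment would be repeated for each survey year. The adjusted weights only fix the totals; they do nothing about differences between the students CPS reaches and the ones it misses within a cell.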

P.S. Please don’t yell Stata, it is not an abbreviation.