Calculating Standard Errors Using CPS Basic Monthly Microdata

Hi there. I am attempting to use the CPS basic monthly survey to examine employment trends by occupation. To calculate standard errors for monthly employment by occupation, I tried using the BLS’s Generalized Variance Functions (GVFs), borrowing parameters from the broad SOC categories for which BLS publishes them. Unfortunately, the calculated standard errors were much larger than is reasonable (in many cases 30-50 times larger than the estimate), presumably because the GVF parameters I borrowed are meant for broader occupational categories with many more observations.

Is there a standard practice for calculating these standard errors or a clear way I should proceed?

Following CPS Technical Paper 77 (chapter 2-4), we typically recommend using replicate weights to estimate variances with CPS microdata (see our detailed CPS replicate weight user guide). However, replicate weights are only available for CPS supplements and are not provided for monthly BMS data. Since you reference borrowing GVF factors for variance estimation, I assume that you have reviewed the BLS instructions for calculating standard errors and confidence intervals. This guide states that “when considering multiple series to borrow from, using the 𝛼 and 𝛽 parameters that generate the highest standard error is generally advised”, though I understand that having standard errors that are 30-50 times larger than the estimates may not be particularly helpful.
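To make the "highest standard error" advice concrete, here is a minimal Python sketch. It assumes the GVF for an estimated level takes the form se(x) = sqrt(a·x² + b·x), which is my reading of the BLS documentation, and that the estimate is expressed in the units the parameters were fit for. The (a, b) values and series names below are hypothetical placeholders, not published BLS parameters.

```python
import math

def gvf_standard_error(x, a, b):
    """Approximate SE of an estimated level x, assuming se = sqrt(a*x^2 + b*x)."""
    variance = a * x**2 + b * x
    if variance < 0:
        raise ValueError("negative GVF variance; parameters do not suit this estimate")
    return math.sqrt(variance)

def most_conservative_se(x, candidate_params):
    """Evaluate every borrowed (a, b) pair and keep the largest standard error."""
    return max(gvf_standard_error(x, a, b) for a, b in candidate_params.values())

# Hypothetical (a, b) pairs standing in for two series one might borrow from.
candidate_params = {
    "series_A": (-0.000015, 3000.0),
    "series_B": (-0.000020, 3500.0),
}

estimate = 250_000  # hypothetical monthly employment level for one occupation
se = most_conservative_se(estimate, candidate_params)
relative_se = se / estimate
print(f"SE = {se:.0f}, relative SE = {relative_se:.1%}")
```

Checking the relative standard error this way is also a quick sanity test: if it comes out many times larger than 1, the estimate and the parameters may not be in matching units.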

While the PSU and strata sample design parameters are not released publicly, Davern et al. (2007) showed that specifying the lowest level of identifiable geography (sequentially as INDIVIDCC, COUNTY, METFIPS, and STATEFIP) as the strata, and household SERIAL (only unique to each household in a given survey month and year) as the cluster, performed reasonably well at estimating standard errors when compared to using the internal sample design data.
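As a rough illustration of that setup, the sketch below assigns each record to the lowest identifiable geography and then applies a with-replacement, ultimate-cluster (Taylor linearization) variance estimator for a weighted total. The choice of estimator is my assumption and may not match Davern et al.'s exact procedure. Also assumed: a code of 0 means "not identified" (check the codes for each variable), WTFINL is the weight, and EMP is a hypothetical 0/1 employment indicator. The records are toy data for a single sample month.

```python
from collections import defaultdict
import math

GEO_LEVELS = ("INDIVIDCC", "COUNTY", "METFIPS", "STATEFIP")  # finest to coarsest
NOT_IDENTIFIED = {None, 0}  # assumption: real data use variable-specific codes

def stratum_of(rec):
    """Return (level, code) for the lowest identifiable geography of a record."""
    for level in GEO_LEVELS:
        code = rec.get(level)
        if code not in NOT_IDENTIFIED:
            return (level, code)
    raise ValueError("record has no identifiable geography")

def total_and_se(records, y_key, weight_key="WTFINL"):
    """Weighted total of y_key and its SE, treating SERIAL as the cluster."""
    cluster_totals = defaultdict(lambda: defaultdict(float))
    for rec in records:
        cluster_totals[stratum_of(rec)][rec["SERIAL"]] += rec[weight_key] * rec[y_key]
    total, variance = 0.0, 0.0
    for clusters in cluster_totals.values():
        z = list(clusters.values())
        n = len(z)
        total += sum(z)
        if n < 2:
            continue  # a single-cluster stratum contributes no variance in this sketch
        mean = sum(z) / n
        variance += n / (n - 1) * sum((v - mean) ** 2 for v in z)
    return total, math.sqrt(variance)

# Toy records: two households identified only by state, two identified by county.
records = [
    {"COUNTY": 0, "STATEFIP": 27, "SERIAL": 1, "WTFINL": 100.0, "EMP": 1},
    {"COUNTY": 0, "STATEFIP": 27, "SERIAL": 1, "WTFINL": 100.0, "EMP": 0},
    {"COUNTY": 0, "STATEFIP": 27, "SERIAL": 2, "WTFINL": 200.0, "EMP": 1},
    {"COUNTY": 27053, "STATEFIP": 27, "SERIAL": 3, "WTFINL": 150.0, "EMP": 1},
    {"COUNTY": 27053, "STATEFIP": 27, "SERIAL": 4, "WTFINL": 50.0, "EMP": 0},
]
total, se = total_and_se(records, "EMP")
```

In practice the same design (strata plus cluster plus weight) can be declared in standard survey software rather than coded by hand. The hand-rolled version is only meant to show what the stratum and cluster variables are doing.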

Hi Ivan,

I’m not sure if you’re able to answer these types of questions, but in case you are:

  1. Davern et al.’s approach focuses on estimating standard errors for ASEC samples. Would you also expect the approach to provide approximate standard errors for non-ASEC samples, like ORG samples?

  2. I’m conducting one analysis with pooled data from the 1962-2025 ASEC samples, and another analysis with pooled data from the 1982-2025 ORG samples. However, the individcc, county, and metfips codes are inconsistent across those years. Thus, might it be a reasonable approach to specify “statefip” as the strata variable (and “serial” as the PSU), rather than using Davern’s lowest geography indicator for the strata? If not, do you have any alternative suggestions, preferably conservative ones?

Thanks for your time!

I am not an expert on this, but based on my knowledge of the CPS sampling methodology, I cannot think of a reason why Davern et al.'s approach (a survey design-based estimator with a stratum variable and a cluster variable) would not work for non-ASEC samples. That said, I recommend consulting colleagues or reviewing the literature for guidance on where this approach has been tried. One thing to consider is that monthly samples are generally smaller than the ASEC and are therefore likely to have greater variance with both the internal and the public use data.

Should you choose to proceed with this approach, note that selecting a coarser final stratum, such as the state, does work around the changing codes for metro areas, but likely at the cost of less accurate standard errors. IPUMS CPS recently released the variable PLACEFIPS, which provides consistent coding of central and principal cities from September 1995 onward. Additionally, PLACECENSUS provides consistent coding for samples from October 1985 through May 1995 (metropolitan areas are not identified in June through August 1995). Bridging only two or three coding systems is likely to be much easier than reconciling the larger number of vintages in METFIPS.

Please be aware that SERIAL is only unique within a sample month. The combination of YEAR, MONTH, and SERIAL provides a unique identifier for every household in IPUMS CPS. The exception is households in the March BMS samples, which also appear in the ASEC samples.
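For pooled data, a minimal Python sketch of building the cluster identifier from those three variables might look like the following. The records are toy data, and `household_key` is a hypothetical helper name.

```python
def household_key(rec):
    """Return a household identifier for pooled data: (YEAR, MONTH, SERIAL)."""
    return (rec["YEAR"], rec["MONTH"], rec["SERIAL"])

# Toy records: the same SERIAL value recurs in different sample months.
records = [
    {"YEAR": 2024, "MONTH": 1, "SERIAL": 1},
    {"YEAR": 2024, "MONTH": 2, "SERIAL": 1},
    {"YEAR": 2025, "MONTH": 1, "SERIAL": 1},
]

serial_only = {rec["SERIAL"] for rec in records}      # collapses three households into one
pooled_keys = {household_key(rec) for rec in records}  # keeps them distinct
print(len(serial_only), len(pooled_keys))
```

If March basic monthly and ASEC records are pooled together, this key alone would presumably not tell the two apart, so a sample indicator would need to be added to the tuple.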

Hi Ivan, thanks so much for the detailed reply - this is really helpful!