Calculating standard errors for specific subpopulations

I am calculating annual average unemployment rates for people with disabilities ages 25-34 using the CPS data. I am doing the calculation nationally and within states. Because the within-state rates are based on relatively small samples, I want to calculate standard errors and confidence intervals.

Table PF6 of the “parameters and factors for calculating standard errors” Excel file released by the BLS contains factors and parameters for people with disabilities, but not for such specific age ranges. If I used the parameters and factors for the yearly average unemployment rate for all persons with a disability 16 years and over to calculate SEs, would the results be even a remotely accurate approximation?

Are there any other methods for calculating SEs for very specific subpopulations not captured by the BLS parameters and factors tables? I am using Stata if that helps.

Update to this: I am trying to use svyset in Stata with the svy: mean command to calculate standard errors. Here are the specifications:

svyset serial [iw= compwt ], strata(metfips)

When I use svy: mean on a dummy variable indicating whether an individual in the labor force is unemployed, the resulting point estimate for the 2022 overall unemployment rate matches what I get when I calculate the rate according to BLS instructions (3.64971 percent, which matches the rounded published BLS result). The linearized standard error I get is around 0.0004 as a proportion, or 0.04 percentage points.

Is it appropriate to approximate the cluster and strata variables using the respondent serial number and the metropolitan FIPS code, given that the official cluster and strata variables are not available?

I think you are asking for guidance on estimating standard errors in the CPS microdata. Please correct me if I am wrong.

The first set of resources you linked is for estimating standard errors from published estimates or tables, not working with microdata.

If you are using ASEC data, you should leverage the replicate weights (REPWT for household-level analyses and REPWTP for person-level analyses) to estimate standard errors for these analyses. The replicate weights page I linked in the previous sentence includes sample code.

If you are using the basic monthly CPS data, unfortunately, there are no replicate weights or sampling design variables for estimating standard errors. Davern et al. (2006, 2007) in the journal Inquiry lay out a method for improving variance estimation using only variables available in the public use microdata. They specify the lowest level of identifiable geography as the strata and the household as the PSU, and find that this improves variance estimation relative to the baseline of using just the weighted least squares variance estimator. Another suggestion is given here: Re: st: MORG data aggregation. Regarding sampling units, the best available is the household, although in the actual CPS sample design that is the third level of sampling.

Two comments on the code you shared. First, you should use pweights instead of iweights (see this quick summary of different weight options in Stata). Second, as per the Davern papers, you will want to specify the lowest geographic unit available as the strata; this won’t always be METFIPS, which will be missing for many cases. You will need to create a lowest-geography variable that reports the most detailed unit available for each case (e.g., metro, county, balance of state).
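To make the setup above concrete, here is a minimal sketch rather than a definitive implementation. It builds a small made-up dataset so that it runs on its own; in practice you would load your own extract instead. The variable names serial, statefip, metfips, and compwt are assumed to match your extract, while unemp, target (standing in for "person with a disability, ages 25-34"), and lowgeo are hypothetical names I introduce for illustration. It also assumes that unidentified metro areas are stored as missing; if your extract uses special codes for them, recode those to missing first.

```stata
* Toy data standing in for a basic monthly CPS extract restricted to the
* labor force. Every value is made up; only the variable names matter.
clear
set seed 2022
set obs 4000
gen long   serial   = ceil(_n / 2)               // household id, two persons each
gen byte   statefip = 1 + mod(serial, 5)         // five pretend states
gen int    metfips  = 100 * statefip + mod(serial, 3)
replace    metfips  = . if mod(serial, 3) == 0   // metro area not identified
gen double compwt   = 1000 + 500 * runiform()    // person weight
gen byte   target   = runiform() < 0.20          // hypothetical subpopulation flag
gen byte   unemp    = runiform() < 0.05          // 1 = unemployed, 0 = employed

* Lowest identifiable geography: one stratum per state-by-metro combination,
* with "metro not identified" kept as its own category within each state.
egen long lowgeo = group(statefip metfips), missing

* Household as PSU, lowest geography as strata, pweights rather than iweights.
svyset serial [pweight = compwt], strata(lowgeo)

* National rate for the subpopulation, then the same rate by state.
* Output includes linearized standard errors and 95% confidence intervals.
svy, subpop(target): mean unemp
svy, subpop(target): mean unemp, over(statefip)
```

The subpop() option is used instead of an if condition so that observations outside the subpopulation still contribute to the design information (strata and PSU counts) used in the variance calculation. This sketch only covers geography down to the metro level; if county is identified for some of your cases, the same idea extends by adding it to the grouping.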

Thanks for the help! This makes sense.
