Hi everyone, I’m a reporter writing a story based on some very narrow slices of the CPS - referring, in some cases, to the characteristics of just a few hundred thousand individuals.
I’d like to reduce statistical noise resulting from this small sample size as much as possible by pooling multiple months of data together. But I’m aware that households spend multiple months in the sample.
Would it make sense, therefore, to pool three months of CPS data and count only those households with a MISH value of 1, 4, 5 or 8? That should avoid double-counting any one household while giving me a 50% larger sample size than one month’s CPS.
I’m open to other suggestions!
My understanding from your post is that you would like to pool multiple adjacent CPS samples and conduct a cross-sectional analysis; please correct me if that’s not the case. You can do this by applying the appropriate sampling weight and dividing it by the number of samples you are pooling. Dividing the weights this way makes your estimates an average over the entire period you are analyzing. Applying the correct weight makes your analysis representative of the U.S. population (or of the subpopulation the weight applies to, such as the population eligible for a particular CPS supplement). You do not need to restrict your sample to individuals who appear in only one of the pooled samples. Pooling multiple months of data should reduce the variance of the estimate, although by less than the increase in sample size alone would suggest, because the monthly samples overlap. From CPS Technical Paper 66 (page 10-15):
CPS estimates are frequently averaged over a number of months. The most commonly computed averages are (1) quarterly, which provide four estimates per year by grouping the months of the calendar year in nonoverlapping intervals of three, and (2) annual, combining all 12 months of the calendar year. Quarterly and annual averages can be computed by summing the weights for all of the months contributing to each average and dividing by the number of months involved. Averages for calculated cells, such as rates, percents, means, and medians, are computed from the averages for the component levels, not by averaging the monthly values (e.g., a quarterly average unemployment rate is computed by taking the quarterly average unemployment level as a percentage of the quarterly average labor force level, not by averaging the three monthly unemployment rates together). Although such averaging multiplies the number of interviews contributing to the resulting estimates by a factor approximately equal to the number of months involved in the average, the sampling variance for the average estimate is actually reduced by a factor substantially less than that number of months. This is primarily because the CPS rotation pattern and resulting month-to-month overlap in sample units ensure that estimates from the individual months are not independent. The reduction in sampling error associated with the averaging of CPS estimates over adjacent months was studied using 12 months of data collected beginning January 1987 (Fisher and McGuinness, 1993). That study showed that characteristics for which the month-to-month correlation is low, such as unemployment, are helped considerably by such averaging, while characteristics for which the correlation is high, such as employment, benefit less from averaging. For unemployment, variances of national estimates were reduced by about one-half for quarterly averages and about one-fifth for annual averages.
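To make the averaging described above concrete, here is a minimal Python sketch. The records, month labels, and weight values are made up for illustration, and the function names are hypothetical; only the approach (divide each weight by the number of pooled months, then compute the rate from the averaged levels) comes from the text above.

```python
from collections import defaultdict

# Hypothetical person records from three pooled monthly samples.
# Each tuple is (month, labor force status, WTFINL); all values are invented.
records = [
    ("2023-01", "employed", 1500.0),
    ("2023-01", "employed", 1500.0),
    ("2023-01", "unemployed", 1000.0),
    ("2023-02", "employed", 4000.0),
    ("2023-02", "employed", 3600.0),
    ("2023-02", "unemployed", 400.0),
    ("2023-03", "employed", 1800.0),
    ("2023-03", "employed", 1900.0),
    ("2023-03", "unemployed", 300.0),
]


def pooled_levels(records):
    """Average level per status: each weight is divided by the number of months."""
    n_months = len({month for month, _, _ in records})
    levels = defaultdict(float)
    for _, status, weight in records:
        levels[status] += weight / n_months
    return dict(levels)


def pooled_unemployment_rate(records):
    """Rate computed from the averaged levels, as the quoted passage describes."""
    levels = pooled_levels(records)
    unemployed = levels.get("unemployed", 0.0)
    labor_force = unemployed + levels.get("employed", 0.0)
    return unemployed / labor_force


def mean_of_monthly_rates(records):
    """Shown only for contrast: the simple average of the monthly rates."""
    by_month = defaultdict(lambda: defaultdict(float))
    for month, status, weight in records:
        by_month[month][status] += weight
    rates = [
        m["unemployed"] / (m["unemployed"] + m["employed"])
        for m in by_month.values()
    ]
    return sum(rates) / len(rates)


print(pooled_levels(records))
print(pooled_unemployment_rate(records))
print(mean_of_monthly_rates(records))
```

In this toy data the two rates differ because the weighted labor force is not the same size in every month; the first function follows the approach in the quoted passage.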
The correct weight depends on which samples and variables you are using and how you are using them. For cross-sectional analyses of multiple basic monthly samples pooled together (e.g., estimating an average over a three-month period), you should usually use WTFINL (or HWTFINL for household-level analyses). For cross-sectional analyses of multiple ASEC samples pooled together, you should usually use ASECWT (or ASECWTH for household-level analyses). For longitudinal analyses (i.e., linking individuals or households across samples to measure change over time), however, you should use a linking weight. See this page on linking the CPS for more information. The non-ASEC supplements also have their own special weights, which you need to use when analyzing variables from those supplements. You can see a list of supplement weights here. This page also includes general information about weights.
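As a quick reference, the usual choices for pooled cross-sectional analyses can be written as a small lookup. This is only a sketch: the weight names are the ones mentioned above, while the keys ("basic_monthly", "asec", "person", "household") and the function name are hypothetical labels chosen for illustration. Linking weights and supplement weights are deliberately left out, since the right one depends on the specific link or supplement.

```python
# Usual weights for pooled cross-sectional analyses, as named in the answer above.
# The dictionary keys are illustrative labels, not IPUMS identifiers.
POOLED_CROSS_SECTION_WEIGHTS = {
    ("basic_monthly", "person"): "WTFINL",
    ("basic_monthly", "household"): "HWTFINL",
    ("asec", "person"): "ASECWT",
    ("asec", "household"): "ASECWTH",
}


def cross_section_weight(sample_type, unit):
    """Return the usual weight variable for a pooled cross-sectional analysis."""
    try:
        return POOLED_CROSS_SECTION_WEIGHTS[(sample_type, unit)]
    except KeyError:
        # Longitudinal and supplement analyses need other weights; see the text above.
        raise ValueError(
            f"No default weight for sample_type={sample_type!r}, unit={unit!r}"
        ) from None


print(cross_section_weight("basic_monthly", "person"))
```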