How to generate multi-year estimates with variance estimation

I am trying to generate custom multi-year estimates while also calculating their standard errors. For example, I want to produce 3-year estimates by combining three 1-year samples. The guidance I’ve received from the Census Bureau is to divide the weights by 3 and to divide each of the replicate weights by 3 as well. However, when I put this into my survey design in R using the survey and srvyr packages, my estimates seem correct but my standard errors are far too high.

Does anyone have any experience generating multi-year estimates and calculating the standard errors for them?

If you are combining three ACS samples together, it is typically advisable to use the 3-year files provided by the Census Bureau. These files, available via IPUMS USA, include sampling weights that are already adjusted for the pooling together of multiple single-year files. Additionally, replicate weights are available in these files. If you are not combining three ACS samples together, then the procedure you discuss above (e.g., dividing the sampling weights by the number of samples pooled together) is approximately correct. Strictly speaking, a “more accurate” method is to multiply the sampling weight in sample x by (the sample size in sample x) / (the pooled sample size). If the combined samples all have roughly the same sample size, then the two methods discussed will be approximately equivalent.
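To make the two rescaling rules concrete, here is a minimal sketch in base R. The data frame `pooled`, its values, and the column names `HHWT_equal` and `HHWT_prop` are made up for illustration; only `YEAR` and `HHWT` follow IPUMS naming.

```r
# Toy pooled file: three hypothetical 1-year samples of unequal size
pooled <- data.frame(
  YEAR = rep(c(2016, 2017, 2018), times = c(4, 5, 6)),
  HHWT = c(rep(100, 4), rep(80, 5), rep(70, 6))
)

n_years   <- length(unique(pooled$YEAR))  # number of pooled samples (3)
n_pooled  <- nrow(pooled)                 # pooled sample size (15)
n_by_year <- table(pooled$YEAR)           # sample size of each year

# Method A: divide every weight by the number of pooled samples
pooled$HHWT_equal <- pooled$HHWT / n_years

# Method B: multiply by (sample size in year x) / (pooled sample size)
share <- as.numeric(n_by_year[as.character(pooled$YEAR)]) / n_pooled
pooled$HHWT_prop <- pooled$HHWT * share

# Compare the weighted household totals by year under each method
aggregate(cbind(HHWT, HHWT_equal, HHWT_prop) ~ YEAR, data = pooled, FUN = sum)
```

When the yearly sample sizes are close to equal, each share is close to 1/3 and the two columns nearly coincide. If you use replicate weights, the same factor would presumably need to be applied to every replicate weight column as well.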

@JeffBloem Thanks for the reply. The 3-year data was discontinued in 2012, so to balance currency of data with sample size, I am trying to combine three 1-year estimates.

I was concerned with using the replicate weights, but I think I just solved the issue by using the STRATA and CLUSTER variables. Can you answer these questions for me?

  1. Does IPUMS calculate in-house replicate weights instead of reporting Census PUMS replicate weights?
  2. Does IPUMS offer CLUSTER and STRATA as an optional method for calculating standard errors? If so, are REPWT and REPWTP simply separate variables that an analyst could use instead to estimate the standard errors?
  3. Why would Census PUMS be better than IPUMS for combining three 1-year files?

Last, can you tell me if this seems correct? The results really do seem accurate (not only the estimate, but the margins of error seem appropriate when the sample sizes are cut).

  1. Take the 2016, 2017, and 2018 ACS 1-year samples (IPUMS).
  2. For household-level estimates, divide HHWT by 3† and filter for PERNUM == 1.
  3. Specify the survey design using the CLUSTER and STRATA fields as well as the revised HHWT field.

My code in R:

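library(dplyr)
library(srvyr)

# HHWT_3 is HHWT divided by 3 (the number of pooled 1-year samples)
h <- pums_cleaned %>%
  filter(PERNUM == 1) %>%  # keep one record per household
  as_survey_design(ids = CLUSTER,
                   strata = STRATA,
                   weights = HHWT_3)

result <- h %>%
  filter(YEAR %in% c(2014, 2015, 2016)) %>%
  group_by(RACBLK) %>%
  summarize(hh = survey_total(na.rm = TRUE),
            count = unweighted(n())) %>%
  mutate(hh_moe = hh_se * 1.645,  # 90% margin of error
         hh_cv = hh_se / hh,      # coefficient of variation
         # conditions are checked in order, so a CV of exactly 0.2 is covered
         hh_reliability = case_when(hh_cv > 0.4 ~ "3. Unreliable",
                                    hh_cv > 0.2 ~ "2. Use with caution",
                                    hh_cv <= 0.2 ~ "1. Use"))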

†: Your advice about scaling each sample's weights by its share of the pooled sample size, rather than simply dividing by 3, is noted. Great point!

I will try to answer your questions one at a time.

(1) The replicate weights available in IPUMS USA are the same replicate weights provided by the US Census Bureau.

(2) Using CLUSTER and STRATA is not necessary to calculate standard errors. They are available as an option for users who feel they will enhance the credibility of their estimates. This page includes much more information about variance estimation with IPUMS USA data using CLUSTER and STRATA.

(3) I’m really not sure. I think this choice ultimately comes down to personal preference. As a frequent user of IPUMS USA, I can’t easily think of a case where I would prefer to use the PUMS data directly from the Census Bureau website. If you prefer to use un-harmonized variables, you can access the source variables directly from IPUMS USA. Just select the “source variables” radio button at the top of the Select Data page.

The steps you note here seem correct to me.
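For comparison with the CLUSTER/STRATA design, here is a rough sketch of the replicate-weight route using the survey package. The data are simulated, so the numbers mean nothing; `toy`, `owner`, `rep_mat`, and `rep_mat_3` are hypothetical names, and a real extract would supply the 80 household replicate weights (assumed here to be named REPWT1 through REPWT80). The `scale = 4 / 80` setting reflects the ACS successive-difference variance formula as it is commonly specified in this package. If your replicate-weight standard errors looked too high, the `type`, `scale`, and `rscales` arguments are one thing worth checking.

```r
library(survey)

set.seed(1)
n <- 200

# Simulated stand-in for a pooled extract: one row per household
toy <- data.frame(
  YEAR  = rep(2016:2018, length.out = n),
  HHWT  = runif(n, 50, 150),
  owner = rbinom(n, 1, 0.6)   # hypothetical 0/1 household characteristic
)

# Simulated replicate weights; real data would use the REPWT columns instead
rep_mat <- sapply(1:80, function(i) toy$HHWT * runif(n, 0.5, 1.5))
colnames(rep_mat) <- paste0("REPWT", 1:80)

# Divide the main weight and every replicate weight by the 3 pooled years
toy$HHWT_3 <- toy$HHWT / 3
rep_mat_3  <- rep_mat / 3

# Replicate-weight design with the ACS scaling factor of 4/80
des <- svrepdesign(
  data = toy,
  weights = ~HHWT_3,
  repweights = rep_mat_3,
  type = "JK1",
  scale = 4 / 80,
  rscales = rep(1, 80),
  mse = TRUE,
  combined.weights = TRUE
)

est <- svytotal(~owner, des)
est
```

The point estimate from this design is the same weighted total you would get from the CLUSTER/STRATA design with the same main weight; only the standard error is computed differently.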