 # How can I use replicate weights to create standard errors in R?

I am using R to analyze CPS data on household income and would like to use the replicate weights to create standard errors.

I am aware that such code exists for Stata and other statistical software, but I am having trouble translating it to R.


Post edited 12/23/2020 to correct a typo

Note that the CPS weighting system has changed a little bit this year, and not all of our documentation has been updated. There used to be just one variable name across all supplements (WTSUPP), but now the variable name depends on which supplement you are using. The examples below use ASEC data (and therefore ASECWT); see the chart linked here to find the variable that applies to your supplement:

https://cps.ipums.org/cps/weights_ren…

Here are some examples, first using the survey package and then the srvyr package (which is built on survey but uses dplyr syntax).

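```r
library(ipumsr)
library(dplyr)

# Read data and some light data formatting
# The read call was not shown in the post; "cps_00001.xml" is a placeholder
# for the DDI file that came with your own extract.
data <- read_ipums_micro("cps_00001.xml")
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.

data <- data %>%
  mutate(
    AGE = as.numeric(AGE),
    SEX = as_factor(SEX),
    # Codes of 99999990 and above are missing-value codes, so set them to NA
    INCTOT = as.numeric(lbl_na_if(INCTOT, ~.val >= 99999990))
  )
```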

```r
# If not installed already: install.packages("survey")
library(survey)

svy <- svrepdesign(
  data = data, weights = ~ASECWT, repweights = "REPWTP[0-9]+",
  type = "JK1", scale = 4/160, rscales = rep(1, 160), mse = TRUE
)

# Calculate mean of INCTOT
svymean(~INCTOT, svy, na.rm = TRUE)
#>         mean     SE
#> INCTOT 42526 383.64

# Calculate a mean of INCTOT, on the subset of people aged 25-64
svy_subset <- subset(svy, AGE >= 25 & AGE < 65)
svymean(~INCTOT, svy_subset, na.rm = TRUE)
#>         mean     SE
#> INCTOT 51407 496.95

# Calculate the mean of INCTOT by SEX
svyby(~INCTOT, ~SEX, svy, svymean, na.rm = TRUE)
#>           SEX   INCTOT       se
#> Male     Male 53196.41 637.2199
#> Female Female 32456.95 325.3275
```

```r
# If not installed already: install.packages("srvyr")
library(srvyr)

svy <- as_survey(
  data, weights = ASECWT, repweights = matches("REPWTP[0-9]+"),
  type = "JK1", scale = 4/160, rscales = rep(1, 160), mse = TRUE
)

# Calculate mean of INCTOT
svy %>%
  summarize(mn = survey_mean(INCTOT, na.rm = TRUE))
#> # A tibble: 1 x 2
#>       mn mn_se
#>    <dbl> <dbl>
#> 1 42526.  384.

# Calculate a mean of INCTOT, on the subset of people aged 25-64
svy %>%
  filter(AGE >= 25 & AGE < 65) %>%
  summarize(mn = survey_mean(INCTOT, na.rm = TRUE))
#> # A tibble: 1 x 2
#>       mn mn_se
#>    <dbl> <dbl>
#> 1 51407.  497.

# Calculate the mean of INCTOT by SEX
svy %>%
  group_by(SEX) %>%
  summarize(mn = survey_mean(INCTOT, na.rm = TRUE))
#> # A tibble: 2 x 3
#>   SEX        mn mn_se
#>   <fct>   <dbl> <dbl>
#> 1 Male   53196.  637.
#> 2 Female 32457.  325.
```

I’m working with the ASEC file to estimate TANF participation. I found this forum thread on specifying the survey design in R, but on Anthony Damico’s site on complex survey design (http://asdfree.com/current-population-survey-basic-monthly-cpsbasic.html) the type and rho parameters are different. Is this a mistake on Damico’s part? Should I follow this approach? Additionally, are there any reference tables to make sure estimates are correct? The ACS PUMS provide state-level estimates that let you check that your survey design is correct. Is something similar available for CPS?

One key difference between the approach detailed on the linked webpage and the approach noted on the IPUMS Forum above is that the latter integrates the replicate weights provided with the CPS data while the former does not. These are two distinct ways of calculating standard errors. More information about replicate weights is available here. Regarding any previously calculated statistics using the CPS data, I’ll direct you to the BLS website. They list a number of tables with published statistics using the CPS data.

Excuse me. I meant to send the ASEC link, which does include the replicate weights. I just want to confirm: is the approach linked in the above post appropriate for calculating proper standard errors with the survey package? I was also curious whether there are any reference files for double-checking the estimates, like one can do with the ACS values.

Yes, the R packages survey and srvyr can be used to specify the sample design for data read in with the ipumsr package. Regarding any previously calculated statistics using the CPS data, I’ll direct you to the BLS website. They list a number of tables with published statistics using the CPS data.

This is very helpful, but in the call to svrepdesign/as_survey, shouldn’t the value of scale be 4/160 instead of 4/60? According to https://cps.ipums.org/cps/repwt.shtml, the multiplier in front of the sum of squared deviations in the formula for the standard error is 4/160, not 4/60. Am I missing something?

Do you have any similar sample code for using replicate weights within these packages for IPUMS USA (ACS files)?

You’re correct that there was a typo in the earlier post. I have edited that post to use the correct denominator of 160 in the survey design specification step. Thanks for pointing this out, and sorry for the year-late follow-up; your post must have slipped through the cracks!
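For anyone who wants to see where the 4/160 multiplier enters, here is a minimal base-R sketch of the standard error formula from the repwt page, computed by hand. It is only an illustration: the data frame `toy`, its values, and the randomly perturbed replicate weights are all made up (real replicate weights come with your extract), and it centers the squared deviations on the full-sample estimate, which is my understanding of what `mse = TRUE` requests.

```r
# Simulated stand-in for an ASEC extract; every value here is invented.
set.seed(1)
n <- 200        # number of fake respondents
n_reps <- 160   # ASEC provides 160 replicate weights

toy <- data.frame(
  INCTOT = rlnorm(n, meanlog = 10, sdlog = 1),
  ASECWT = runif(n, 500, 3000)
)

# Fake replicate weights: the full weight times a random factor.
# Result is an n x n_reps matrix, one column per replicate.
rep_wts <- sapply(seq_len(n_reps), function(r) toy$ASECWT * runif(n, 0.5, 1.5))
colnames(rep_wts) <- paste0("REPWTP", seq_len(n_reps))

# Full-sample estimate, using the main weight
theta_full <- weighted.mean(toy$INCTOT, toy$ASECWT)

# One estimate per replicate weight
theta_reps <- apply(rep_wts, 2, function(w) weighted.mean(toy$INCTOT, w))

# SE = sqrt( (4 / 160) * sum of squared deviations from the full-sample estimate )
se_manual <- sqrt((4 / n_reps) * sum((theta_reps - theta_full)^2))
se_manual
```

With a typo such as 4/60 in place of 4/160, the variance would be inflated by a factor of 160/60, so the standard error would come out roughly 1.6 times too large.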

The code for ACS samples should be nearly the same. The IPUMS USA page on replicate weights gives details on the calculations: https://usa.ipums.org/usa/repwt.shtml

I’ll refer you to the earlier post in this thread by @gfellis giving example code for using replicate weights with ASEC data. Apart from the specific variables used in the analysis, the code will work more or less unchanged for ACS, with a few modifications to the survey design specification. Below I’ve highlighted the things you would need to change, depending on whether you’re using the -survey- or -srvyr- package:

## Using -survey-

### ASEC:

```r
svy <- svrepdesign(data = data, weights = ~ASECWT, repweights = "REPWTP[0-9]+",
                   type = "JK1", scale = 4/160, rscales = rep(1, 160), mse = TRUE)
```

### ACS:

```r
svy <- svrepdesign(data = data, weights = ~PERWT, repweights = "REPWTP[0-9]+",
                   type = "JK1", scale = 4/80, rscales = rep(1, 80), mse = TRUE)
```

## Using -srvyr-

### ASEC:

```r
svy <- as_survey(data, weights = ASECWT, repweights = matches("REPWTP[0-9]+"),
                 type = "JK1", scale = 4/160, rscales = rep(1, 160), mse = TRUE)
```

### ACS:

```r
svy <- as_survey(data, weights = PERWT, repweights = matches("REPWTP[0-9]+"),
                 type = "JK1", scale = 4/80, rscales = rep(1, 80), mse = TRUE)
```
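Since the only numeric difference between the two specifications is the replicate count (160 for ASEC, 80 for ACS), one way to avoid hard-coding it is to count the replicate weight columns in the data. The sketch below is base R only; `rep_design_args()` is a hypothetical helper, not part of survey, srvyr, or ipumsr, and it assumes the 4/R multiplier applies, which this thread has only confirmed for ASEC and ACS person weights. The toy data frames contain nothing but dummy replicate columns.

```r
# Hypothetical helper: infer the replicate count from the column names and
# return the matching `scale` and `rscales` values.
rep_design_args <- function(data, pattern = "^REPWTP[0-9]+$") {
  n_reps <- sum(grepl(pattern, names(data)))
  if (n_reps == 0) {
    stop("No replicate weight columns match: ", pattern)
  }
  list(n_reps = n_reps, scale = 4 / n_reps, rscales = rep(1, n_reps))
}

# Toy stand-ins: only the column names matter for this illustration.
make_toy <- function(n_reps) {
  as.data.frame(matrix(1, nrow = 2, ncol = n_reps,
                       dimnames = list(NULL, paste0("REPWTP", seq_len(n_reps)))))
}
toy_asec <- make_toy(160)
toy_acs  <- make_toy(80)

asec_args <- rep_design_args(toy_asec)
acs_args  <- rep_design_args(toy_acs)

asec_args$scale  # 4/160 = 0.025
acs_args$scale   # 4/80  = 0.05
```

The returned values could then be passed along, for example `scale = acs_args$scale, rscales = acs_args$rscales`, in either design call. The pattern requires at least one digit after REPWTP, so a bare REPWTP column, if your extract has one, is not counted.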