I am using R to analyze CPS data on household income and would like to use the replicate weights to create standard errors.
I am aware that code for this exists in Stata and other statistical software, but I am having trouble translating it to R.
Post edited 12/23/2020 to correct a typo
Note that the CPS weighting system has changed a little bit this year, and not all of our documentation has been updated. There used to be a single weight variable across all supplements (WTSUPP), but now the variable name depends on which supplement you are using. The examples below use ASEC data, and therefore ASECWT; see this chart to find the variable for your supplement:
https://cps.ipums.org/cps/weights_ren…
Below are examples using first the survey package and then the srvyr package (which is built on survey but uses dplyr syntax).
```r
library(ipumsr)
library(dplyr)

data <- read_ipums_micro("cps_00021.xml")
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command ipums_conditions() for more details.

data <- data %>%
  mutate(
    AGE = as.numeric(AGE),
    SEX = as_factor(SEX),
    # Treat the special codes at or above 99999990 as missing
    INCTOT = as.numeric(lbl_na_if(INCTOT, ~.val >= 99999990))
  )

library(survey)
# 160 replicate weights; with scale = 4/160, rscales = 1 and mse = TRUE the variance is
# 4/160 * sum((X_r - X)^2), with deviations taken around the full-sample estimate X
svy <- svrepdesign(data = data, weights = ~ASECWT, repweights = "REPWTP[0-9]+",
                   type = "JK1", scale = 4/160, rscales = rep(1, 160), mse = TRUE)

svymean(~INCTOT, svy, na.rm = TRUE)
#>         mean     SE
#> INCTOT 42526 383.64

# Restrict to ages 25-64 by subsetting the design (not the data frame)
svy_subset <- subset(svy, AGE >= 25 & AGE < 65)
svymean(~INCTOT, svy_subset, na.rm = TRUE)
#>         mean     SE
#> INCTOT 51407 496.95

# Mean income by sex
svyby(~INCTOT, ~SEX, svy, svymean, na.rm = TRUE)
#>           SEX   INCTOT       se
#> Male     Male 53196.41 637.2199
#> Female Female 32456.95 325.3275
```
```r
library(srvyr)
# Same replicate-weight design as above, specified with srvyr
svy <- as_survey(data, weights = ASECWT, repweights = matches("REPWTP[0-9]+"),
                 type = "JK1", scale = 4/160, rscales = rep(1, 160), mse = TRUE)

svy %>%
  summarize(mn = survey_mean(INCTOT, na.rm = TRUE))
#> # A tibble: 1 x 2
#>       mn mn_se
#>    <dbl> <dbl>
#> 1 42526.  384.

svy %>%
  filter(AGE >= 25 & AGE < 65) %>%
  summarize(mn = survey_mean(INCTOT, na.rm = TRUE))
#> # A tibble: 1 x 2
#>       mn mn_se
#>    <dbl> <dbl>
#> 1 51407.  497.

svy %>%
  group_by(SEX) %>%
  summarize(mn = survey_mean(INCTOT, na.rm = TRUE))
#> # A tibble: 2 x 3
#>   SEX        mn mn_se
#>   <fct>   <dbl> <dbl>
#> 1 Male   53196.  637.
#> 2 Female 32457.  325.
```
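To see what the design specification does, here is a minimal base-R sketch of the same calculation for a weighted mean. Each replicate weight produces its own estimate. The squared deviations from the full-sample estimate are then summed and multiplied by 4/160, which is the IPUMS CPS formula that scale = 4/160 with mse = TRUE encodes. The data and the way the replicate weights are generated are made up for illustration (this is not CPS data), and the helper name replicate_se() is hypothetical.

```r
# Hypothetical helper: replicate-weight SE of a weighted mean,
# SE = sqrt(scale * sum((X_r - X)^2)), X_r = replicate estimate, X = full-sample estimate
replicate_se <- function(y, w, rep_w, scale = 4 / ncol(rep_w)) {
  full <- weighted.mean(y, w, na.rm = TRUE)
  reps <- apply(rep_w, 2, function(rw) weighted.mean(y, rw, na.rm = TRUE))
  c(estimate = full, se = sqrt(scale * sum((reps - full)^2)))
}

# Simulated stand-in for an extract (NOT real CPS values)
set.seed(1)
n <- 500
n_reps <- 160
income <- rlnorm(n, meanlog = 10)
asecwt <- runif(n, 500, 3000)
# Made-up replicate weights: each column perturbs the main weight by a random factor
rep_mat <- sapply(seq_len(n_reps), function(r) asecwt * sample(c(0.5, 1.5), n, replace = TRUE))
colnames(rep_mat) <- paste0("REPWTP", seq_len(n_reps))

replicate_se(income, asecwt, rep_mat)
```

For a simple mean, svrepdesign() with type = "JK1" and mse = TRUE should match this hand calculation. The packages remain the practical choice for subgroups, ratios, and other statistics.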
I'm working with the ASEC file to estimate TANF participation. I found this thread on specifying the survey design in R, but Anthony Damico's site on complex survey designs (http://asdfree.com/current-population-survey-basic-monthly-cpsbasic.html) uses different type and rho parameters. Is this a mistake on Damico's part? Should I follow that approach instead? Additionally, are there any reference tables for checking that estimates are correct? The ACS PUMS provides state-level estimates you can use to confirm that your survey design is specified correctly. Is something similar available for CPS?
One key difference between the approach detailed on the linked webpage and the approach noted on the IPUMS Forum above is that the latter integrates the replicate weights provided with the CPS data, while the former does not. These are two distinct ways of calculating standard errors. More information about replicate weights is available here. Regarding previously calculated statistics using the CPS data, I'll direct you to the BLS website, which lists a number of tables with published statistics based on the CPS data.
Excuse me, I meant to send the ASEC link, which does include the replicate weights. I just want to make sure that the approach linked in the above post is appropriate for calculating proper standard errors with the survey package. I was also curious whether there are any reference files for double-checking estimates, as one can do with the ACS values.
Yes, the R packages survey and srvyr can be used alongside the ipumsr package to specify the sample design. Regarding previously calculated statistics using the CPS data, I'll direct you to the BLS website, which lists a number of tables with published statistics based on the CPS data.
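On the earlier question about the different type and rho arguments: assume the ASEC code on the linked site uses type = "Fay" with rho = 1 - 1/sqrt(4). That is an assumption; the page is not reproduced in this thread, so please check it. Under that assumption, the two specifications should give the same variance. In svrepdesign(), Fay's method multiplies the sum of squared deviations by 1/(R(1 - rho)^2) for R replicates. The quick arithmetic check below shows that with rho = 0.5 and R = 160 this equals the 4/160 used above. The function name fay_scale() is hypothetical.

```r
# Hypothetical helper: variance multiplier used by Fay's method for R replicates
fay_scale <- function(n_reps, rho) 1 / (n_reps * (1 - rho)^2)

rho <- 1 - 1 / sqrt(4)      # 0.5
fay <- fay_scale(160, rho)  # 1 / (160 * 0.25) = 1/40
jk1 <- 4 / 160              # multiplier from the IPUMS CPS formula

isTRUE(all.equal(fay, jk1))  # TRUE
```

As far as this arithmetic goes, the two approaches differ in parameterization rather than in the resulting variance. This holds provided both use the same 160 replicate weights and center the deviations on the full-sample estimate.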
This is very helpful, but in the call to svrepdesign/as_survey, shouldn’t the value of scale be 4/160 instead of 4/60? According to IPUMS CPS, the multiplier in front of the sum of squared deviations in the formula for the standard error is 4/160, not 4/60. Am I missing something?
Do you have any similar sample code for using replicate weights with these packages for IPUMS USA (ACS files)?
You're correct that there was a typo in the earlier post. I have edited that post to use the correct denominator of 160 in the survey design specification step. Thanks for pointing this out, and sorry for the year-late follow-up; your post must have slipped through the cracks!
The code for ACS samples should be nearly the same. The IPUMS USA page on replicate weights gives details on the calculations: https://usa.ipums.org/usa/repwt.shtml
I'll refer you to the earlier post in this thread by @gfellis giving example code for using replicate weights with ASEC data. Apart from the specific variables used in the analysis, the code works more or less unchanged for ACS data, with a few modifications to the survey design specification. The person weight is PERWT instead of ASECWT. The ACS also has 80 replicate weights instead of 160, which changes both scale and rscales. Below are the ASEC and ACS versions side by side, depending on whether you're using the -survey- or -srvyr- package:
-survey- package, ASEC:
```r
svy <- svrepdesign(data = data, weights = ~ASECWT, repweights = "REPWTP[0-9]+",
                   type = "JK1", scale = 4/160, rscales = rep(1, 160), mse = TRUE)
```
-survey- package, ACS:
```r
svy <- svrepdesign(data = data, weights = ~PERWT, repweights = "REPWTP[0-9]+",
                   type = "JK1", scale = 4/80, rscales = rep(1, 80), mse = TRUE)
```
-srvyr- package, ASEC:
```r
svy <- as_survey(data, weights = ASECWT, repweights = matches("REPWTP[0-9]+"),
                 type = "JK1", scale = 4/160, rscales = rep(1, 160), mse = TRUE)
```
-srvyr- package, ACS:
```r
svy <- as_survey(data, weights = PERWT, repweights = matches("REPWTP[0-9]+"),
                 type = "JK1", scale = 4/80, rscales = rep(1, 80), mse = TRUE)
```
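The only numeric change between the two files is the number of replicates. One way to avoid a mismatch is therefore to derive scale and rscales from the columns the pattern actually selects. Below is a minimal base-R sketch using hypothetical column names. An unnumbered REPWTP column is included only to show that the pattern skips it; with real data you would use names(data) instead.

```r
# Hypothetical column names for an ACS extract from IPUMS USA
acs_names <- c("YEAR", "SERIAL", "PERWT", "REPWTP", paste0("REPWTP", 1:80))

# Same pattern as in the design calls: "REPWTP" followed by at least one digit
rep_cols <- grep("REPWTP[0-9]+", acs_names, value = TRUE)
n_reps <- length(rep_cols)  # 80

# Values to pass on to svrepdesign() / as_survey()
scale <- 4 / n_reps         # 4/80 for ACS, 4/160 for the CPS ASEC
rscales <- rep(1, n_reps)
```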
@Matthew_Bombyk (or other IPUMS Staff :D)
If I can revive an old thread… Can I confirm that I am repurposing the code correctly for weighting at the household level? For instance, if my goal is to aggregate data at the household level for CPS ASEC, I should be using:
-survey- package:
```r
svy <- svrepdesign(data = data, weights = ~ASECWTH, repweights = "REPWT[0-9]+",
                   type = "JK1", scale = 4/160, rscales = rep(1, 160), mse = TRUE)
```
Of note is the use of REPWT instead of REPWTP, and ASECWTH instead of ASECWT. My understanding is that we don't need to change the type, scale, or rscales parameters?
Thanks so much!
That looks right to me.
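One detail worth checking in a call like this is the repweights pattern, which is matched as a regular expression against the column names. "REPWT[0-9]+" requires a digit immediately after REPWT, so it picks up the household replicate weights but not REPWTP1 and so on. A stray space inside the string would match nothing at all. Here is a quick base-R check with hypothetical ASEC column names:

```r
# Hypothetical column names for a CPS ASEC extract with both sets of replicate weights
asec_names <- c("ASECWT", "ASECWTH", paste0("REPWTP", 1:160), paste0("REPWT", 1:160))

hh_cols <- grep("REPWT[0-9]+", asec_names, value = TRUE)
length(hh_cols)                      # 160: REPWT1 ... REPWT160 only
any(startsWith(hh_cols, "REPWTP"))   # FALSE: person replicate weights are not picked up

# A leading space inside the pattern matches no column
length(grep(" REPWT[0-9]+", asec_names))  # 0
```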
Awesome. Thanks @Matthew_Bombyk .
I tried to follow exactly the steps explained above, but always get this error:
```
Error in if (combined.weights & probably.not.combined.weights) warning(paste("Data do not look like combined weights: mean replication weight is", :
  missing value where TRUE/FALSE needed
```
Does anybody have an idea what's wrong?
Many thanks in advance !
This may be a problem with your -survey- package. I recommend installing the latest version of the survey package. You can type:
```r
install.packages("survey")
```
If you’re using RStudio, try this in base R first and see if the problem is fixed.
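If updating does not help, note what the message itself says: an if() condition inside svrepdesign() evaluated to NA. That condition belongs to a check on whether the replicate weights look like full (combined) weights. One thing worth ruling out, as an assumption rather than something diagnosed in this thread, is missing values in the weight columns. Another is a pattern that does not match the columns you expect. The small base-R helper below, check_weights() (a hypothetical name), reports both.

```r
# Hypothetical helper: count matched replicate-weight columns and missing values
check_weights <- function(df, weight, pattern) {
  rep_cols <- grep(pattern, names(df), value = TRUE)
  list(
    n_rep_cols = length(rep_cols),
    na_in_weight = sum(is.na(df[[weight]])),
    na_in_repweights = sum(vapply(df[rep_cols], function(x) sum(is.na(x)), integer(1)))
  )
}

# Toy example with made-up values
toy <- data.frame(
  ASECWT  = c(1200, NA, 950),
  REPWTP1 = c(1100, 800, 900),
  REPWTP2 = c(NA, 850, 1000)
)
check_weights(toy, "ASECWT", "REPWTP[0-9]+")
```

With a real extract you would call check_weights(data, "ASECWT", "REPWTP[0-9]+"). A column count other than 160, or any missing values, would be worth investigating before building the design.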
Thanks for this code and instruction, it is very helpful.
I am trying to use CPS data for descriptive statistics on SSI recipients in California and their rates of SNAP receipt in particular.
I am trying to use ASECWT in my code in RStudio; however, when I use the syntax you provided after installing the survey package, I receive this error message, which I can't clear:
```
Error in UseMethod("as_survey") :
  no applicable method for 'as_survey' applied to an object of class "function"
```
Any advice?
Thanks in advance,
Katie
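This error means that the first argument passed to as_survey() is a function rather than a data frame. That usually happens when no data frame called data exists in your session, so R finds the built-in function utils::data() instead. Make sure you first read your extract (for example with read_ipums_micro(), as in the code earlier in this thread) and assign it to the name you pass to as_survey(). Would you be able to share where you are getting your code for replicate weights? IPUMS CPS provides this code on the replicate weights user guide. In RStudio, you first want to install and load the srvyr package: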
```r
install.packages("srvyr")
library(srvyr)
```
Then, run the as_survey function:
```r
svy <- as_survey(data, weights = ASECWT, repweights = matches("REPWTP[0-9]+"),
                 type = "JK1", scale = 4/160, rscales = rep(1, 160), mse = TRUE)
```
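Two quick base-R checks relate to this error and to the repweights pattern. First, when no object called data has been created, the name data refers to the built-in function utils::data(). That is the "object of class function" in the error. Second, inside a regular expression [1-160] is a character class containing only the characters 0, 1 and 6, not a numeric range. A pattern like "REPWTP[1-160]+" therefore selects only some of the replicate weights, while "REPWTP[0-9]+" selects all 160. The column names below are hypothetical.

```r
# With no user object named `data`, R finds the base function utils::data()
class(utils::data)  # "function"

# "[1-160]" is a character class matching only "1", "6" or "0"
cols <- paste0("REPWTP", 1:160)   # hypothetical column names
bad  <- grep("REPWTP[1-160]+", cols, value = TRUE)
good <- grep("REPWTP[0-9]+",  cols, value = TRUE)

length(good)        # 160
"REPWTP2" %in% bad  # FALSE: columns whose number starts with 2-5 or 7-9 are dropped
```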