Does anyone have sample code for using the svydesign function in R?


#1

In the “survey” vignette for R, it shows that you’re supposed to enter ID, weights, data set name, and fpc. In the example it shows the following:

dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

FPC is supposed to be the population size, but I do not have this variable in my dataset. What variables should I be using for id and fpc? The variables in my dataset are: year, datanum, serial, HHWT, STATEFIP, GQ, PERNUM, PERWT, AGE, HISPAN, HISPAND, RACAMIND, RACASIAN, RACBLK, RACPACIS, RACWHT, RACOTHER, POVERTY.

Does anyone have sample R code for weighting IPUMS USA data? I would like to get estimates and standard errors.


What R package is needed for IPUMS USA for weighting? Is it "survey"? Do you have sample code?
#2

Hi there,

The fpc argument is optional, so you can leave it out. (The finite population correction can give extra precision when you know the sample design has sampled a substantial share of a particular group. That isn’t the case for IPUMS USA, so the estimates would be practically indistinguishable with or without it, and the data don’t include the information needed to use it. See the help for svydesign or Thomas Lumley’s book “Complex Surveys: A Guide to Analysis Using R” for more details.)
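To see what the fpc argument changes in practice, here is a small sketch using the apiclus1 example data that ships with the survey package (the same data as in the vignette call quoted above). It builds the design with and without fpc and compares the results. The object names are just illustrative, and this is example data, not IPUMS USA.

```r
library(survey)
data(api)  # loads apiclus1 (and the other api example data sets)

# Same cluster sample, with and without the finite population correction
design_with    <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
design_without <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1)

# The point estimate depends only on the weights;
# the fpc only enters the standard error
mean_with    <- svymean(~api00, design_with)
mean_without <- svymean(~api00, design_without)

print(mean_with)
print(mean_without)
```

The two point estimates should match, and the standard error with the fpc should be somewhat smaller, since this sample includes a noticeable fraction of the clusters in the population.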

Based on the variables you’ve listed, I believe you will need to revise your extract to add the CLUSTER and STRATA variables, and then the following code should give you estimates using the person weights. If you are interested in weighting at the household level, you’ll need to use HHWT instead of PERWT.

library(ipumsr)

ddi <- read_ipums_ddi("usa_00019.xml")

data <- read_ipums_micro(ddi)

# survey package instructions (Person Weights)

library(survey)

svy <- svydesign(~CLUSTER, weights = ~PERWT, strata = ~STRATA, data = data, nest = TRUE, check.strata = FALSE)

svymean(~HISPAN, svy)

# srvyr package instructions (Person Weights)

library(srvyr)

svy <- as_survey(data, ids = CLUSTER, weights = PERWT, strata = STRATA, nest = TRUE)

summarize(svy, HISPAN = survey_mean(HISPAN))
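If you want to try the person-weight and household-weight setups without downloading an extract, the following is a minimal sketch on a made-up toy data frame that only borrows the IPUMS column names. Several parts are assumptions on my side: the values are invented, the `hispanic` and `one` helper columns are hypothetical, the indicator assumes HISPAN code 0 means "not Hispanic" (check the codebook for your extract), and keeping only PERNUM == 1 is one common way to get a single record per household. Because HISPAN is a coded categorical variable, its raw mean is hard to interpret, so the sketch estimates a share instead.

```r
library(survey)

# Toy stand-in for an IPUMS USA extract (invented values, not real data):
# two strata, two clusters per stratum, two persons per cluster
toy <- data.frame(
  STRATA  = c(1, 1, 1, 1, 2, 2, 2, 2),
  CLUSTER = c(1, 1, 2, 2, 3, 3, 4, 4),
  PERNUM  = c(1, 2, 1, 2, 1, 2, 1, 2),
  HHWT    = c(100, 100, 150, 150, 120, 120, 80, 80),
  PERWT   = c(100, 110, 150, 140, 120, 130, 80, 90),
  HISPAN  = c(0, 0, 1, 1, 0, 2, 0, 0)
)

# Hypothetical 0/1 indicator; assumes HISPAN == 0 means "not Hispanic"
toy$hispanic <- as.numeric(toy$HISPAN > 0)

# Person-level design: every record, weighted by PERWT
person_svy <- svydesign(ids = ~CLUSTER, strata = ~STRATA, weights = ~PERWT,
                        data = toy, nest = TRUE)
person_share <- svymean(~hispanic, person_svy)
print(person_share)

# Household-level design: one record per household, weighted by HHWT
households <- subset(toy, PERNUM == 1)
households$one <- 1  # hypothetical helper column for counting households
hh_svy <- svydesign(ids = ~CLUSTER, strata = ~STRATA, weights = ~HHWT,
                    data = households, nest = TRUE)
hh_total <- svytotal(~one, hh_svy)
print(hh_total)
```

With a real extract you would replace `toy` with the data read by ipumsr and keep the same design arguments.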