Hi, I am working on a series of analyses using IPUMS ACS data for 2019-2021, restricted to women of reproductive age. I made this restriction by selecting cases when I created the extract on IPUMS, before downloading it.
With this data set, which contains only women of reproductive age, I first used the replicate weights as IPUMS recommends. I am using Stata, so I used svyset and the svy prefix for the regressions. While working on this, several questions came to mind.
- First, can I use the replicate weights when my data contain only observations for women of reproductive age?
- Second, using the replicate weights yields smaller standard errors than using the person weight (PERWT) with standard errors clustered at the state level. Which approach is correct?
Regarding analyses with subsamples of the data, IPUMS recommends using the subpop() option in Stata after svysetting the data. The user guide notes that users must first define the subpopulation with a dichotomous variable coded 0 for all cases that should be excluded from the analysis. The decision of which method to use to generate standard errors is ultimately up to the researcher. However, the Census Bureau's design and methodology report notes that users of the ACS PUMS files can compute the estimated variances of their statistics using one of two options: (1) the replication method, using the replicate weights released with the PUMS data, and (2) the design factor method. The report states that, in general, the replication method produces significantly more accurate variance estimates and is strongly recommended over the design factor method whenever practical. This paper offers some additional insight into different methods of calculating standard errors with IPUMS data products and how the outcomes may vary.
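As a rough sketch of the subpop() approach described above: the example assumes an ACS extract that still contains all persons (not only the women of interest), with the IPUMS variables perwt, repwtp1-repwtp80, sex, and age in lowercase. The 15-49 age range, the indicator name repro, and the regression variables y, x1, and x2 are hypothetical placeholders, so please adapt them to your own analysis.

```stata
* Sketch only, not a definitive specification.
* Declare the replicate-weight design (same settings as quoted in this thread).
svyset [pweight=perwt], vce(brr) brrweight(repwtp1-repwtp80) fay(.5) mse

* Dichotomous subpopulation indicator: 1 = include, 0 = exclude.
* Assumes IPUMS coding sex == 2 for female; the 15-49 range is a placeholder.
generate byte repro = (sex == 2 & inrange(age, 15, 49))

* Estimate on the subpopulation while keeping all cases in the data set.
* y, x1, and x2 are hypothetical variable names.
svy, subpop(repro): regress y x1 x2
```

The point of the design is that the excluded cases stay in memory and are flagged with 0, rather than being dropped before estimation.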
Hi Ivan, thank you so much for your helpful comment!
Hi, I just wanted to ask a couple of follow-up questions.
I would like to use a mixed model that includes state random intercepts. However, I have encountered an issue where replicate weights do not seem to function correctly with multilevel models in either Stata or R.
In Stata, I used svyset [pweight=perwt], vce(brr) brrweight(repwtp1-repwtp80) fay(.5) mse. I then tried several mixed models, but Stata returned an error message saying that mixed models are not supported by svy with vce(brr).
In addition, in R, I tried the WeMix package, which supports weighted mixed models, but it does not seem to work with the replicate weights.
Is there any known method for performing multilevel modeling with the ACS replicate weights?
Replicate weights and multilevel models are two different approaches to accounting for clustered sampling. I am not aware of any methods that allow for multilevel modeling with the ACS replicate weights. Although the Census Bureau’s “Estimating ASEC Variances with Replicate Weights” document is written for the CPS ASEC rather than the ACS, I recommend reviewing it to learn more about the construction of replicate weights.
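For reference, here is how the replicate-weight variance is usually written for the ACS, as I understand the Census Bureau's PUMS documentation. The notation is my own: \hat{\theta} is the estimate computed with the full-sample weight (PERWT) and \hat{\theta}_r is the same estimate recomputed with the r-th of the 80 replicate weights.

```latex
\widehat{\mathrm{Var}}(\hat{\theta})
  = \frac{4}{80} \sum_{r=1}^{80} \left( \hat{\theta}_r - \hat{\theta} \right)^2
```

This matches the svyset settings quoted earlier in the thread: with fay(.5) and mse, the multiplier is 1 / (80 × (1 − 0.5)^2) = 4/80, and deviations are taken around the full-sample estimate.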