I have a question regarding the “PERWT” variable in ACS. Based on the description, “PERWT is a 6-digit numeric variable which indicates how many persons in the U.S. population are represented by a given person in an IPUMS sample”. My questions are:
- It should be a frequency weight rather than a probability weight, correct?
- I want to examine how some independent factor affects employment. Can I use the individual-level PERWT as an indication of employment, or can I only aggregate PERWT to certain levels to get an aggregated employment estimate?
Thanks!
PERWT should be treated as a probability (a.k.a. sampling) weight since it is the result of random sampling from a population without replacement. Pweights are the inverse of the probability that an observation was selected into the sample from the population, which is why PERWT can be read as the number of people a sampled person represents. Frequency weights, on the other hand, are used when you have multiple identical rows in the data; they also must be integers. This OARC post is a helpful guide to weights in Stata. Which type of weight you use (pweight, fweight, etc.) in Stata will affect your confidence intervals and standard errors, but will not affect frequencies or point estimates.
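To see the last point in practice, here is a minimal sketch, assuming an IPUMS ACS extract with EMPSTAT and PERWT is already loaded and that PERWT is integer-valued in your samples (Stata rejects non-integer fweights):

```stata
* Weighted distribution of employment status, computed two ways.
* The estimated proportions are identical in both runs;
* only the standard errors and confidence intervals differ.
proportion empstat [pweight = perwt]
proportion empstat [fweight = perwt]
```

The fweight run treats each record as PERWT identical observations, so its standard errors come out far too small for survey data.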
I’m not exactly sure what you mean by using PERWT as an indication for employment. EMPSTAT is the variable most commonly used to determine individuals’ employment status. You can run an analysis to see how your independent factor affects EMPSTAT with data weighted using PERWT. Once you weight the employment estimates correctly, they will be nationally representative. If you’re using Stata, you will want to svyset your data and then run your analysis using the svy prefix. You can also set weights in other ways. Try typing “help [command]” into the console and searching the help document for “weight” to see how to incorporate weights into a given command or line of code. Note that when using ACS data, replicate weights are necessary to estimate empirically derived standard errors or confidence intervals. Sample code for Stata, R, and SAS is provided in this IPUMS user guide. In Stata, you might run:
svyset [pweight=perwt], vce(brr) brrweight(repwtp1-repwtp80) fay(.5) mse
svy: reg empstat your_independent_factor
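One caveat on the regression line above: EMPSTAT is a categorical code (in IPUMS USA, 0 = N/A, 1 = employed, 2 = unemployed, 3 = not in labor force), so you will probably want to recode it into a binary indicator first. A minimal sketch, assuming the svyset line above has already been run; the variable name employed is hypothetical, and your_independent_factor is the same placeholder as above:

```stata
* 1 if employed, 0 if unemployed or not in the labor force;
* left missing for N/A cases (EMPSTAT == 0), which drop out of the model
gen byte employed = (empstat == 1) if inlist(empstat, 1, 2, 3)

* Linear probability model with replicate-weight standard errors
svy: reg employed your_independent_factor

* Or a logit for the same binary outcome
svy: logit employed your_independent_factor
```

Whether those not in the labor force belong in the zero category depends on your research question; restrict the inlist() to codes 1 and 2 if you only want labor force participants.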
You may also find these IPUMS data training exercises helpful. They provide sample analyses with questions, answers, and code for a number of statistical software packages.
Hi Ivan, thank you so much for your detailed response. To clarify what I meant by using PERWT as an indication for employment: I read a paper that uses aggregated person weights (for example, aggregated at the state level with collapse (rawsum) perwt, by(statefip)) as the dependent variable for employment count. My understanding is that using empstat as the dependent variable examines how the independent variable affects the probability of a person being employed, whereas using the aggregated person weight examines how the independent variable (at the aggregate state level) affects the employment count among all employed workers. I want to know whether using a probability weight to achieve this is appropriate. Hope this makes sense.
Yes, you can aggregate PERWT to produce estimates of state-level employment. Which type of weight you use will affect the standard errors of your estimates, but will not affect the aggregate sum of PERWT by state. While state-level estimates might include enough observations that standard errors are very small, I would still recommend using a statistical command (such as svy) to calculate weighted summary measures.
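As a rough sketch of both approaches, assuming an extract that contains EMPSTAT, STATEFIP, PERWT, and REPWTP1-REPWTP80 (the names employed and emp_count are hypothetical):

```stata
* Indicator for employed persons (EMPSTAT == 1 in IPUMS USA coding)
gen byte employed = (empstat == 1)

* State-level employment counts with replicate-weight standard errors
svyset [pweight=perwt], vce(brr) brrweight(repwtp1-repwtp80) fay(.5) mse
svy: total employed, over(statefip)

* Same point estimates via collapse, without standard errors
preserve
keep if empstat == 1
collapse (sum) emp_count = perwt, by(statefip)
list statefip emp_count
restore
```

Note that the sum of PERWT only counts employed workers if the data are first restricted to employed persons; summed over all records, it estimates the total state population instead.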
Note that your estimates will differ from official statistics published by the Bureau of Labor Statistics. To replicate published BLS labor force estimates, you will need to download data from the Current Population Survey (IPUMS CPS) and use COMPWT to increase reliability of estimates of month-to-month changes. Finally, you may also be interested in looking at IPUMS NHGIS, which provides access to geographically aggregated ACS summary tables produced by the Census Bureau on a wide range of topics (including employment).