Hi, I am currently working on a dissertation that estimates the clustering of green jobs across U.S. states, using ACS data from 2012-2022. Initially, I ran some summary statistics on green jobs by state (using PWSTATE2) and found that every state has a similar share of green jobs, about 17-18%. I suspected this was because I had not applied weights to the individual records. I went back to the ACS dataset and found the PERWT variable for person weights. How should I proceed to apply these weights to my dataset in Stata?
Most of the data from IPUMS, including IPUMS USA, are samples. The American Community Survey (ACS), for example, is a survey of a sample of U.S. households; not all households or individuals in the United States respond to the ACS. Sampled households are selected through a complex geographic design rather than by simple random sampling. The sampling method also over-samples some types of people based on characteristics, such as race. Each person and each household in the ACS therefore has a different probability of being in the sample, and each represents a different number of people or households in the population. Sampling weights correct for these unequal probabilities of selection. When you apply sampling weights to your analysis of ACS data, you obtain estimates (such as averages or regression coefficients) that are representative of the entire U.S. population, rather than of the unweighted ACS sample.
Household weights, like HHWT, and person weights, like PERWT, work in essentially the same way described above. When you conduct data analysis of household-level variables (such as household income or the household’s geographic location), you need to use household weights. When you conduct data analysis of person-level variables (such as personal income or unemployment), you need to use person weights.
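For the question above, a minimal Stata sketch might look like the following. It assumes your extract includes PERWT, HHWT, PERNUM, PWSTATE2, and HHINCOME, and that you have already created a 0/1 indicator for green jobs. The name `green` is hypothetical, so substitute your own variable.

```stata
* Assumption: green is a hypothetical 0/1 indicator you built for green jobs.
* Person-level outcome, so weight by the person weight PERWT.
mean green [pweight = perwt], over(pwstate2)

* Household-level outcome: keep one record per household (PERNUM == 1)
* and weight by the household weight HHWT instead.
* 9999999 is the N/A code for HHINCOME; confirm this in your codebook.
mean hhincome [pweight = hhwt] if pernum == 1 & hhincome != 9999999
```

If you pool several ACS years, this gives one estimate per state across all years combined, so you may prefer to run it separately by YEAR.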
The exception to this rule is when analyzing one of the “flat” or unweighted IPUMS samples. Flat IPUMS USA samples include the 1% samples from 1850-1930, all samples from 1960, 1970, and 1980, the 1% unweighted samples from 1990 and 2000, the 10% 2010 sample, and any of the full count 100% census datasets. In these “flat” samples, the PERWT and HHWT values will be the same for everyone in the sample. So, the use of these variables is only necessary when estimating population counts of individuals who meet a given set of characteristics.
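If you are not sure whether a sample in your extract is flat, one quick check is to compare the smallest and largest PERWT within each sample. This is only a sketch, and it assumes your extract includes the SAMPLE variable, which IPUMS USA extracts normally contain.

```stata
* Minimum and maximum person weight within each IPUMS sample.
* A flat sample shows the same value in both columns.
tabstat perwt, by(sample) statistics(min max)
```

For ACS samples such as 2012-2022, the minimum and maximum should differ, which confirms that the weights matter for your analysis.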
When using ACS data, replicate weights are necessary to estimate empirically derived standard errors or confidence intervals. Point estimates will be the same whether you use replicate weights or only PERWT or HHWT; only the standard errors change. The IPUMS USA documentation on replicate weights explains what they are and how to use them.
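As a hedged sketch of how replicate weights can be declared in Stata: the example below assumes you added the person replicate weights to your extract, so that the variables repwtp1 through repwtp80 exist, and it again uses the hypothetical `green` indicator. It treats the ACS replicates as successive difference replication weights, and you should check the IPUMS replicate weights documentation for the specification it recommends.

```stata
* Assumption: repwtp1-repwtp80 are the 80 person replicate weights in your extract.
* Declare the full-sample weight and the replicate weights once.
svyset [pweight = perwt], vce(sdr) sdrweight(repwtp1-repwtp80) mse

* Same point estimates as weighting by PERWT alone, but with
* replicate-based standard errors and confidence intervals.
svy: mean green, over(pwstate2)
```

For household-level estimates, the analogous setup would use HHWT with the household replicate weights.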
Stata’s svy suite of commands can be useful when implementing weights in Stata. You can also set weights in other ways. Try typing “help [command]” into the console and searching the help document for “weight” to see how to incorporate weights into a given command or line of code.
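The two approaches can be sketched as follows. This is an illustration rather than a recommended specification, and `green` is again a hypothetical 0/1 indicator that you would create yourself.

```stata
* Option 1: declare the weight once, then prefix commands with svy:
svyset [pweight = perwt]
svy: tabulate pwstate2 green, row

* Option 2: pass the weight directly to a command that accepts pweights.
regress green i.year [pweight = perwt]

* The help file lists which weight types a given command allows.
help regress
```

Not every command accepts every weight type, which is why checking the help file first is worthwhile. For example, tabulate does not allow pweights directly, but svy: tabulate does.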