Hi, I am currently working on a dissertation estimating the clustering effect of green jobs in each US state, using the ACS 2012-2022. Initially, I ran some summary statistics on green jobs by state (using PWSTATE2), and I found that every state had a similar share of green jobs, around 17-18%. I suspected this was because I had not applied weights to each individual. I went back to the ACS dataset and found the PERWT variable for person weights. Now I am wondering how I should proceed to apply the person weights to my dataset in Stata.

Most of the data from IPUMS, including IPUMS USA, are samples. The American Community Survey (ACS), for example, is a survey of a sample of U.S. households: not all households or individuals in the United States respond to the ACS. Sampled households are selected through a complex geographic design in which not every household has the same chance of selection. The sampling method also *over-samples* some types of people and households. Each person and each household in the ACS therefore has a different probability of being in the sample, and each represents a different number of people in the population. Sampling weights correct for these unequal probabilities of being in the ACS. When you apply sampling weights to your analysis of ACS data, you can obtain estimates of different statistics (such as averages or regression coefficients) that are representative of the entire U.S. population, rather than only of the ACS sample itself.
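To make the difference concrete, here is a minimal sketch in Python (not Stata) of a state-level share computed with and without person weights. The records, the state codes and the 0/1 green-job flag are made up for illustration; the only ideas taken from the thread are weighting each person by PERWT and grouping by PWSTATE2.

```python
from collections import defaultdict

# Hypothetical person records: (PWSTATE2 code, green-job flag, PERWT).
# All values are invented for illustration only.
records = [
    (6, 1, 50), (6, 0, 150), (6, 0, 100),
    (36, 1, 200), (36, 0, 100), (36, 0, 100),
]

def shares_by_state(rows, weighted):
    """Share of workers in green jobs per state, optionally weighted by PERWT."""
    green = defaultdict(float)
    total = defaultdict(float)
    for state, is_green, perwt in rows:
        w = perwt if weighted else 1.0  # unweighted: every record counts once
        green[state] += w * is_green
        total[state] += w
    return {s: green[s] / total[s] for s in total}

unweighted = shares_by_state(records, weighted=False)
weighted = shares_by_state(records, weighted=True)
print(unweighted)  # both states: 1/3
print(weighted)    # state 6: 1/6, state 36: 1/2
```

With these toy numbers both states look identical unweighted but differ once PERWT is applied. In Stata, a comparable weighted estimate would be something like `mean green [pweight=perwt], over(pwstate2)`, where `green` is a hypothetical 0/1 indicator you construct yourself.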

Household weights, like HHWT, and person weights, like PERWT, work in essentially the same way as described above. When you analyze household-level variables (such as household income or the household's geographic location), you need to use household weights. When you analyze person-level variables (such as personal income or unemployment), you need to use person weights.
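A minimal sketch, assuming a person-level (rectangular) extract in which household variables such as HHWT are repeated on every person in the household. Keeping one record per household, here `PERNUM == 1`, avoids counting a household's HHWT once per member. The SERIAL, PERNUM and weight values are hypothetical.

```python
# Hypothetical person-level extract: one row per person.
# HHWT is a household variable, so it repeats on every member of the household.
people = [
    {"SERIAL": 1, "PERNUM": 1, "HHWT": 80, "PERWT": 80},
    {"SERIAL": 1, "PERNUM": 2, "HHWT": 80, "PERWT": 95},
    {"SERIAL": 2, "PERNUM": 1, "HHWT": 120, "PERWT": 120},
]

# Household-level estimate: keep one record per household so HHWT is counted once.
household_total = sum(r["HHWT"] for r in people if r["PERNUM"] == 1)

# Person-level estimate: every person contributes their own PERWT.
person_total = sum(r["PERWT"] for r in people)

# Mistake to avoid: summing HHWT over all persons double-counts household 1.
naive_total = sum(r["HHWT"] for r in people)

print(household_total, person_total, naive_total)  # 200 295 280
```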

The exception to this rule is when analyzing one of the “flat” or unweighted IPUMS samples. Flat IPUMS USA samples include the 1% samples from 1850-1930, all samples from 1960, 1970, and 1980, the 1% unweighted samples from 1990 and 2000, the 10% 2010 sample, and any of the full count 100% census datasets. In these “flat” samples, the PERWT and HHWT values are the same for everyone in the sample, so these variables are only needed when estimating population counts of individuals who meet a given set of characteristics.
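A small illustration, with hypothetical values, of why a constant weight only matters for counts: the weighted share equals the unweighted share, while the weighted count is the raw count multiplied by the constant.

```python
# Hypothetical flat sample: every person carries the same weight (100 here).
flat = [(1, 100), (0, 100), (0, 100), (1, 100)]  # (has_characteristic, PERWT)

unweighted_share = sum(x for x, _ in flat) / len(flat)
weighted_share = sum(x * w for x, w in flat) / sum(w for _, w in flat)

# The weight only changes the answer when you want a population count.
population_count = sum(x * w for x, w in flat)

print(unweighted_share, weighted_share, population_count)  # 0.5 0.5 200
```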

When using ACS data, replicate weights are necessary to estimate empirically derived standard errors or confidence intervals. Point estimates will be the same when using replicate weights versus using PERWT or HHWT. You can read more about what replicate weights are and how to use them here.
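As a sketch of what replicate weights do, the function below applies the successive difference replication formula the Census Bureau documents for the ACS, SE = sqrt((4/80) · Σ (X_r − X)²), where X is the estimate using the full weight (PERWT) and X_r is the same estimate recomputed with replicate weight r (REPWTP1-REPWTP80 for persons in IPUMS). The data and replicate weights here are invented so the arithmetic is easy to follow.

```python
import math

def weighted_share(flags, weights):
    """Weighted share of records whose flag is 1."""
    return sum(f * w for f, w in zip(flags, weights)) / sum(weights)

def sdr_standard_error(flags, perwt, repwts):
    """ACS successive-difference-replication SE: sqrt(4/80 * sum((X_r - X)^2))."""
    if len(repwts) != 80:
        raise ValueError("ACS files supply 80 replicate weights")
    full = weighted_share(flags, perwt)  # point estimate uses PERWT only
    replicates = [weighted_share(flags, rw) for rw in repwts]
    return full, math.sqrt(4.0 / 80 * sum((x - full) ** 2 for x in replicates))

# Hypothetical two-person example: one green-job worker, one not.
flags = [1, 0]
perwt = [100, 100]
# Invented replicate weights that alternately shift weight between the two people.
repwts = [[110, 90] if r % 2 == 0 else [90, 110] for r in range(80)]

estimate, se = sdr_standard_error(flags, perwt, repwts)
print(estimate, se)  # 0.5 and approximately 0.1
```

Note that the point estimate comes from PERWT alone; the replicate weights only feed the standard error, which matches the point made above.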

Stata’s *svy* suite of commands can be useful when implementing weights in Stata. You can also set weights in other ways. Try typing “help [command]” into the console and searching the help file for “weight” to see how to incorporate weights into a given command or line of code.

Thanks Isabel!

Hi Isabel, following up on the previous post: I have applied the person weights (PERWT) as probability weights using the command svyset [pweight=perwt], since I am running an individual-level analysis.

I would then like to cluster the standard errors of my OLS regression by PUMA (i.e., vce(cluster) on PUMA). However, I realised that Stata's svy commands do not allow vce(cluster). How can I go about obtaining this standard error?

Is accounting only for pweight in the survey design the right approach for my analysis, or should I also account for stratification and clustering?

The *svyset* command in Stata is used to declare a survey design for the dataset. Setting a cluster variable in *svyset* tells Stata that sampling for the survey was clustered at the level of that particular variable. Since sampling for the ACS does not use PUMAs as clusters, it would not be correct to set PUMA as the primary sampling unit (PSU) or secondary sampling unit (SSU) in *svyset*. The Census Bureau recommends using replicate weights to derive standard errors, and IPUMS provides sample code for survey-setting your data and applying replicate weights here. The replicate weights include all the necessary information about sample design, making it unnecessary to specify PSU, SSU, or stratum.
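To show what this means for the OLS coefficient in the question, here is a minimal Python sketch (not IPUMS's Stata code) that estimates a weighted regression slope once with PERWT and once per replicate weight, then combines the results with the 4/80 successive difference replication formula used for the ACS. The data and replicate weights are hypothetical, and `wls_slope` is plain weighted least squares with an intercept and a single regressor.

```python
import math

def wls_slope(x, y, w):
    """Slope from a weighted least-squares fit of y on x with an intercept."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

def replicate_slope_se(x, y, perwt, repwts):
    """Slope with PERWT, plus an SDR standard error from 80 re-estimated slopes."""
    if len(repwts) != 80:
        raise ValueError("expected 80 replicate weights (REPWTP1-REPWTP80)")
    full = wls_slope(x, y, perwt)
    reps = [wls_slope(x, y, rw) for rw in repwts]
    return full, math.sqrt(4.0 / 80 * sum((b - full) ** 2 for b in reps))

# Hypothetical data: x is a 0/1 regressor, y a 0/1 outcome.
x = [0, 1, 0, 1]
y = [0, 1, 1, 1]
perwt = [100, 100, 100, 100]
# Invented replicate weights that move weight on the third person up and down.
repwts = [[100, 100, 150, 100] if r % 2 == 0 else [100, 100, 50, 100]
          for r in range(80)]

slope, se = replicate_slope_se(x, y, perwt, repwts)
print(slope, se)  # slope is 0.5; se comes only from replicate variation
```

In Stata, the corresponding approach is to declare the replicate weights with `svyset` (using its `vce(sdr)`, `sdrweight()` and `mse` options) and then run `svy: regress`; the IPUMS sample code mentioned above gives the exact line for ACS extracts.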