TL;DR: if I have a 1% sample and a 5% sample, is it appropriate to multiply all the PERWTs in the 1% by 5 in order to make it comparable to the 5% sample?
Hello, I’m trying to compare total employment numbers between different samples (some 1%, some 5%, and some ACS 3yr) in order to see how total employment numbers in different OCC1990 categories vary over time. I’m doing this by summing up the PERWTs for each sample and then multiplying by a constant factor in order to account for the difference in sample size (i.e., multiplying 1% samples by 100, 5% samples by 20, and ACS 3yr samples by 100/3).
But I’m finding that even after this adjustment the total PERWT sums vary by a huge amount (sometimes by a factor of 8 or more after standardization), so I wanted to ask whether this is the proper way to standardize between samples.
The 3- and 5-year ACS files append all of the 1-year ACS files for their respective period into a single file and are helpful for increasing sample sizes when users are concerned about large margins of error in their estimates. Unlike these multi-year files, the 1-year ACS files allow users to generate annual estimates. There is usually no reason to mix these different types of files into a single analysis.
In the 1-year ACS microdata, PERWT is constructed so that the sample is representative of the total population for the given year (i.e., the sum of PERWT will equal the total US population for that year). The weights for the multi-year files have been adjusted to avoid totaling to the US population three or five times for the 3-year and 5-year ACS respectively. The 3-year ACS samples append three 1-year ACS samples, with the weights divided by three. The 5-year ACS samples follow the same pattern. Because the weights already scale each sample to the population, no additional multiplication by a sample-size factor is needed: you can compare total employment in different OCC1990 categories across ACS samples by summing up the PERWTs for each sample for observations that satisfy your criteria and directly comparing the results. Note that if you combine multiple 1-year ACS files yourself (e.g., pool the 2011, 2012, and 2013 1-year files instead of using the 3-year file), you will need to adjust the weights to account for this pooling, for example by dividing PERWT by the number of files pooled.
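As a rough illustration of the weighting logic described above, here is a minimal Python sketch. The function name `weighted_totals_by_occ`, its parameters, and the toy records (including the OCC1990 codes and PERWT values) are hypothetical and are not taken from any real extract. It assumes each record is a dict with `OCC1990` and `PERWT` keys, and that pooled 1-year files are reweighted by dividing PERWT by the number of files pooled.

```python
from collections import defaultdict


def weighted_totals_by_occ(records, pooled_files=1, keep=lambda rec: True):
    """Sum PERWT by OCC1990 for the records that satisfy `keep`.

    pooled_files: number of 1-year files appended together by the user.
    Leave it at 1 for a single 1-year file or for a multi-year file whose
    weights have already been adjusted.
    """
    totals = defaultdict(float)
    for rec in records:
        if keep(rec):
            totals[rec["OCC1990"]] += rec["PERWT"] / pooled_files
    return dict(totals)


# Toy, made-up records standing in for a single 1-year file.
one_year = [
    {"OCC1990": 95, "PERWT": 100.0},
    {"OCC1990": 95, "PERWT": 50.0},
    {"OCC1990": 178, "PERWT": 80.0},
]

# Toy, made-up records standing in for three 1-year files pooled by hand.
pooled_three_years = [
    {"OCC1990": 95, "PERWT": 90.0},
    {"OCC1990": 95, "PERWT": 150.0},
    {"OCC1990": 95, "PERWT": 120.0},
    {"OCC1990": 178, "PERWT": 240.0},
]

single = weighted_totals_by_occ(one_year)
pooled = weighted_totals_by_occ(pooled_three_years, pooled_files=3)
print(single)
print(pooled)
```

The `keep` argument is where a filter such as "employed respondents only" would go. With the weights handled this way, the totals from the two toy samples are on the same scale and can be compared directly.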
When you compare OCC1990 categories over time, please note that samples from 2018 onward harmonize occupation codes differently than samples from 2000-2017 because of the new Census occupation classification scheme. OCC1990 is a variable created by IPUMS; it matches respondents in the new occupation scheme to the OCC1990 category that the plurality of respondents in that occupation group would have been coded into if they had reported the same occupation in 1990. You can refer to the guide on integrated occupation and industry codes for more information.
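To make the plurality rule concrete, here is a small Python sketch. It only illustrates the idea and is not the actual IPUMS crosswalk procedure; the function name `plurality_code` and the codes in the example are hypothetical.

```python
from collections import Counter


def plurality_code(codes_1990):
    """Return the most frequent 1990 code among respondents in one group.

    codes_1990: the 1990 codes that respondents in a single newer
    occupation group would have received (hypothetical input).
    """
    counts = Counter(codes_1990)
    code, _count = counts.most_common(1)[0]
    return code


# Made-up example: most respondents in this group map to code 95.
example_group = [95, 95, 178, 95, 178, 22]
assigned = plurality_code(example_group)
print(assigned)
```

Because every respondent in the newer group receives the single plurality code, totals for individual OCC1990 categories can shift at the point where the underlying classification changes.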