Hi Jeff, I am trying to obtain MSA-level averages for income and unemployment rates for people who report working in the construction occupations (OCC2010 codes 6210 - 6765). Do you know if the weights (given by perwt) are based on occupation? For instance: if the weight of one of my individual construction workers is 20, then is this worker representative of 20 other construction workers, or could this individual be representative of 20 other people who do not necessarily work in the construction occupations? My main concern is that if I use the weights attached to these construction workers to create weighted averages using the perwt variable at the MSA level, this may not necessarily be representative of only construction workers, and if that is the case, then I may be better off using a simple average. Any insight you could provide on this issue would be greatly appreciated. Thank you for your time.
Do I need to use perwt to get representative statistics at the MSA level for a particular set of occupations?
If you are using IPUMS USA (US Census or ACS) data, you shouldn’t encounter any problems when calculating representative statistics of occupations within MSAs. As long as you have a sufficient number of observations for each MSA-occupation pair, your weighted calculations should be relatively close to the true value. The answer is slightly different if you are using IPUMS CPS data, since the CPS is designed and optimized for national and state-level labor force estimates. MSA-level estimates in the CPS can be quite inaccurate because the CPS sample is much smaller than that of the US Census or ACS.
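As a rough illustration of the weighted calculation described above, here is a minimal Python sketch that restricts the sample to the construction occupations (OCC2010 codes 6210 to 6765) and computes a perwt-weighted mean income per MSA. The records, the field names (`msa`, `occ2010`, `perwt`, `income`), and the function name `weighted_mean_by_msa` are made up for the example and are not taken from an actual IPUMS extract.

```python
from collections import defaultdict

# Hypothetical person records; field names and values are illustrative only.
people = [
    {"msa": "A", "occ2010": 6230, "perwt": 20, "income": 40000},
    {"msa": "A", "occ2010": 6260, "perwt": 60, "income": 30000},
    {"msa": "A", "occ2010": 4700, "perwt": 50, "income": 90000},  # not construction
    {"msa": "B", "occ2010": 6440, "perwt": 10, "income": 50000},
]

def weighted_mean_by_msa(records, lo=6210, hi=6765):
    """Return the perwt-weighted mean income per MSA for OCC2010 codes in [lo, hi]."""
    weighted_sums = defaultdict(float)
    weight_totals = defaultdict(float)
    for r in records:
        # Keep only people whose occupation falls in the requested range.
        if lo <= r["occ2010"] <= hi:
            weighted_sums[r["msa"]] += r["perwt"] * r["income"]
            weight_totals[r["msa"]] += r["perwt"]
    return {msa: weighted_sums[msa] / weight_totals[msa] for msa in weighted_sums}

construction_income = weighted_mean_by_msa(people)
```

In this toy data the weighted mean for MSA "A" is 32,500, while a simple average of the same two construction workers would be 35,000, so the two approaches can differ noticeably when weights vary across respondents.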
Hi Jeff, you responded to my question by saying, “If you are using IPUMS USA (US Census or ACS) data, you shouldn’t encounter any problems when calculating representative statistics of occupations within MSAs. As long as you have a sufficient number of observations for each MSA-occupation pair, your weighted calculations should be relatively close to the true value.” I have a follow-up question. Some of the MSA-occupation pairs that I am using have relatively small sample sizes in certain years (i.e. some MSA-Year-Occupation cells have fewer than 50 observations), which seems to indicate that weighted averages at the regional level for a specific occupation (like construction or personal services) may be very noisy. Can you provide some guidelines as to how many observations for a particular sector of the economy in a given MSA would be large enough to represent a regional average after applying the person weights (for example, if the sample size is over 50 then it is likely okay)? I am concerned that the weighted averages I am using for some MSA-Occupation-Year combinations may be noisy because of small sample sizes, and I want to make sure that I identify the MSAs whose weighted averages are truly representative.
While you are correct that 50 observations does seem to be quite low, I can’t give you an exact number that is “sufficiently large”. Technically speaking, many statistical tests rely on results that hold only as the sample size approaches infinity. In practice, this means making a judgement call based on the specific analysis and research question. The best advice I can give is to flag cells that seem to have a relatively low sample size (pick a number that makes sense for your data) and make note of this when discussing and interpreting your analysis.
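One way to do the flagging suggested above is to count the unweighted observations in each cell and mark those below a chosen cutoff. The Python sketch below assumes a cutoff of 50 purely for illustration (it is not a recommended value), and the records, field names, and the function name `flag_small_cells` are hypothetical.

```python
from collections import Counter

MIN_OBS = 50  # illustrative cutoff only; pick a number that makes sense for your data

# Hypothetical construction-worker records: 120 in MSA "A" and 12 in MSA "B" for 2010.
records = [{"msa": "A", "year": 2010}] * 120 + [{"msa": "B", "year": 2010}] * 12

def flag_small_cells(rows, min_obs=MIN_OBS):
    """Map each (msa, year) cell to True when its unweighted count is below min_obs."""
    counts = Counter((r["msa"], r["year"]) for r in rows)
    return {cell: n < min_obs for cell, n in counts.items()}

flags = flag_small_cells(records)
```

Here the cell for MSA "B" is flagged and the cell for MSA "A" is not. The count is deliberately unweighted, since the concern is how many respondents actually underlie each estimate rather than how many people they represent.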