I’m doing some analysis with the IPUMS-USA ACS data and would like advice on which type of weight is appropriate to use in Stata. My analysis is at the individual level, so I am working with the PERWT variable. Since this variable reflects how many people in the population each sampled individual represents, frequency weights (fweight) seemed appropriate at first, and simple tabulations in Stata seemed to support this.
However, after searching the web for more documentation, the consensus (sparse as it is) seems to be that probability weights (pweight) should be used instead, and that I should svyset the data before performing any analysis. My principal reference is this Stack Overflow question: http://stackoverflow.com/questions/5446078/frequency-weighting-in-r-comparing-results-with-stata
So, here’s what I’m gathering from the online discussions and my own reading about Stata:
- For simple tabulations that represent the US population, use frequency weights (fweight).
- For any statistical calculation (mean, regression, etc.), use the probability weights (pweight).
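To make the two cases concrete, this is roughly what I have in mind in Stata. It is only a sketch of my current understanding, not a recommendation: it assumes an IPUMS-USA extract is already loaded with lowercase variable names, and `age` and `sex` are just example variables I picked for illustration. The `svyset` line declares only the person weight and no other design information.

```stata
* Sketch only: assumes an IPUMS-USA ACS extract is loaded and that the
* variables are named perwt, age and sex (age and sex are illustrative).

* Case 1: simple tabulation scaled to population counts (frequency weights)
tabulate sex [fweight = perwt]

* Case 2: statistical calculations (probability weights)
mean age [pweight = perwt]
regress age i.sex [pweight = perwt]

* Alternative for case 2: declare the weight once, then use the svy: prefix
svyset [pweight = perwt]
svy: mean age
svy: regress age i.sex
```

As far as I can tell, `tabulate` does not accept pweights at all, which is part of why I ended up with fweight for the first case.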
Am I understanding this correctly? Thanks for your advice.