I would like to run some regressions in Stata using EARNWT, but I am not clear on whether it should be used as a frequency weight (indicating duplicate observations, specified with fweight in Stata) or a sampling weight (indicating the inverse probability of selection into the sample, specified with pweight).
Which is the correct specification: is EARNWT a frequency weight or a sampling weight?
The earner study weight EARNWT is a sampling weight. CPS weights are created in multiple stages so that they appropriately reflect the probability of being sampled, given the total population size and factors including race, sex, and age. EARNWT is created using the same steps as the other CPS weights, plus some additional adjustments. From page 10-13 of Technical Paper 66 from the Census Bureau:
“Since 1979, most CPS files have included separate weights for the outgoing rotations…In addition to ratio adjustment to independent population controls (in the second stage), these weights also reflect additional constraints that force them to sum to the composited edited estimates of employment, unemployment, and not-in-labor-force each month. An individual’s outgoing rotation weight will be approximately four times his or her final weight.”
In Stata, a frequency weight is the number of identical observations that a single record stands for, so Stata counts that record that many times. Stata fweights must be integers. EARNWT can take non-integer values, so commands that do not accept pweights, such as tabulate, require EARNWT to be rounded to the nearest integer before it can be used as an fweight.
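As a rough illustration of the rounding step, here is a minimal sketch. It assumes your extract is already loaded and contains the variables earnwt and sex in lowercase; earnwt_int is a made-up name for the rounded copy.

```stata
* Hypothetical sketch: assumes earnwt and sex exist in the loaded extract.
* fweights must be integers, so make a rounded copy of EARNWT first.
generate long earnwt_int = round(earnwt)

* tabulate does not accept pweights; use the rounded copy as an fweight.
tabulate sex [fweight = earnwt_int]
```

Keeping the rounded values in a separate variable leaves the original EARNWT untouched for commands that do accept pweights.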
Thank you Isabel! So do I understand correctly that EARNWT should be used with pweight in Stata (where possible), since it’s a sampling weight?
Yes, it is appropriate to use EARNWT as a pweight when possible as it is a sampling weight. Stata allows different weights with different commands, so you may not always be able to use pweights depending on your analysis. For commands (such as -summarize-) that do not accept pweights, you will get the same point estimates by treating the weights as fweights (and rounding them). Generally, these commands do not calculate the standard errors of estimates. Commands (such as -mean-) that calculate standard errors will always accept pweights, and in these cases whether you use [pweight] or [fweight] will affect the estimated standard errors (but not point estimates). This Stata Blog post explains the reasons that the different types of weights give the same point estimates, but not standard errors, and why certain commands do not allow pweights.
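To make the distinction concrete, here is a minimal sketch of both cases. The variable names earnweek and age are placeholders for whatever outcome and covariate are in your extract, and earnwt_fw is a made-up name for the rounded weight.

```stata
* Hypothetical sketch: earnweek and age are placeholder variable names.
* Commands that report standard errors accept pweights directly.
regress earnweek age [pweight = earnwt]
mean earnweek [pweight = earnwt]

* summarize does not accept pweights, so use a rounded integer copy
* of EARNWT as an fweight instead.
generate long earnwt_fw = round(earnwt)
summarize earnweek [fweight = earnwt_fw]
```

When pweights are specified, regress reports robust standard errors, which is part of why the standard errors differ from an fweight run even when the coefficients match.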