Using EARNWT for multiple years of CPS outgoing rotations data

According to the answer to the question “Using EARNWT variable in STATA is resulting in incorrect number of observations”, pooling 12 months of data will create a weighted estimate 12x the size of the population, and the recommendation for creating an annualised earnings estimate was therefore to either:

a.) Divide EARNWT by 12, or

b.) Use EARNWT as-is to calculate the weighted estimate for each month, then average over those 12 months

My question has to do with the recommendation for multiple years of data. For example, if I were to pool 3 years of data, would (b) be the preferred approach, so that I would not have to re-scale EARNWT, since the denominators would differ by year? What would be the recommendation for re-scaling EARNWT if I wanted to do everything in one fell swoop? Also, I’m assuming that in Stata these are population weights; is this correct?


If you are pooling multiple samples of data together, then the general recommendation for approximating the correct sampling weight values is to divide the sampling weight by the number of samples you are pooling together. So, if you are pooling 3 years of ASEC samples together (i.e. 3 samples), then you can divide the sampling weight by 3. If you are pooling 3 years of basic monthly samples together (i.e. 36 samples), then you can divide the sampling weight by 36.
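As a rough illustration of the divide-by-number-of-samples rule, here is a minimal Python sketch. The data are hypothetical toy weights, not CPS values: three annual samples, each constructed so that its weights sum to a population of 1,000.

```python
# Hypothetical toy data: three annual samples, each weighted to represent
# a population of 1,000 (the weights within each sample sum to 1,000).
samples = {
    2019: [250.0, 250.0, 250.0, 250.0],
    2020: [200.0, 200.0, 200.0, 200.0, 200.0],
    2021: [250.0, 250.0, 250.0, 250.0],
}

n_samples = len(samples)

# Pooling without rescaling: the weights sum to n_samples times the population.
raw_total = sum(sum(weights) for weights in samples.values())

# Divide every weight by the number of pooled samples.
rescaled = {
    year: [w / n_samples for w in weights]
    for year, weights in samples.items()
}
rescaled_total = sum(sum(weights) for weights in rescaled.values())

print("unscaled pooled total:", raw_total)
print("rescaled pooled total:", round(rescaled_total, 1))
```

The unscaled total is three times the population, while the rescaled total is back at roughly the size of one population.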

This is only an approximate correction, however. A more complicated, but “better”, approach is to multiply the sampling weight in sample i by (number of observations in sample i)/(number of observations in the total pooled sample). If every sample had exactly the same number of observations, this ratio would equal 1/(number of samples), and the method would simplify to the one discussed above. As you point out, however, the samples differ slightly in size from year to year. Therefore, there will likely be a slight difference between these two methods, although I don’t think the difference will be all that meaningful.
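To make the comparison concrete, the sketch below scales each sample's weights by its share of the pooled observations (n_i / n), which reduces to dividing by the number of samples when all samples are the same size. The data, sample names, and function names are hypothetical and chosen only for illustration; the sample sizes are deliberately very unequal so that the difference is visible.

```python
# Hypothetical toy data: three samples of different sizes, each weighted to
# represent the same population of 1,000 (weights within a sample sum to 1,000).
samples = {
    "sample_1": [250.0] * 4,   # 4 observations
    "sample_2": [200.0] * 5,   # 5 observations
    "sample_3": [100.0] * 10,  # 10 observations
}


def simple_rescale(samples):
    """Divide every weight by the number of pooled samples."""
    k = len(samples)
    return {name: [w / k for w in weights] for name, weights in samples.items()}


def proportional_rescale(samples):
    """Scale each sample's weights by its share of the pooled observations."""
    n_total = sum(len(weights) for weights in samples.values())
    return {
        name: [w * len(weights) / n_total for w in weights]
        for name, weights in samples.items()
    }


def sample_totals(rescaled):
    """Sum of rescaled weights contributed by each sample."""
    return {name: sum(weights) for name, weights in rescaled.items()}


simple = sample_totals(simple_rescale(samples))
proportional = sample_totals(proportional_rescale(samples))

for name in samples:
    print(name, round(simple[name], 1), round(proportional[name], 1))
print("pooled totals:",
      round(sum(simple.values()), 1),
      round(sum(proportional.values()), 1))
```

In this toy setup both methods recover the same pooled total; what changes is how much of that total each sample contributes. With samples of nearly equal size, the two sets of weights would be close to identical.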

Finally, regarding the specification of sampling weights in Stata: this can differ based on the specific details of the statistic you are estimating, but in general pweight is the one to use most of the time.