I have a question on how to work with the weights when calculating the average wage in the Basic Monthly survey. The method I have followed so far is to divide the EARNWEEK variable by AHRSWORKT to get the hourly wage for each person. After that, I multiply this wage by the EARNWT variable. Since I would like to obtain the average wage for each month, I sum the EARNWT variable over exactly those individuals for whom I have wage data and form the ratio sum(WAGE*EARNWT) / sum(EARNWT). Unfortunately, my monthly average wage fluctuates tremendously over time and cannot be right…
Do I need to include another weight, for example the basic household weight? Or what exactly is my mistake?
Your calculation approach looks fine to me, though you may consider also using HOURWAGE for those who are paid hourly.
There are a few things that might be going wrong. One is that AHRSWORKT refers to the previous week, while EARNWEEK refers to a typical week. This could cause lots of fluctuation in the estimates. To get a more comparable usual hourly wage, you’d want to use UHRSWORKT as your denominator. Second, make sure you are accounting for coding in the variable you use. EARNWEEK has a top code and UHRSWORKT has an NIU code and an “hours vary” code.
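To make the first point concrete, here is a small Stata sketch with invented numbers (not real CPS records). It assumes lowercase variable names as in a typical extract, and it assumes that 9999.99 is the EARNWEEK NIU code, 999 the NIU code for both hours variables, and 997 the UHRSWORKT "hours vary" code. Please check those values against the variable documentation for your samples.

```stata
* Invented toy records, for illustration only (not real CPS data)
clear
input double earnweek ahrsworkt uhrsworkt
 800.00  10  40
 800.00  40  40
1200.00  60  40
9999.99 999 999
 600.00  30 997
end

* Original approach: typical weekly earnings over LAST week's actual hours
* (999 is assumed to be the NIU code for ahrsworkt)
gen wage_actual = earnweek / ahrsworkt if earnweek < 9999.99 & ahrsworkt > 0 & ahrsworkt < 999

* Suggested approach: typical weekly earnings over USUAL weekly hours,
* skipping the assumed "hours vary" (997) and NIU (999) codes
gen wage_usual = earnweek / uhrsworkt if earnweek < 9999.99 & uhrsworkt > 0 & uhrsworkt < 997

list
```

In this toy data the first person earns 800 in a typical week but worked only 10 hours last week, so the actual-hours wage is 80 while the usual-hours wage is 20. A few records like that in a given month are enough to move a monthly mean noticeably.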
If you’re still having issues after checking these, please give some more details about your problem (specific samples, a table with the results you are seeing, and any other relevant info).
Thanks for your reply!
I used the variable UHRSWORKT and it looks a bit better now… Besides that, I also accounted for the NIU codes for UHRSWORKT and used the variable HOURWAGE.
Unfortunately, I don’t know how to handle the top code for the variable EARNWEEK. The website refers to a top-code value of 2884.61 from 2003 onwards. Unlike the top codes for HOURWAGE, there is no clear explanation of what to insert instead of 2884.61. In addition, since the average wage in my sample is still tremendously higher than the average wage provided, e.g., by FRED, I don’t believe that the top codes are the issue. Hopefully the graph or my data sample will give you an idea of what is going wrong…
Many thanks in advance, I appreciate any help.
Best regards,
Freddy
problem_average_wage_ipums.xlsx (63.5 KB)
For EARNWEEK, the variable is already top-coded. Anything above 2884.61 is assigned the value of 2884.61. This implies that your mean will be under-estimated, but most quantiles like medians won’t be affected.
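As a rough illustration of that last point, the sketch below applies the 2884.61 top code to five invented weekly earnings values (not real CPS data) and compares the mean and median before and after. The variable names true_earn and topcoded are made up for this example.

```stata
* Invented weekly earnings, for illustration only (not real CPS data)
clear
input double true_earn
 500
1000
1500
2000
5000
end

* Apply the top code: anything above 2884.61 is recorded as 2884.61
gen double topcoded = min(true_earn, 2884.61)

* Mean and median of the untouched values
summarize true_earn, detail
display "true mean = " r(mean) "   true median = " r(p50)

* Mean and median after top-coding
summarize topcoded, detail
display "top-coded mean = " r(mean) "   top-coded median = " r(p50)
```

Here the mean drops from 2000 to about 1576.92 once the single high value is capped, while the median stays at 1500 in both cases.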
I calculated mean hourly wage using EARNWEEK/UHRSWORK1. The mean wage is pretty smooth over time, steadily increasing from about $15 in Jan 2000 to $27 in Jan 2021. I don’t see any of the sharp discontinuities that are in your graph. I’m not sure what’s going wrong with your estimation. Can you tell me how you calculated weightcount_av1 and realweight_av1 in your spreadsheet? Also what FRED wage series are you comparing this to?
Here’s the Stata code I used for my hourly wage calculation, if that helps. All I did was exclude NIU and “hours vary” codes from the calculation. You should be using EARNWT for your estimates, as well.
gen hourwage_calc = earnweek / uhrswork1 if earnweek < 9999.99 & uhrswork1 < 997
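Building on that line, one way the monthly weighted mean could then be computed is sketched below. The records are invented for illustration (not real CPS data), the year and month variables are assumed to be in your extract, and mean_wage is a name made up here. With a weight, collapse takes the mean over records with a non-missing hourwage_calc only, which should match the ratio sum(WAGE*EARNWT) / sum(EARNWT) described in the original question.

```stata
* Invented toy records, for illustration only (not real CPS data)
clear
input year month double earnweek uhrswork1 earnwt
2020 1  800.00  40 1500
2020 1 1200.00  30 2500
2020 1 9999.99 999 1000
2020 1  600.00 997 2000
2020 2  900.00  45 1000
2020 2 1000.00  40 3000
end

* Hourly wage, excluding NIU and "hours vary" codes (same condition as above)
gen hourwage_calc = earnweek / uhrswork1 if earnweek < 9999.99 & uhrswork1 < 997

* Weighted mean by month; records with a missing wage drop out of both
* the numerator and the denominator
collapse (mean) mean_wage = hourwage_calc [pweight = earnwt], by(year month)

list
```

In this toy data, January comes out as (20*1500 + 40*2500) / 4000 = 32.5 and February as (20*1000 + 25*3000) / 4000 = 23.75. If your own monthly series differs a lot from what a calculation like this gives, the difference is most likely in how the excluded records or the weights are handled.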