I’ve so far been working with the annual SOC-level wages and employment numbers that the BLS publishes, but I’d prefer to use monthly wages instead. Towards that end I’m aggregating the CPS to the OCC1990-by-month level and taking the (weighted) average of HOURWAGE. However, I’m unsure which weights to use to reweight individuals appropriately. The codebook suggests EARNWT for any analysis including a small number of person-level variables (EARNWEEK, HOURWAGE, PAIDHOUR, and UNION), but does that only apply to individual-level analyses?
As noted, when working with HOURWAGE, which is a person-level variable, you will want to make use of EARNWT to generate representative statistics.
From what you have described, it sounds like you are aggregating HOURWAGE by combinations of month and OCC1990. In this case, you will want to use EARNWT when performing your aggregation. You will then have representative statistics for each month and OCC1990 combination in your data and no additional weights will then be required when using these aggregated representative statistics.
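For illustration, here is a minimal sketch of that aggregation in plain Python. It assumes each record is a dict keyed by the variable names (MONTH, OCC1990, HOURWAGE, EARNWT) and that records without a valid wage or with a zero weight have already been dropped. The function name weighted_mean_by_cell and the sample rows are made up for the example.

```python
from collections import defaultdict

def weighted_mean_by_cell(records, value_key, weight_key, cell_keys):
    """Return {cell: weighted mean of value_key}, weighting by weight_key."""
    num = defaultdict(float)  # sum of weight * value within each cell
    den = defaultdict(float)  # sum of weights within each cell
    for r in records:
        cell = tuple(r[k] for k in cell_keys)
        num[cell] += r[weight_key] * r[value_key]
        den[cell] += r[weight_key]
    # Assumes every cell has a positive total weight (see the lead-in).
    return {cell: num[cell] / den[cell] for cell in num}

# Hypothetical rows, not real CPS data.
rows = [
    {"MONTH": 1, "OCC1990": 4, "HOURWAGE": 20.0, "EARNWT": 1000.0},
    {"MONTH": 1, "OCC1990": 4, "HOURWAGE": 30.0, "EARNWT": 3000.0},
    {"MONTH": 2, "OCC1990": 4, "HOURWAGE": 25.0, "EARNWT": 2000.0},
]
cell_means = weighted_mean_by_cell(rows, "HOURWAGE", "EARNWT", ("MONTH", "OCC1990"))
```

Each entry of cell_means is already a weighted estimate for its month and occupation cell, so no further weight is applied when working with those cell-level values.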
Awesome, thank you for confirming!
But are these data supposed to be ‘representative’ at the occupation level as well? I noticed that for some occupation-year cells there were only very few observations - i.e. just a single individual in occupation 4 (chief executives) or 24 (Insurance underwriters) in 1990.
Yes, they should be representative at the occupation level as well, but note that since some occupations are relatively rare, the estimates for them will have large margins of error. It may be worthwhile to group these underpopulated occupations with similar occupations to reduce the margin of error.
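One way to spot such cells before deciding what to group is to count the unweighted observations in each occupation-year cell. The sketch below is only illustrative: the helper name thin_cells, the cutoff of 3 observations, and the sample rows (including the placeholder occupation code 100) are assumptions, not recommendations.

```python
from collections import Counter

def thin_cells(records, cell_keys, min_obs):
    """Return {cell: unweighted count} for cells with fewer than min_obs records."""
    counts = Counter(tuple(r[k] for k in cell_keys) for r in records)
    return {cell: n for cell, n in counts.items() if n < min_obs}

# Hypothetical rows: one record each for occupations 4 and 24,
# three records for a placeholder occupation code 100.
rows = [
    {"YEAR": 1990, "OCC1990": 4},
    {"YEAR": 1990, "OCC1990": 24},
    {"YEAR": 1990, "OCC1990": 100},
    {"YEAR": 1990, "OCC1990": 100},
    {"YEAR": 1990, "OCC1990": 100},
]
sparse = thin_cells(rows, ("YEAR", "OCC1990"), min_obs=3)
```

The cells listed in sparse are the candidates for merging with a similar occupation. Which occupations count as similar is a judgment call that the code does not make.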
If we are trying to find the ‘true hourly wage’ would we need to weight the ‘Hours worked at Main Job’ with ‘Final Weight’, and weight the ‘Hourly Wage’ with EARNWT?
Although I am not certain of your ultimate goal, it seems the most straightforward way to do this would be to use the HOURWAGE variable, which identifies the hourly wage in a respondent’s current job. Additionally, for the sake of comparability, if you want to use a measure of hours worked, you should use the UHRSWORKORG variable. Since both of these variables are part of the outgoing rotation group variables, they will work well together and generally apply to the same subset of respondents. Finally, this makes the choice of a sampling weight variable clear: you’ll want to use the EARNWT variable.
My ultimate goal is to find the average hourly wage of professions, which is easy using HOURWAGE. However, I also want to find a true wage, and by that I mean earnings divided by hours worked.
So far I am thinking I take [EARNWEEK] / [UHRSWORK1] * [EARNWT] = Weighted True Wage
This tells me the record’s true weighted wage, correct?
I am completely new to weighting, so bear with me if you can.
If you replace UHRSWORK1 with UHRSWORKORG, then yes, you are correct.
Another question related to professions and weighting:
If I wanted to get an Average Hours worked for each profession, would I use
[UHRSWORK1] * [EARNWT]
or
[UHRSWORK1] * [WTFINL]?
Okay so back to my previous question
[EARNWEEK] / [UHRSWORK1] * [EARNWT]
Does the [EARNWT] take care of the representation in place of [WTFINL]?
So doing something like this is incorrect as a weight is already applied at some point?
([EARNWEEK] * [EARNWT]) / ([UHRSWORK1] * [WTFINL]) = True Wage
Sorry for the confusion here. If you want to divide EARNWEEK by a measure of usual hours worked, you should use UHRSWORKORG. Like the EARNWEEK variable, this variable is part of the outgoing rotation group of questions. The UHRSWORK1 variable, by contrast, comes from the basic monthly survey set of questions and presents some computational challenges when applied with the EARNWEEK variable. You can read more about the outgoing rotation group here.
The general rule of thumb when applying sampling weights is that each calculation should include only one sampling weight. This is because some questions only apply to specific respondents and the sampling weight takes care of this complexity.
Therefore you should be doing something like this: [EARNWEEK] / [UHRSWORKORG] * [EARNWT]
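To turn that per-record expression into an average for a profession, a common approach is to sum the weighted values and divide by the sum of the weights. The sketch below assumes records are dicts keyed by the variable names, that missing or not-in-universe values have been recoded to None beforehand (check the codebook for the actual codes), and that the function name and sample rows are hypothetical.

```python
def weighted_mean_true_wage(records):
    """Weighted mean of EARNWEEK / UHRSWORKORG, using EARNWT as the only weight."""
    num = 0.0  # sum of EARNWT * (EARNWEEK / UHRSWORKORG)
    den = 0.0  # sum of EARNWT over the records actually used
    for r in records:
        earn, hours, w = r["EARNWEEK"], r["UHRSWORKORG"], r["EARNWT"]
        # Skip records that cannot contribute: missing values (None by
        # assumption), non-positive hours, or non-positive weight.
        if earn is None or hours is None or hours <= 0 or w <= 0:
            continue
        num += w * (earn / hours)
        den += w
    return num / den if den > 0 else None

# Hypothetical rows, not real CPS data.
rows = [
    {"EARNWEEK": 800.0, "UHRSWORKORG": 40, "EARNWT": 1000.0},
    {"EARNWEEK": 1500.0, "UHRSWORKORG": 50, "EARNWT": 3000.0},
    {"EARNWEEK": None, "UHRSWORKORG": 40, "EARNWT": 2000.0},  # skipped
]
avg_true_wage = weighted_mean_true_wage(rows)
```

Only EARNWT appears in both the numerator and the denominator, which keeps the calculation to a single sampling weight.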
So that means if I am using that rotation group I should limit my sample to aggregate Q4 in 2018 so that I am not getting duplicate responses?
Can I use an aggregate of all months in my profession averages in 2018 without worries of duplicates?
Actually, that is not necessary. If you are only using the basic monthly samples from a single calendar year (i.e., 2018), then individuals will only be in the outgoing rotation group once. This is because the CPS follows a 4-8-4 rotational sampling design: households are included in the sample for 4 consecutive months, excluded for 8, and then included again for 4 more. For example, if a household has MISH==4 in January 2018, they will not be MISH==8 until January 2019.
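The timing can be checked with a small sketch of the 4-8-4 pattern. The helper mish_schedule is hypothetical: it maps each month-in-sample value (1 through 8) to a calendar year and month, given the household's first month in the sample.

```python
def mish_schedule(first_year, first_month):
    """Map MISH 1..8 to (year, month) under a 4-8-4 rotation."""
    # In sample for 4 months, out for 8, then in for 4 more.
    offsets = [0, 1, 2, 3, 12, 13, 14, 15]
    start = first_year * 12 + (first_month - 1)  # months since year 0
    schedule = {}
    for mish, off in enumerate(offsets, start=1):
        idx = start + off
        schedule[mish] = (idx // 12, idx % 12 + 1)
    return schedule

# A household first interviewed in October 2017 has MISH==4 in January 2018.
sched = mish_schedule(2017, 10)

# Outgoing rotation group months (MISH 4 and 8) that fall in calendar year 2018.
org_2018 = [m for m in (4, 8) if sched[m][0] == 2018]
```

Because MISH 4 and MISH 8 are always 12 months apart, at most one of them can fall in any single calendar year, whatever the starting month.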
Okay I see.
Will a person show up 2 times in the sample export? Will I have to filter out the MISH==8 if I want to keep them from showing up 2 times in my calculations?