Calculating number of workers earning below a threshold

I’m pretty new to analyzing IPUMS data and want to make sure I’m using the data correctly. I’m trying to get a read on the number of people who report weekly earnings above and below a certain threshold (in this case, $500), broken down by sex. For this, I’m using CPS monthly data since 2019. My extract includes the following variables: SEX, LABFORCE, EARNWT and EARNWEEK, and below is my code in R.

library(dplyr)
library(ipumsr) #provides as_factor() for labelled IPUMS variables

cps_data %>%
	mutate(EARNWEEK_2 = as.numeric(as.character(EARNWEEK))) %>% #refer to columns directly inside mutate(), not via cps_data$
	mutate(earn_bins = ifelse(EARNWEEK_2 <= 500, 'under 500', 'over 500')) %>% #bin weekly earnings: at or below $500 vs. above $500
	filter(LABFORCE == 2 & EARNWEEK != 9999.99) %>% #exclude those not in the labor force and EARNWEEK NIUs
	group_by(YEAM = paste(YEAR, MONTH, sep = '-'), SEX_factor = as_factor(SEX), earn_bins) %>% #group by year/month, sex, and earnings bin
	summarize(n = sum(EARNWT), EARNWEEK_avg = weighted.mean(EARNWEEK_2, EARNWT)) #summarize using EARNWT

First, does my methodology seem sound? I’m surprised at the monthly variability of the data. It’s not out of the realm of possibility, but it’s choppier than I would have guessed.
Second, I’m wondering why the monthly totals don’t match other sources. For example, the BLS has some 75 million women aged 16+ in the labor force as of February, and my analysis has only 65 million. Is this discrepancy likely due to the way CPS data categorizes workers?

The method you are using to estimate average weekly earnings by month looks good to me. The month-to-month variability I calculated using your method (a swing of about $50) is within the range I would expect, given that the 25th and 75th percentiles of the distribution of EARNWEEK for your data are $420 and $1,076, respectively.
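For anyone who wants to check the spread of EARNWEEK themselves, here is a minimal base R sketch of a weighted percentile. The helper name `weighted_quantile` and the toy values are made up for illustration (the earnings values are chosen only to mirror the quartiles quoted above), and it assumes the percentiles should be weighted by EARNWT, which the reply does not state.

```r
# Hypothetical helper: smallest value whose cumulative weight share reaches p.
weighted_quantile <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)  # cumulative share of total weight
  sapply(probs, function(p) x[which(cum_share >= p)[1]])
}

# Toy data, not real CPS records.
earnweek <- c(300, 420, 700, 1076, 1500)
earnwt   <- c(10, 20, 30, 20, 20)

quartiles <- weighted_quantile(earnweek, earnwt, c(0.25, 0.75))
quartiles  # 420 1076
```

On real data you would pass the filtered EARNWEEK and EARNWT columns instead of the toy vectors.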

Population estimates based on the subset of people included in the Outgoing Rotation Group (Earner Study) will differ from BLS estimates because of eligibility restrictions. The universe for Earner Study questions is more restrictive; you can identify these cases with ELIGORG == 1. This is equivalent to EMPSTAT in (10,12) & CLASSWKR in (22,23,25,27,28); note that it excludes all self-employed workers (both incorporated and unincorporated), military personnel, and unpaid family workers. Because this universe is narrower than the one BLS uses for its estimates of people in the labor force, you will get lower population estimates.
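To illustrate why the narrower universe lowers the total, here is a small base R sketch on toy data. The rows and the weight column `wt` are invented for illustration; which weight to use for a labor force total on real data is outside the scope of this sketch.

```r
# Toy data: values and the weight column are made up, not real CPS records.
toy <- data.frame(
  LABFORCE = c(2, 2, 2, 2, 1, 2),  # 2 = in the labor force
  ELIGORG  = c(1, 0, 1, 0, 0, 1),  # 1 = in the Earner Study universe
  wt       = c(1500, 1200, 1800, 900, 2000, 1600)
)

# Weighted count of everyone in the labor force
labor_force_total <- sum(toy$wt[toy$LABFORCE == 2])

# Weighted count of the Earner Study universe only
earner_total <- sum(toy$wt[toy$ELIGORG == 1])

labor_force_total  # 7000
earner_total       # 4900
```

The second total is smaller because people who are in the labor force but outside the Earner Study universe (for example, the self-employed) drop out of the sum.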