Median Hourly Earnings Based on EARNWT

Hi everyone,

I’m very new to the community and to R in general. I was wondering if anyone could suggest a formula or syntax for calculating the median hourly earnings for certain occupations, weighted by EARNWT?

My current code is below, and I’m not sure what formula makes the most sense to incorporate the EARNWT variable:
nhdata %>%
  filter(OCC == 4110) %>%
  mutate(n_earners = sum(EARNWT)) %>%
  summarise(n = median(EARNWEEK/UHRSWORKORG))

Would greatly appreciate any help on this!

I am not aware of any relevant sample code in R for IPUMS on this specific request and am not an R user. However, I am linking a page with information on calculating a weighted median in R. It looks to me like your current code restricts your data file to waitstaff (OCC == 4110), sums EARNWT into a new column, and then takes the median of EARNWEEK/UHRSWORKORG without using the weights, so the result is an unweighted median hourly wage rather than a weighted one.

For general R resources, you might be interested in the R data training exercises for IPUMS CPS, Quick-R, or R4DS.

Thanks for the link to this! Yes, I’m trying to calculate the weighted median wage for waitstaff and also do a comparison by gender to determine the wage gap between the sexes. I want to do this for a variety of customarily tipped occupations, including waitstaff, in different states (e.g., New Hampshire). The only problem is that the number of observations for some states is very small for these occupations, so I’m really trying to make sure I’m incorporating the weights properly. Will incorporating the weights solve the issue of a small sample size?

I’ll definitely use the link you sent, but I’m just not sure how to properly incorporate it into my code.

IPUMS doesn’t provide code review; however, based on my interpretation of the description of the required arguments for the basic usage weighted.median(x, w, na.rm = TRUE), you need to provide:

  • a vector of numeric values for which you would like the median, or x (e.g., a new variable you create that is the average hourly wage),
  • a weight variable, or w (e.g., EARNWT), and
  • a decision about ignoring NA values (note that IPUMS assigns special codes to NA values, so you will likely want to exclude these earlier in your analysis when creating the average hourly wage variable).
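
As a rough illustration of how those three pieces fit together, here is a minimal base R sketch rather than reviewed or official code. The helper weighted_median() is hypothetical and written out by hand so that nothing depends on a particular package; it uses one common definition (the smallest value at which the cumulative weight reaches half of the total), so packaged functions that interpolate may return slightly different numbers. The toy rows are made up, and the cutoffs used to drop special codes are assumptions that should be checked against the codebook for your extract.

```r
# Hypothetical helper, not taken from any package: the weighted median is the
# smallest value at which the cumulative weight reaches half the total weight.
weighted_median <- function(x, w) {
  keep <- !is.na(x) & !is.na(w) & w > 0   # drop missing values and zero weights
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)                         # sort values, carrying weights along
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  x[which(cum_share >= 0.5)[1]]
}

# Made-up toy rows standing in for an IPUMS CPS extract.
nhdata <- data.frame(
  OCC         = c(4110, 4110, 4110, 4110, 4110, 2310),
  SEX         = c(1, 2, 2, 1, 2, 2),      # assumed coding: 1 = male, 2 = female
  EARNWEEK    = c(400, 270, 9999.99, 600, 200, 1200),
  UHRSWORKORG = c(40, 30, 999, 40, 25, 40),
  EARNWT      = c(1000, 3000, 0, 2000, 4000, 1500)
)

# Keep waitstaff and drop rows with special codes BEFORE computing the wage.
# The cutoffs below are assumptions; confirm them in the codebook.
waitstaff <- subset(
  nhdata,
  OCC == 4110 & EARNWEEK < 9999 & UHRSWORKORG > 0 & UHRSWORKORG < 998
)
waitstaff$hourly_wage <- waitstaff$EARNWEEK / waitstaff$UHRSWORKORG

# Weighted median for all waitstaff, then separately by sex.
overall <- weighted_median(waitstaff$hourly_wage, waitstaff$EARNWT)
by_sex <- sapply(
  split(waitstaff, waitstaff$SEX),
  function(d) weighted_median(d$hourly_wage, d$EARNWT)
)
```

With these toy rows, overall is 9, while the unweighted median of the same four wages would be 9.5, which shows the weights changing the answer. by_sex is a named vector with one weighted median per SEX code.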

I hope this helps.

I missed the update to your post where you ask about weights and small sample sizes; my apologies for not responding to your entire query.

Weights scale your counts up to the estimated “true” population size, but they do not add information: the precision of an estimate still depends on the unweighted number of observations. Depending on your unweighted sample size, it may therefore not be appropriate to make population inferences from such a small group. There is no bright-line rule for what counts as “too small” to study, although more is always better and one observation is certainly too small. In practice, the sampling error around estimated statistics will be relatively large, which limits how informative the estimates can be.
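
One way to see the distinction is to tabulate, for each cell you plan to report, both the unweighted number of observations and the weighted count. The sketch below is only an illustration in base R with made-up rows; STATEFIP values 33 and 50 are used here for New Hampshire and Vermont, and the column names n_unweighted and weighted_count are hypothetical.

```r
# Made-up toy rows: one row per respondent in the occupation of interest.
toy <- data.frame(
  STATEFIP = c(33, 33, 33, 50, 50),   # 33 = New Hampshire, 50 = Vermont
  SEX      = c(1, 2, 2, 1, 2),
  EARNWT   = c(1500, 2200, 1800, 900, 1100)
)

# Unweighted number of observations in each state-by-sex cell.
n_obs <- aggregate(EARNWT ~ STATEFIP + SEX, data = toy, FUN = length)
names(n_obs)[3] <- "n_unweighted"

# Weighted count (estimated population) in each cell.
pop <- aggregate(EARNWT ~ STATEFIP + SEX, data = toy, FUN = sum)
names(pop)[3] <- "weighted_count"

# Side-by-side view: a large weighted_count can rest on very few rows.
cells <- merge(n_obs, pop)
print(cells)
```

A cell whose weighted_count is in the thousands but whose n_unweighted is one or two is still based on one or two respondents, so any median computed for it will be very noisy.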

One way to increase the sample size of your estimates is to pool multiple samples together (e.g., across various months to combine multiple ORGs). This will increase the number of observations in your data and the statistical precision but will limit the temporal precision of your analysis. Note that if you do pool together multiple samples you will need to adjust the sampling weights so that they properly account for the combined samples. An approximate way to do this is to divide the sampling weight by the number of samples you are pooling together.
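
The approximate adjustment described above can be sketched in base R as follows. This is an illustration with made-up rows, not official guidance: the three monthly data frames and the column name EARNWT_POOLED are hypothetical, and with real data spanning more than one year you would count distinct YEAR and MONTH combinations rather than MONTH alone.

```r
# Three made-up monthly samples, each with weights summing to 3000.
jan <- data.frame(MONTH = 1, EARNWT = c(1000, 2000))
feb <- data.frame(MONTH = 2, EARNWT = c(1500, 1500))
mar <- data.frame(MONTH = 3, EARNWT = c(1200, 1800))

# Pool the samples into one data frame.
pooled <- rbind(jan, feb, mar)

# Number of samples being pooled (here, distinct months).
n_samples <- length(unique(pooled$MONTH))

# Approximate adjustment: divide each weight by the number of pooled samples,
# so the weights sum to an average single-sample population, not three times it.
pooled$EARNWT_POOLED <- pooled$EARNWT / n_samples

total_unadjusted <- sum(pooled$EARNWT)         # triple-counts the population
total_adjusted   <- sum(pooled$EARNWT_POOLED)  # roughly one month's population
```

The adjusted weights would then be used in place of EARNWT in any weighted statistic computed on the pooled data.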