Household weighting for rent

Hi everyone!

I have the standard IPUMS dataset, where each observation is an individual. I want to group and summarize the data by state-PUMA, so that each record is a unique state-PUMA. I have created a variable that is a unique state-PUMA ID, i.e., each PUMA in the US has a unique value in this variable. This is not the challenge, but I wanted to share it to set the scene.

I am using dplyr. I have a gross rent variable (RENTGRS) that is measured at the household level, and I also have the household weight variable (HHWT). I want to summarize the data so that I get the total rent spent in each state-PUMA and the total number of households in each state-PUMA.

I imagine it would look something like:

group_by(statePumaID) %>% summarize(rentGross = sum(RENTGRS, na.rm = TRUE, weight = HHWT), households = sum(HHWT, na.rm = TRUE))

I have two distinct questions:

  1. How do I count up the rent dollars spent in each PUMA?

1.1 Am I applying the weights correctly, in that I put the argument ‘weight = HHWT’ in the sum within summarize?

1.2 Are the rent dollars reported in a way that makes them meaningful to be summed (if weighted by household, of course)?

  2. How do I count up the households in each PUMA?

2.1 Am I interpreting the weights correctly, i.e., can I sum them to get a count of households in a PUMA? In other words, can they be meaningfully summed?

We have loved getting to know this dataset and package. It has been incredibly useful and smoothed so much of our analysis. This is just to make sure we are doing a few last steps of analysis right.

Thank you!

I am not quite sure that I understand all of your questions, but I will try to answer them. Please let me know if I can offer further clarification on any of these responses. I want to note that IPUMS doesn’t provide code review, but I can direct you to these data training exercises for examples of using R with IPUMS data. I will answer your questions with general descriptions of working with the data rather than commenting on your specific R syntax.

Regarding totaling rent (and utility) dollars by PUMA, I would limit the sample to one person per household, which is most easily done by filtering on PERNUM = 1, and then sum the RENTGRS values within each state-PUMA combination. Using the household weight (HHWT) seems appropriate based on my understanding of your application and given that RENTGRS is a household-level variable, but it only gives sensible totals once the sample is restricted in this way; otherwise each household is counted once per member. As noted on the codes tab for the RENTGRS variable, amounts are expressed in contemporary dollars and are not adjusted for inflation (except in multi-year ACS files). That tab also describes a number of special codes that you should account for.
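As a rough illustration of the approach described above (not a review of the original syntax), here is a minimal dplyr sketch on made-up data. The values, the `SERIAL` household identifier, and the `statePumaID` strings are hypothetical, and treating `RENTGRS == 0` as the "not applicable" code is an assumption that should be checked against the codes tab. Note that base R's `sum()` has no weight argument: any extra argument passed to it is simply added into the total, so the weighting has to be done explicitly.

```r
library(dplyr)

# Toy person-level data with hypothetical values; a real extract comes from IPUMS USA.
persons <- tibble(
  statePumaID = c("01_00100", "01_00100", "01_00100", "01_00200", "01_00200"),
  SERIAL      = c(1, 1, 2, 3, 3),          # household identifier
  PERNUM      = c(1, 2, 1, 1, 2),          # person number within household
  HHWT        = c(100, 100, 50, 80, 80),   # household weight
  RENTGRS     = c(1000, 1000, 0, 1500, 1500)
)

rent_by_puma <- persons %>%
  filter(PERNUM == 1) %>%   # keep one record per household
  filter(RENTGRS > 0) %>%   # ASSUMPTION: 0 is the N/A code; verify on the codes tab
  group_by(statePumaID) %>%
  summarize(rentGross = sum(RENTGRS * HHWT), .groups = "drop")  # weighted total

print(rent_by_puma)
```

Multiplying each household's rent by its weight before summing is what turns the sample values into an estimate of total rent dollars for the PUMA.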

You are correct that summing the household weights (HHWT) is the appropriate way to estimate the total number of households by PUMA, again restricting to records with PERNUM = 1. For more detail and a worked example of producing estimates with ACS data, see this IPUMS USA information about sample design and estimation in the ACS.
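A small sketch of that household count, again on made-up data (the values and `statePumaID` strings are hypothetical):

```r
library(dplyr)

# Toy person-level data with hypothetical values.
persons <- tibble(
  statePumaID = c("01_00100", "01_00100", "01_00100", "01_00200", "01_00200"),
  PERNUM      = c(1, 2, 1, 1, 2),
  HHWT        = c(100, 100, 50, 80, 80)
)

households_by_puma <- persons %>%
  filter(PERNUM == 1) %>%   # one record per household, so each HHWT is counted once
  group_by(statePumaID) %>%
  summarize(households = sum(HHWT), .groups = "drop")

print(households_by_puma)
```

Without the `PERNUM == 1` filter, the first PUMA's two-person household would contribute its weight twice and inflate the count.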

As an aside, based on your questions it seems like you might be interested in summary measures rather than person-level microdata (though I don’t know the details of your research agenda or analytical plan). It sounds like your current plan is to calculate two totals and divide one by the other. I would recommend instead using a statistical command to calculate the weighted mean of rent, as this would allow for the calculation of standard errors. Alternatively, you may be interested in looking at IPUMS NHGIS to see if the relevant tables already exist. Assuming you are using PUMA because it is the lowest level of geographic detail available for the entire US, the other perk of using aggregate data from IPUMS NHGIS is that you can get lower levels of geography.
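For the point estimate alone, a weighted mean can be sketched with base R's `weighted.mean()` inside dplyr. The data below are made up, and the sketch assumes the sample has already been limited to renting households with valid rent values. It does not produce standard errors; those require a survey-aware tool that uses the sample design information or replicate weights, which is not shown here.

```r
library(dplyr)

# Toy household-level data (one record per household), hypothetical values.
households <- tibble(
  statePumaID = c("01_00100", "01_00100", "01_00200"),
  HHWT        = c(100, 100, 80),
  RENTGRS     = c(1000, 2000, 1500)
)

mean_rent_by_puma <- households %>%
  group_by(statePumaID) %>%
  summarize(meanRent = weighted.mean(RENTGRS, w = HHWT), .groups = "drop")

print(mean_rent_by_puma)
```

This gives the same number as dividing weighted total rent by the weighted household count over the same set of households, but it keeps the calculation in one step.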