I have the standard IPUMS dataset, where each observation is an individual. I want to group and summarize the data by state-PUMA, so that each record is a unique state-PUMA. I have created a unique state-PUMA ID variable, i.e., each PUMA in the US has a unique value in this variable. That part is not the challenge; I mention it only to set the scene. I am using dplyr. I have a gross rent variable that is measured at the household level, and I also have the household weight variable. I want to summarize the data so that I get the total rent spent and the total number of households in each state-PUMA.
I imagine it would look something like:
group_by(statePumaID) %>% summarize(rentGross = sum(RENTGRS, na.rm = TRUE, weight = HHWT), households = sum(HHWT, na.rm = TRUE))
I have two distinct questions:
- How do I count up the rent dollars spent in each PUMA?
1.1 Am I applying the weights correctly by passing ‘weight = HHWT’ to sum() within summarize()?
1.2 Are the rent dollars reported in a way that makes them meaningful to sum (if weighted by household, of course)?
- How do I count up the households in each PUMA?
2.1 Am I interpreting the weights correctly, i.e., can I sum them to get a count of households in a PUMA? In other words, can they be meaningfully summed?
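To make the question concrete, here is a minimal sketch of the alternative I suspect may be needed, since as far as I can tell base R's sum() has no weight argument and would simply add the HHWT values into the total. The sketch rests on several assumptions: that the extract is rectangular (household variables repeated on every person row), that SERIAL and PERNUM are included in the extract, and that keeping only PERNUM == 1 leaves exactly one row per household. The data frame ipums_toy and its values are invented purely for illustration.

```r
library(dplyr)

# Hypothetical toy extract: one row per person, household values repeated.
ipums_toy <- tibble(
  statePumaID = c("01_00100", "01_00100", "01_00100", "01_00200", "01_00200"),
  SERIAL      = c(1, 1, 2, 3, 3),          # household identifier
  PERNUM      = c(1, 2, 1, 1, 2),          # person number within household
  HHWT        = c(100, 100, 50, 80, 80),   # household weight
  RENTGRS     = c(1000, 1000, 0, 500, 500) # gross rent (household level)
)

puma_summary <- ipums_toy %>%
  # Assumption: one record per household, so households are not double counted.
  filter(PERNUM == 1) %>%
  group_by(statePumaID) %>%
  summarize(
    # Weighted total: each household's rent times the households it represents.
    rentGross  = sum(RENTGRS * HHWT, na.rm = TRUE),
    # Sum of household weights as the estimated household count.
    households = sum(HHWT, na.rm = TRUE),
    .groups = "drop"
  )

print(puma_summary)
```

If this is the right idea, the open points are still whether RENTGRS values can be summed this way and whether non-renter households should be dropped before computing the totals.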
We have loved getting to know this dataset and package. It has been incredibly useful and smoothed so much of our analysis. This is just to make sure we are doing a few last steps of analysis right.