Calculating Weighted Proportions of Co-Ethnics within each PUMA

Hi! I’m using several iterations of IPUMS USA for an RA project (2007 3-yr; 2010 3-yr; 2013 3-yr; 2018 5-yr; 2022 5-yr), and one of the variables we are trying to construct is, for each respondent, the weighted proportion of co-ethnics within their PUMA, based on how they are categorized in the RACE variable. I’ve already cleaned each variable (RACE is currently a categorical variable), but I’m unclear about how to calculate these weighted proportions. I followed the instructions in the IPUMS User Notes for getting weighted means of binary variables, but I got confused when applying them to a variable with several categories. I’ve used the code below, but would appreciate any advice or clarification! Thank you!

```r
library(dplyr)
data <- data %>%
  group_by(MULTYEAR = haven::as_factor(MULTYEAR),
           STATEICP = haven::as_factor(STATEICP),
           PUMA = haven::as_factor(PUMA)) %>%
  summarize(PropCoethnicPUMA = weighted.mean(RACE, PERWT, na.rm = TRUE))
```
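A minimal sketch of why the call above does not return a proportion, using hypothetical toy values rather than IPUMS data: `weighted.mean()` applied to RACE averages the numeric category codes themselves. The binary-variable approach from the User Notes does carry over to a multi-category variable once one category is turned into a 0/1 indicator (here `race == 1`; the codes in this toy example are arbitrary).

```r
# Hypothetical toy data for a single PUMA (not real IPUMS values).
race  <- c(1, 1, 2, 1, 4)      # categorical codes, as in RACE
perwt <- c(10, 20, 30, 25, 15) # person weights, as in PERWT

# Averaging the codes gives an "average race code", which has no meaning.
avg_code <- weighted.mean(race, perwt)

# A weighted proportion needs a 0/1 indicator for one category:
# the weighted share of people coded 1.
share_race1 <- weighted.mean(race == 1, perwt)

avg_code    # 1.75
share_race1 # 0.55
```

Repeating the indicator calculation for each RACE code gives the full set of within-PUMA shares; grouping by RACE produces all of them in one pass.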

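Am I understanding correctly that you are looking to estimate the proportion of each PUMA’s population in each race category (e.g., the percent of the PUMA’s population that is White)? If so, note that weighted.mean(RACE, PERWT) averages the numeric RACE codes rather than producing a proportion. Instead, I believe you will want to include RACE in your group_by function: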

```r
group_by(MULTYEAR = haven::as_factor(MULTYEAR),
         STATEICP = haven::as_factor(STATEICP),
         PUMA = haven::as_factor(PUMA),
         RACE = haven::as_factor(RACE)) %>%
```

You can then run your analysis using:

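```r
summarize(n = sum(PERWT)) %>%  # weighted count of each race in each PUMA
  mutate(pct = n / sum(n))      # summarize() drops RACE from the grouping, so sum(n) is the PUMA total
```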
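Since the goal is a value for each respondent, the PUMA-by-race shares can be joined back to the person records on PUMA and the respondent’s own RACE. This is a sketch on hypothetical toy data; the names `persons`, `puma_race_shares`, and `persons_with_share` are illustrative. `.groups = "drop_last"` makes explicit what `summarize()` already does by default here: it drops RACE from the grouping, so `sum(n)` is the PUMA total and `pct` is a within-PUMA share. RACE is kept as the raw code on both sides of the join; if you convert it with `haven::as_factor()` in one table, convert it in the other too, or the join keys will not match.

```r
library(dplyr)

# Hypothetical person-level toy data standing in for an IPUMS extract.
persons <- data.frame(
  MULTYEAR = 2013,
  STATEICP = c(1, 1, 1, 1, 1, 1),
  PUMA     = c(100, 100, 100, 100, 200, 200),
  RACE     = c(1, 1, 2, 1, 2, 2),
  PERWT    = c(10, 20, 30, 40, 50, 50)
)

# Weighted population of each race in each PUMA, then its share of the PUMA total.
puma_race_shares <- persons %>%
  group_by(MULTYEAR, STATEICP, PUMA, RACE) %>%
  summarize(n = sum(PERWT), .groups = "drop_last") %>%  # still grouped by PUMA
  mutate(pct = n / sum(n)) %>%
  ungroup()

# Give each respondent the weighted share of their own race group in their PUMA.
persons_with_share <- persons %>%
  left_join(select(puma_race_shares, -n),
            by = c("MULTYEAR", "STATEICP", "PUMA", "RACE")) %>%
  rename(PropCoethnicPUMA = pct)

persons_with_share$PropCoethnicPUMA  # 0.7 0.7 0.3 0.7 1.0 1.0
```

STATEICP is part of the key because PUMA codes are only unique within a state. If all five samples are stacked in one data frame, MULTYEAR alone will not keep them apart: the 2018 5-year (2014 to 2018) and 2022 5-year (2018 to 2022) samples both contain MULTYEAR = 2018, so adding a sample identifier such as SAMPLE to the keys would separate those cells. Likewise, grouping by MULTYEAR yields one proportion per survey year within a multi-year sample; grouping by the sample instead yields one proportion per pooled sample.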

You can obtain this sample code in our data training exercises.

We also offer an online analysis tool that allows researchers to run these types of analyses without needing to use a stats package. If you’re interested, I recommend viewing this video tutorial to familiarize yourself with the tool.