I would like to create a variable that indicates the racial composition of each respondent's local area. Ideally this would be at the neighborhood level, but since that isn't possible with the publicly available census data, I know I need to use the PUMA variable; I just don't know how. The variables I'd like to create would give the percentage of the PUMA that is black, white, etc., so that I can control for the racial composition of the respondent's area. I'd appreciate any direction. Thanks!
In terms of mechanics, there are several ways to attach PUMA-level racial composition to individual respondents, depending on which statistical software package you are using. Whichever you choose, what you will ultimately be doing is summarizing the RACE variable at the PUMA level. Whatever summary statistic you pick (e.g. percent white, percent black, or one variable for each racial group), you will look at each PUMA individually, compute its weighted RACE summary, and then create a new variable that holds that summary value for every person in the PUMA. You could do this by first creating a separate PUMA-level dataset that contains your race summaries and then merging those values onto the individual records by PUMA. Alternatively, your statistical software may have functions that generate new variables summarized within a grouping variable (here, PUMA), such as egen in Stata.
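A minimal Stata sketch of the first route (build a PUMA-level file, then merge it back), assuming an IPUMS USA person-level extract with the default lowercase names statefip, puma, race and perwt, and the general RACE coding in which 1 = White and 2 = Black. The four toy records are invented purely for illustration; in practice you would start from your own extract instead of the input block. PUMA codes are only unique within a state, so the sketch groups by statefip as well as puma. collapse accepts pweights, so the means it produces are person-weighted shares.

```stata
* Hypothetical toy data standing in for an IPUMS USA person-level extract
clear
input byte statefip int puma byte race double perwt
1 100 1 10
1 100 2 30
1 200 1 20
1 200 1 20
end

* 0/1 indicators for the groups of interest (assumes 1 = White, 2 = Black)
gen byte white = (race == 1)
gen byte black = (race == 2)

* Build a PUMA-level file of person-weighted shares ...
preserve
collapse (mean) pct_white=white pct_black=black [pweight=perwt], by(statefip puma)
tempfile pumarace
save "`pumarace'"
restore

* ... and merge the shares back onto every person in that PUMA
merge m:1 statefip puma using "`pumarace'", assert(match) nogenerate
```

In the toy data, everyone in PUMA 100 ends up with pct_white = .25 and pct_black = .75 (weights 10 and 30), and everyone in PUMA 200 with pct_white = 1.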
I hope this helps.
Thanks for your response, Joe! This is very helpful. I neglected to mention my software program but I am indeed using Stata so your suggestion about using egen is immensely helpful. Thanks again!
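egen mean() will not work here, because the data need to be weighted with the household or person weights and egen's mean() function does not accept weights. Instead, create two totals with egen total() by PUMA and take their ratio. Note that PUMA codes are only unique within a state, so group by statefip as well as puma (storing the totals as double avoids float rounding when summing large weights):
egen double puma_total_wgt = total(perwt), by(statefip puma)
egen double puma_total_white = total((race == 1) * perwt), by(statefip puma)
gen double puma_frac_white = puma_total_white / puma_total_wgt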
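To get one variable for each racial group rather than just percent white, the same egen total() ratio can be wrapped in a loop. This is a minimal sketch, assuming the IPUMS lowercase names statefip, puma, race and perwt; the toy records are invented for illustration. It uses levelsof to loop over whichever race codes actually appear, so it does not depend on a particular coding scheme, and within each PUMA the resulting shares sum to 1.

```stata
* Hypothetical toy data; replace with your own IPUMS extract
clear
input byte statefip int puma byte race double perwt
1 100 1 10
1 100 2 30
1 200 1 20
1 200 1 20
end

* Total person weight in each PUMA (PUMA codes repeat across states)
egen double puma_total_wgt = total(perwt), by(statefip puma)

* One weighted share per race code present in the data
levelsof race, local(codes)
foreach r of local codes {
    egen double puma_total_r`r' = total((race == `r') * perwt), by(statefip puma)
    gen double puma_frac_r`r' = puma_total_r`r' / puma_total_wgt
}

* Keep only the shares (puma_frac_r1, puma_frac_r2, ...)
drop puma_total_r*
```

The new variables are named after the race codes (puma_frac_r1, puma_frac_r2, ...), so you may want to rename them, e.g. to puma_frac_white, before using them as controls.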