Combining micro- and aggregate-level data to compute change across years while incorporating the survey design

Hello, I have 1-year ACS microdata for Texas for two separate years (e.g., 2005 and 2010), including PUMA, poverty (the outcome variable), and some other variables. Can the data be aggregated at the PUMA level (e.g., PUMA population) while keeping the individual-level variables in the same data set and still applying the survey design, for instance, to fit a logistic regression? I’m trying to calculate more complex PUMA-level variables, but even the simplest example would help me see whether this is doable.

I use R and have searched, but cannot find examples of how to do this. The examples I do find use aggregated data for descriptive statistics or for maps of ratios; I cannot locate a logistic regression example that combines individual and aggregated data (e.g., at the PUMA level). I’ve tried doing this with the srvyr and survey packages, but to no avail.

Thank you in advance for pointing me toward possible solutions.

It isn’t clear to me if you are inquiring about R code for multi-level modeling or summarizing person-level variables to create PUMA-level measures for your analyses. Our team will answer questions about data or resources available through IPUMS, but leaves analytical decisions about defining models and the most appropriate code to use to individual researchers. You may be interested in this information about multi-level modeling in R or summarizing by group in R using the dplyr package. Consulting the relevant literature for approaches on the type of model you are interested in conducting, the ACS data users forum, or R user forums may also be a good place to start.

As you mentioned, R packages survey and srvyr can help facilitate specification of sample design with the ipumsr package. This forum post has a discussion on using the srvyr package along with some sample code.
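For readers who want to see the "summarize by group, then keep the person records" structure from the question, here is a minimal sketch. It is written in Python only for illustration; in R the same steps would roughly correspond to dplyr's group_by, summarise, and left_join. The records are made up, and POVERTY_FLAG, PUMA_POP, PUMA_POV_RATE, and PUMA_POV_CHANGE are hypothetical names rather than IPUMS variables. It shows only the data structure and does not address how the weights should enter a model.

```python
from collections import defaultdict

# Hypothetical person-level records (made-up values, not real ACS data).
# POVERTY_FLAG is an assumed 0/1 indicator derived from the poverty variable.
persons = [
    {"YEAR": 2005, "PUMA": 100, "PERWT": 50.0, "POVERTY_FLAG": 1},
    {"YEAR": 2005, "PUMA": 100, "PERWT": 150.0, "POVERTY_FLAG": 0},
    {"YEAR": 2010, "PUMA": 100, "PERWT": 100.0, "POVERTY_FLAG": 1},
    {"YEAR": 2010, "PUMA": 100, "PERWT": 100.0, "POVERTY_FLAG": 0},
    {"YEAR": 2005, "PUMA": 200, "PERWT": 80.0, "POVERTY_FLAG": 0},
    {"YEAR": 2010, "PUMA": 200, "PERWT": 60.0, "POVERTY_FLAG": 1},
    {"YEAR": 2010, "PUMA": 200, "PERWT": 140.0, "POVERTY_FLAG": 0},
]

# Step 1: weighted totals per (YEAR, PUMA).
pop = defaultdict(float)   # estimated population = sum of PERWT
poor = defaultdict(float)  # estimated persons in poverty
for p in persons:
    key = (p["YEAR"], p["PUMA"])
    pop[key] += p["PERWT"]
    poor[key] += p["PERWT"] * p["POVERTY_FLAG"]

# Step 2: weighted poverty rate per (YEAR, PUMA).
rate = {key: poor[key] / pop[key] for key in pop}

# Step 3: change in the rate between the two years, per PUMA.
pumas = {puma for (_, puma) in pop}
change = {puma: rate[(2010, puma)] - rate[(2005, puma)] for puma in pumas}

# Step 4: attach the PUMA-level measures to every person record,
# so individual- and PUMA-level variables sit in the same rows.
combined = [
    {
        **p,
        "PUMA_POP": pop[(p["YEAR"], p["PUMA"])],
        "PUMA_POV_RATE": rate[(p["YEAR"], p["PUMA"])],
        "PUMA_POV_CHANGE": change[p["PUMA"]],
    }
    for p in persons
]
```

Each row of `combined` still carries its own PERWT, so a survey design could in principle be declared on the person-level rows afterward; whether that is appropriate for a given model is an analytical decision left to the researcher.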

Thank you for the response - I appreciate it along with the resources which I will certainly review.

While I do use R, and so some of my inquiry is R-related, I am also researching ways to summarize IPUMS USA data, for example to compute changes across years at the PUMA level, and then to incorporate that information into an object to which the micro-level survey design (the appropriate weights, strata, clusters, etc.) can still be applied for analysis. So my interest is in any resources related to that type of structure of the IPUMS USA data.

For instance, what role does the person weight play once one of the individual-level variables has been summarized at the PUMA level? If some micro-level variables are summarized to the PUMA level in order to fit models that account for PUMA-level random effects, does the person weight need to be adjusted to a PUMA-level weight? If so, how?

In terms of R, the documentation for the srvyr and survey packages does not seem to cover how to incorporate summarized data into the survey designs they offer. I will keep researching this and will direct the inquiry to the other resources you mention, unless this type of question about the data is relevant to your team, where the goal is to better understand the data, aspects of its structure, and their practical applications.

Again, thank you.

It sounds like you are interested in summary measures rather than person-level microdata (though I don’t know the details of your research agenda or analytical plan). Questions about adjusting weights for specific types of analyses are beyond the scope of the IPUMS User Support team; however, I can direct you to some resources that may be of interest. For more information about the sample design and structure of the ACS, including an example about producing estimates with ACS data, see this IPUMS USA page on sample design and estimation in the ACS.

To estimate total population by PUMA, it is appropriate to group the data by PUMA and sum the person-level weight PERWT. To obtain standard errors, however, I would recommend using a survey-aware statistical command to calculate the weighted summary measures rather than summing by hand. Replicate weights can also be incorporated to calculate standard errors; this IPUMS USA page on replicate weights in the ACS provides more information on how and when to use replicate weights, as well as some sample Stata code. In addition, these data training exercises provide helpful examples of using R with IPUMS data.
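As a rough illustration of the two steps just described, the sketch below sums PERWT within each PUMA and then uses replicate weights to approximate a standard error for that total. It is written in Python purely to show the arithmetic. The records are hypothetical, and only four replicate weights are used to keep the example short. Real ACS extracts carry 80 person replicate weights (REPWTP1 through REPWTP80 in IPUMS USA), in which case the 4/R factor below becomes 4/80. The REPWTP list field and the function name `puma_population` are assumptions made for this example.

```python
import math
from collections import defaultdict

# Hypothetical records: PERWT is the person weight, REPWTP holds the
# replicate weights (4 here for brevity; the ACS provides 80).
persons = [
    {"PUMA": 100, "PERWT": 50.0, "REPWTP": [52.0, 48.0, 50.0, 50.0]},
    {"PUMA": 100, "PERWT": 150.0, "REPWTP": [148.0, 152.0, 154.0, 146.0]},
    {"PUMA": 200, "PERWT": 80.0, "REPWTP": [78.0, 82.0, 80.0, 80.0]},
]


def puma_population(records):
    """Return {PUMA: (estimated population, standard error)}."""
    totals = defaultdict(float)  # sum of PERWT per PUMA
    rep_totals = {}              # per PUMA, one total per replicate weight
    for rec in records:
        puma = rec["PUMA"]
        totals[puma] += rec["PERWT"]
        reps = rep_totals.setdefault(puma, [0.0] * len(rec["REPWTP"]))
        for i, w in enumerate(rec["REPWTP"]):
            reps[i] += w

    result = {}
    for puma, estimate in totals.items():
        reps = rep_totals[puma]
        # Successive difference replication:
        # variance = (4 / R) * sum over replicates of (replicate total - estimate)^2
        variance = 4.0 / len(reps) * sum((r - estimate) ** 2 for r in reps)
        result[puma] = (estimate, math.sqrt(variance))
    return result


population = puma_population(persons)
```

In practice a survey package would handle this bookkeeping; the hand calculation is only meant to show what the replicate weights contribute beyond the point estimate.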

Alternatively, you may also be interested in looking at IPUMS NHGIS to see if the relevant tables already exist. Assuming you are using PUMA because it is the lowest level of geographic detail available for the entire US, the other perk of using aggregate data from IPUMS NHGIS is that you can get lower levels of geography.