Adding your own data to an IPUMS dataset

Molly_Richard · January 11, 2023, 8:32pm

This is a general question, open to thoughts and guidance. Keywords: multilevel models, MSAs, joining datasets

I am interested in analyzing 5-year ACS microdata using a multilevel (hierarchical linear) model, with outcomes for individuals (level 1) nested in metropolitan areas (level 2). Specifically, I’ll be looking at housing arrangements and whether area-level housing costs influence people’s decisions, and how those relationships might vary based on level-1 demographic characteristics such as race or citizenship. I have MSA-level housing and economic data (e.g., median rent, vacancy rates) from the Census Bureau, downloaded with the tidycensus package in R. In theory, I want to join these data to my IPUMS person-level microdata on MET2013 and then limit my sample to people living in a metropolitan area. Does anyone have experience doing this, example studies to share, or words of caution or guidance on the appropriate methods?

I’ll add that I’m ultimately planning a 3-level analysis (individuals in households in MSAs), but that isn’t related to the primary question at hand: joining the microdata to MSA-level variables so that individuals in identifiable MSAs also have variables such as “MSA_medianrent”. Thanks!
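For readers wondering what the join itself looks like, here is a minimal sketch in Python (the thread contains no code, and the original poster works in R). It assumes that person records carry MET2013 as an integer, that a MET2013 value of 0 marks people not in an identifiable metro area (check the MET2013 codes for your sample), and that the MSA table is keyed by the 5-digit CBSA GEOID strings tidycensus returns for metro areas. The function name attach_msa_vars and all records and values are hypothetical. With pandas, the same step is a DataFrame.merge on the code column.

```python
from typing import Dict, List, Tuple

# Hypothetical person-level records (IPUMS-style variable names).
persons: List[dict] = [
    {"SERIAL": 1, "PERNUM": 1, "MET2013": 10420, "RACE": 1},
    {"SERIAL": 1, "PERNUM": 2, "MET2013": 10420, "RACE": 1},
    {"SERIAL": 2, "PERNUM": 1, "MET2013": 0, "RACE": 2},      # assumed: 0 = not in identifiable area
    {"SERIAL": 3, "PERNUM": 1, "MET2013": 99999, "RACE": 3},  # hypothetical code with no MSA data
]

# Hypothetical MSA-level contextual data, keyed by CBSA GEOID strings.
msa_by_geoid: Dict[str, dict] = {
    "10420": {"medianrent": 850, "vacancyrate": 0.07},
}


def attach_msa_vars(
    persons: List[dict], msa_by_geoid: Dict[str, dict]
) -> Tuple[List[dict], List[dict]]:
    """Attach MSA_* variables to each person record via MET2013.

    Returns (joined, unmatched): persons with MET2013 == 0 are dropped,
    persons whose code has no contextual data are returned separately.
    """
    # Re-key the MSA table on integer codes so it lines up with MET2013.
    msa_by_code = {int(geoid): vals for geoid, vals in msa_by_geoid.items()}
    joined: List[dict] = []
    unmatched: List[dict] = []
    for rec in persons:
        code = rec["MET2013"]
        if code == 0:
            continue  # outside any identifiable metro area: not in the sample
        vals = msa_by_code.get(code)
        if vals is None:
            unmatched.append(rec)  # metro code with no contextual data
            continue
        # Prefix contextual variables so they are distinguishable from level-1 ones.
        joined.append({**rec, **{f"MSA_{k}": v for k, v in vals.items()}})
    return joined, unmatched
```

Returning unmatched records instead of silently dropping them makes it easier to spot codes that differ between the 2013 delineations MET2013 is based on and whatever vintage your contextual data uses.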

Ivan_Strahof · January 20, 2023, 5:00am

I’m glad you found other studies to help guide your analysis! Please feel free to share them here again, as they may be a helpful reference for others.

I do want to briefly explain how MET2013 works, since this is relevant to your model. Household location in the ACS PUMS is reported at the level of the Public Use Microdata Area (PUMA). PUMAs contain at least 100,000 people (typically 100,000 to 200,000) and can subdivide a county or combine several counties (you can find a map of these here). IPUMS uses PUMAs to impute residence in metropolitan areas: the MET2013 protocol assigns a metro area to all residents of a PUMA if a majority of the PUMA’s 2010 population resided in that MSA. However, since PUMAs can cross MSA boundaries, assignment to a particular MSA does not guarantee that a given household actually resided within that MSA.

This assignment protocol therefore yields errors of omission (residents of an MSA who are not identified as residents) and errors of commission (non-residents who are identified as residents). As an index of mismatch, IPUMS uses the sum of the percent omission error and the percent commission error, and MET2013 reports no code for MSAs where this sum is 15% or more. You can find the match summary on the description tab for MET2013 and use it to decide how much mismatch you’re willing to tolerate in your analysis.
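To make the majority rule and the mismatch index concrete, here is a toy sketch in Python. It is not IPUMS’s actual code: the input format (each PUMA’s population split by the MSA it lives in, None meaning outside any MSA) and all numbers are invented for illustration. It also assumes that both error percentages are expressed relative to the MSA’s actual population; the exact denominators IPUMS uses may differ, so check the MET2013 description before relying on this.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (PUMA id, true MSA code or None, population): invented toy data.
Piece = Tuple[str, Optional[str], int]

pieces: List[Piece] = [
    ("A", "M1", 150_000),
    ("B", "M1", 90_000), ("B", None, 30_000),   # 75% in M1 -> assigned to M1
    ("C", "M1", 40_000), ("C", None, 80_000),   # 33% in M1 -> not assigned
    ("D", "M2", 100_000), ("D", None, 20_000),  # 83% in M2 -> assigned to M2
    ("E", "M2", 110_000),
]


def assign_pumas(pieces: List[Piece]) -> Dict[str, Optional[str]]:
    """Assign each PUMA to the MSA holding a strict majority of its population."""
    totals: Dict[str, int] = defaultdict(int)
    by_msa: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for puma, msa, pop in pieces:
        totals[puma] += pop
        if msa is not None:
            by_msa[puma][msa] += pop
    assignment: Dict[str, Optional[str]] = {}
    for puma, total in totals.items():
        assignment[puma] = None
        for msa, pop in by_msa[puma].items():
            if pop * 2 > total:  # strict majority, so exactly 50% does not count
                assignment[puma] = msa
    return assignment


def match_errors(pieces: List[Piece], threshold: float = 15.0) -> Dict[str, dict]:
    """Percent omission, percent commission, their sum, and whether the MSA is identified."""
    assignment = assign_pumas(pieces)
    true_pop: Dict[str, int] = defaultdict(int)   # actual residents of each MSA
    correct: Dict[str, int] = defaultdict(int)    # residents correctly assigned
    committed: Dict[str, int] = defaultdict(int)  # non-residents assigned to the MSA
    for puma, msa, pop in pieces:
        assigned = assignment[puma]
        if msa is not None:
            true_pop[msa] += pop
            if assigned == msa:
                correct[msa] += pop
        if assigned is not None and assigned != msa:
            committed[assigned] += pop
    results: Dict[str, dict] = {}
    for msa, pop in true_pop.items():
        omission = 100 * (pop - correct[msa]) / pop
        commission = 100 * committed[msa] / pop
        total = omission + commission
        results[msa] = {
            "omission": omission,
            "commission": commission,
            "total": total,
            "identified": total < threshold,  # 15% or more -> no code reported
        }
    return results
```

In this toy example, M1 loses the residents of PUMA C (omission) and picks up the non-MSA residents of PUMA B (commission), so its summed error of about 25% exceeds the 15% threshold, while M2’s error is about 9.5% and it stays identified.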

Related topics

Topic	Replies	Views	Last activity
Merging NSFG and ACS PUMS data to look at how MSA economic conditions affect reproductive outcomes (USA)	6	1197	November 12, 2019
CBSA variable in ACS?	2	684	July 14, 2022
MSA to MSA level migration data (USA)	1	209	June 6, 2024
Help with PUMA use in Stata (USA)	7	1059	April 19, 2023
Incorporating geographic level data in ACS analysis? (USA)	4	419	July 21, 2017