Hello,
I am relatively new to IPUMS. My current project uses American Community Survey (ACS) 1-year PUMS data for multiple years (2005 through 2023) for Tennessee. I know that the data use 2000 PUMAs for 2005-2011, 2010 PUMAs for 2012-2021, and 2020 PUMAs for 2022 onward.
I downloaded several person-level and housing-unit variables and also included PUMAs in my download. I am confused about the PUMA variable in the download. Are the PUMAs consistent across all years, accounting for PUMA changes over time? If not, how can I make sure they are all in 2020 PUMAs?
Thank you
IPUMS releases PUMA of residence as an unrecoded variable, so it is reported only with the boundaries in effect for each sample:
- In samples collected from 2005-2011, PUMA is only reported using 2000-vintage PUMA boundaries.
- In samples collected from 2012-2021, PUMA is only reported using 2010-vintage PUMA boundaries.
- In samples collected from 2022-2031 (expected), PUMA is only reported using 2020-vintage PUMA boundaries.
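Because the vintage depends only on the sample year, each record in a multi-year extract can be tagged with its PUMA vintage before any harmonization. A minimal Python sketch of that lookup, based only on the year ranges listed above (the 2022-2031 range is the expected one, and `puma_vintage` is a hypothetical helper name, not an IPUMS function):

```python
def puma_vintage(sample_year: int) -> int:
    """Return the PUMA boundary vintage reported in an ACS 1-year sample."""
    if 2005 <= sample_year <= 2011:
        return 2000
    if 2012 <= sample_year <= 2021:
        return 2010
    if 2022 <= sample_year <= 2031:  # expected range for 2020-vintage PUMAs
        return 2020
    raise ValueError(f"no PUMA vintage rule for ACS sample year {sample_year}")


# The 2005-2023 Tennessee study period from the question spans all three vintages.
vintages = {year: puma_vintage(year) for year in range(2005, 2024)}
print(sorted(set(vintages.values())))
```

Since the study period mixes three vintages, the same numeric PUMA code is not guaranteed to refer to the same area in different years, so raw PUMA codes should not be pooled across vintages.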
In the past, multi-year ACS files that include years from different PUMA vintages have reported the contemporaneous vintage for the survey year (e.g., in the 2008-2012 5-year PUMS file, the 2008 data would use 2000-vintage PUMAs and 2012 would use 2010-vintage PUMAs); it was therefore not possible to systematically obtain the exact PUMA of residence for an alternate vintage. However, beginning with the 2023 5-year ACS, which includes data collected from 2019-2023, the data report PUMA of residence using 2020-vintage PUMA boundaries for all households regardless of the year the data was collected. The Census Bureau has additionally re-released the 2022 5-year ACS file using 2020-vintage PUMA boundaries for all households; IPUMS USA will integrate this data in an upcoming release.
For your specific application, IPUMS does not currently offer a variable that would allow you to infer the 2020-vintage PUMA of residence in data collected before 2018. IPUMS USA offers a geographically harmonized PUMA variable (CPUMA0010) that provides a consistent PUMA definition for samples collected from 2005-2021 by bridging 2000- and 2010-vintage PUMAs. It does so by aggregating one or more 2010 PUMAs that, in combination, align closely (within a 1% population mismatch tolerance) with a corresponding set of 2000 PUMAs. As mentioned in this post, we plan to add a new CPUMA1020 variable in the future to bridge the 2010 and 2020 PUMA boundaries.
In the meantime, there are a few different approaches you might consider in order to run your analysis across the entire 2005-2023 period:
- Run your analysis at the county level and restrict it to the five Tennessee counties that are identified across the entire period (Blount, Davidson, Rutherford, Shelby, and Williamson).
- Create your own crosswalk that matches 2020-vintage PUMAs to consistent PUMAs (CPUMA0010). To do so, consult the CPUMA0010 summary file for the list of 2010 PUMAs within each consistent PUMA, and match those to 2020-vintage PUMAs using the 2010-2020 PUMA crosswalk. For example, in Tennessee, 2010 PUMAs 2001, 2002, and 2003, which compose CPUMA0010 941, map very closely onto 2020 PUMAs 3401, 3402, and 3403. You can then identify this CPUMA across all years.
- As outlined in this forum post, you can use a probabilistic approach that allocates shares of people from one PUMA vintage to another based on the proportion of the population in each intersection between PUMAs of the two vintages. That post describes how to go from 2010 PUMAs to 2020 PUMAs or vice versa. You could use the same technique to go from 2000 PUMAs to 2010 PUMAs (and then do your analysis using 2010 PUMAs), or you could allocate from 2020 PUMAs to consistent 2000-2010 PUMAs (CPUMA0010) and do your analysis using those. The latter requires joining the CPUMA0010 2010 PUMA Components file to the 2010-2020 crosswalk by 2010 PUMA ID. This is a one-to-many merge, since the crosswalk has one or more observations for each 2010 PUMA. You will then need to sum the populations of the 2020 PUMA segments within each consistent PUMA to get the percentage of each 2020 PUMA's population that falls in a particular consistent PUMA.
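For the first option (restricting to the five counties), the filter can be sketched in Python. This assumes each record carries the IPUMS variables STATEFIP and COUNTYFIP (COUNTYFIP holds the 3-digit county FIPS code, 0 when the county is not identified) and uses the standard FIPS codes for the five counties; check them against the COUNTYFIP codes for each sample in your extract. The sample records are made up.

```python
# Tennessee state FIPS code and 3-digit county FIPS codes for the five counties
# identified across 2005-2023 (verify against COUNTYFIP codes for each sample).
TN_STATEFIP = 47
CONSISTENT_TN_COUNTIES = {
    9: "Blount",
    37: "Davidson",
    149: "Rutherford",
    157: "Shelby",
    187: "Williamson",
}


def keep_consistent_counties(records):
    """Keep records located in one of the five consistently identified counties.

    Records whose county is not identified (COUNTYFIP == 0) are dropped.
    """
    return [
        r
        for r in records
        if r["STATEFIP"] == TN_STATEFIP and r["COUNTYFIP"] in CONSISTENT_TN_COUNTIES
    ]


# Made-up records for illustration only.
sample = [
    {"YEAR": 2006, "STATEFIP": 47, "COUNTYFIP": 37, "PERWT": 120},
    {"YEAR": 2015, "STATEFIP": 47, "COUNTYFIP": 0, "PERWT": 95},
    {"YEAR": 2023, "STATEFIP": 47, "COUNTYFIP": 157, "PERWT": 80},
]
kept = keep_consistent_counties(sample)
print([CONSISTENT_TN_COUNTIES[r["COUNTYFIP"]] for r in kept])
```

Estimates from this subset describe only those five counties, not Tennessee as a whole.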
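For the second option (building your own 2020 PUMA to CPUMA0010 crosswalk), one possible approach is to push the 2010-2020 intersection populations up to CPUMAs, assign each 2020 PUMA to the CPUMA that holds most of its people, and then check how much of each CPUMA's population lands in the 2020 PUMAs assigned to it. This is a minimal sketch, not an IPUMS procedure. It assumes the components file has been read into a dict of CPUMA to 2010 PUMAs and the crosswalk into (2010 PUMA, 2020 PUMA, population) rows. CPUMA 941 and its PUMA codes follow the example above, but CPUMA 942, PUMAs 2100 and 3500, and all populations are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs: CPUMA 941 and its PUMA codes follow the example above,
# while CPUMA 942, PUMAs 2100/3500, and all populations are made up.
cpuma_components = {941: [2001, 2002, 2003], 942: [2100]}  # CPUMA -> 2010 PUMAs
crosswalk_10_20 = [  # (2010 PUMA, 2020 PUMA, population in the intersection)
    (2001, 3401, 150_000),
    (2002, 3402, 140_000),
    (2002, 3401, 500),
    (2003, 3403, 155_000),
    (2003, 3500, 1_000),
    (2100, 3500, 160_000),
]


def puma10_lookup(components):
    """Invert CPUMA -> [2010 PUMAs] into 2010 PUMA -> CPUMA."""
    return {p10: c for c, p10s in components.items() for p10 in p10s}


def build_puma2020_to_cpuma(components, crosswalk):
    """Map each 2020 PUMA to (dominant CPUMA, share of its population there)."""
    to_cpuma = puma10_lookup(components)
    pop = defaultdict(lambda: defaultdict(int))  # 2020 PUMA -> CPUMA -> people
    for p10, p20, people in crosswalk:
        pop[p20][to_cpuma[p10]] += people
    result = {}
    for p20, by_cpuma in pop.items():
        best = max(by_cpuma, key=by_cpuma.get)
        result[p20] = (best, by_cpuma[best] / sum(by_cpuma.values()))
    return result


def cpuma_coverage(components, crosswalk, xwalk):
    """Share of each CPUMA's population living in the 2020 PUMAs assigned to it."""
    to_cpuma = puma10_lookup(components)
    total, matched = defaultdict(int), defaultdict(int)
    for p10, p20, people in crosswalk:
        c = to_cpuma[p10]
        total[c] += people
        if xwalk[p20][0] == c:
            matched[c] += people
    return {c: matched[c] / total[c] for c in total}


xwalk = build_puma2020_to_cpuma(cpuma_components, crosswalk_10_20)
coverage = cpuma_coverage(cpuma_components, crosswalk_10_20, xwalk)
for p20, (cpuma, share) in sorted(xwalk.items()):
    print(p20, cpuma, round(share, 3))
print({c: round(s, 3) for c, s in sorted(coverage.items())})
```

With these made-up numbers, 2020 PUMAs 3401-3403 all fall entirely in CPUMA 941, which in turn keeps about 99.8% of its population inside them. How close to 1 both shares must be before you treat a CPUMA as identifiable across all years is a judgment call; the 1% tolerance used to build CPUMA0010 is one reasonable reference point.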
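For the third option (probabilistic allocation from 2020 PUMAs to CPUMA0010), the join and the resulting allocation factors can be sketched in Python. This is a simplified illustration, not the procedure from the linked post. It assumes the components file is a dict of CPUMA to 2010 PUMAs, the crosswalk is a list of (2010 PUMA, 2020 PUMA, population) rows, and records from 2020-vintage samples carry IPUMS's PUMA and PERWT variables. All codes and numbers are made up. Each person's weight is split across CPUMAs in proportion to where the 2020 PUMA's population lives, which assumes people are spread across a PUMA's intersections the same way the crosswalk population is.

```python
from collections import defaultdict

# Hypothetical toy inputs; real ones come from the CPUMA0010 components file,
# the 2010-2020 PUMA crosswalk, and an extract of 2022+ samples.
cpuma_components = {1: [101], 2: [102]}  # CPUMA -> 2010 PUMAs
crosswalk_10_20 = [(101, 201, 800), (102, 201, 200), (102, 202, 500)]
records = [
    {"PUMA": 201, "PERWT": 100},
    {"PUMA": 201, "PERWT": 50},
    {"PUMA": 202, "PERWT": 70},
]


def allocation_factors(components, crosswalk):
    """Return {2020 PUMA: [(CPUMA, share of that PUMA's population)]}."""
    to_cpuma = {p10: c for c, p10s in components.items() for p10 in p10s}
    # One-to-many join: each 2010 PUMA can appear in several crosswalk rows.
    pop = defaultdict(lambda: defaultdict(int))  # 2020 PUMA -> CPUMA -> people
    for p10, p20, people in crosswalk:
        pop[p20][to_cpuma[p10]] += people
    factors = {}
    for p20, by_cpuma in pop.items():
        total = sum(by_cpuma.values())
        factors[p20] = [(c, n / total) for c, n in by_cpuma.items()]
    return factors


def weighted_totals_by_cpuma(records, factors):
    """Split each record's PERWT across CPUMAs and sum the pieces."""
    totals = defaultdict(float)
    for r in records:
        for cpuma, share in factors[r["PUMA"]]:
            totals[cpuma] += r["PERWT"] * share
    return dict(totals)


factors = allocation_factors(cpuma_components, crosswalk_10_20)
totals = weighted_totals_by_cpuma(records, factors)
print({c: round(w, 6) for c, w in sorted(totals.items())})
```

Because the shares for each 2020 PUMA sum to 1, the allocation preserves the overall weighted total (220 in this toy example) while redistributing it across CPUMAs.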