PUMA 2010 - 2020 Crosswalk Use


I am seeking guidance in using the PUMA 2010 - 2020 Crosswalk recently added by IPUMS:

I would like to use the crosswalk to convert each sample in the combined 2022 5-year sample to 2020 PUMA definitions using a probability approach. This would require me to reassign records from the 2018, 2019, 2020, and 2021 samples to 2020 PUMAs based on the crosswalk.

I have two questions:

  1. Can I perform probability-based reassignment to 2020 PUMAs based on individual person-records, or would it be best to do this using household records?

  2. When performing probability-based reassignment, which columns in the crosswalk should be referenced for the probabilities?

Thank you,

To answer your specific questions:

  1. Yes, you can do this with individual person records. You could also do it using household records, but it’d be more accurate to do it at the person level, since the proportions in the crosswalk are population proportions (e.g., the percent of a 2010 PUMA’s population in a 2020 PUMA), not household proportions.

  2. The assignment you propose goes from 2010 PUMAs, as identified in 2018-2021, to 2020 PUMAs, so the appropriate proportion to use as an assignment weight is in the pPUMA10_Pop20 column…

    • The crosswalk file’s data dictionary sheet shows that pPUMA10_Pop20 is the “Estimated percent of the 2010 PUMA’s 2020 population that lies in the area of intersection”. As such, it’s a reasonable estimate of the probability that a given ~2020 resident of the identified 2010 PUMA was also a resident of the identified 2020 PUMA (which shares this “area of intersection” with the identified 2010 PUMA).
    • Make sure to divide the percentage in this column by 100 before using it as an assignment weight. Alternatively, you could compute the same proportion with greater precision by dividing Part_Pop20 by PUMA10_Pop20.
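As a minimal sketch (Python, standard library only), the assignment weights can be built by dividing Part_Pop20 by PUMA10_Pop20 for each crosswalk row, with pPUMA10_Pop20 / 100 as the less precise equivalent. The rows below are made-up toy values, and PUMA10_ID and PUMA20_ID are hypothetical column names standing in for however the real file identifies the 2010 and 2020 PUMAs; check the crosswalk’s data dictionary sheet for the actual identifiers.

```python
import csv
import io

# Toy crosswalk with made-up values. Part_Pop20, PUMA10_Pop20 and pPUMA10_Pop20
# are the columns discussed in this thread; PUMA10_ID and PUMA20_ID are
# HYPOTHETICAL identifier columns. Real PUMA codes are only unique within a
# state, so a real key would also need the state.
TOY_CROSSWALK = """PUMA10_ID,PUMA20_ID,Part_Pop20,PUMA10_Pop20,pPUMA10_Pop20
A,X,21000,100000,21.0
A,Y,79000,100000,79.0
B,Y,150000,150000,100.0
"""


def assignment_weights(rows):
    """Map (2010 PUMA, 2020 PUMA) -> share of the 2010 PUMA's 2020 population."""
    weights = {}
    for row in rows:
        # Preferred: recompute the share from the population counts.
        share = float(row["Part_Pop20"]) / float(row["PUMA10_Pop20"])
        # Sanity check against the rounded percentage column (divided by 100).
        assert abs(share - float(row["pPUMA10_Pop20"]) / 100) < 0.001
        weights[(row["PUMA10_ID"], row["PUMA20_ID"])] = share
    return weights


weights = assignment_weights(csv.DictReader(io.StringIO(TOY_CROSSWALK)))
print(weights[("A", "X")])  # 0.21
```

Within each 2010 PUMA the shares should sum to roughly 1 if the crosswalk lists every intersecting 2020 PUMA, which is a cheap check worth running on the real file.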


  • Where a 2010 PUMA intersects multiple 2020 PUMAs, this assignment strategy will result in each of the 2010 PUMA’s residents being “partially assigned” to multiple 2020 PUMAs. E.g., if a 2010 PUMA intersects two 2020 PUMAs with 2020 population proportions of 0.21 and 0.79, then the idea would be to allocate portions of 0.21 and 0.79 of each of the 2010 PUMA’s residents to the corresponding 2020 PUMAs. Then the 2020 PUMAs will contain “partial individuals” that should sum up to the correct approximate total for that 2020 PUMA (after also applying the correct person weights).
  • This process may produce substantial errors. E.g., if the population of interest is the foreign-born population, this type of assignment would assume that the 2010 PUMA’s foreign-born residents are distributed among 2020 PUMAs in the same proportion as the total population, but it could be that nearly all of the foreign-born residents are in only one intersecting 2020 PUMA and not in any others. Ideally, one would use additional small-area summary data to compute customized weights for each type of individual. I won’t go into detail here on how to do that; I just want to be clear that this approach’s simplicity comes at the cost of greater risk of error.
  • Given the risks of error, I would recommend switching the direction of your proposed allocation if possible. I.e., if you assigned 2022 respondents to 2010 PUMAs, rather than assigning 2018-2021 respondents to 2020 PUMAs, you’d be limiting the added error risk to only one year (~20% of the sample) rather than 4 years (~80%).
    • Not only that, PUMAs are commonly split up over time to adjust for growing populations, so 2020 PUMAs tend to be smaller than 2010 PUMAs. As such, allocating from 2010 to 2020 PUMAs involves more disaggregation, i.e., greater uncertainty and greater risk of error. Allocating backward, from 2020 to 2010 PUMAs, would conversely be more reliable.
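The “partial individuals” idea can be sketched as follows, assuming crosswalk shares have already been computed as above: each person record is copied once per intersecting 2020 PUMA, and the copy’s weight is the person weight (PERWT) times the crosswalk share. The records, the 0.21 / 0.79 split, and the field names (puma10, perwt, alloc_wt) are hypothetical toy values, not IPUMS data. Like the approach itself, this sketch assumes every subgroup is spread across 2020 PUMAs in the same proportions as the total population.

```python
from collections import defaultdict

# HYPOTHETICAL toy inputs: shares keyed by (2010 PUMA, 2020 PUMA), mirroring
# the 0.21 / 0.79 example, plus person records with a person weight.
SHARES = {("A", "X"): 0.21, ("A", "Y"): 0.79, ("B", "Y"): 1.0}

PERSONS = [
    {"id": 1, "puma10": "A", "perwt": 100.0},
    {"id": 2, "puma10": "A", "perwt": 50.0},
    {"id": 3, "puma10": "B", "perwt": 80.0},
]


def allocate(persons, shares):
    """Split each person record into one fractional record per intersecting 2020 PUMA."""
    targets = defaultdict(list)
    for (puma10, puma20), share in shares.items():
        targets[puma10].append((puma20, share))
    out = []
    for p in persons:
        # A 2010 PUMA missing from the crosswalk yields no records (silently dropped).
        for puma20, share in targets[p["puma10"]]:
            # Allocated weight = person weight x crosswalk share.
            out.append({**p, "puma20": puma20, "alloc_wt": p["perwt"] * share})
    return out


def totals_by_puma20(records):
    """Sum allocated weights to get estimated population totals per 2020 PUMA."""
    totals = defaultdict(float)
    for r in records:
        totals[r["puma20"]] += r["alloc_wt"]
    return dict(totals)


allocated = allocate(PERSONS, SHARES)
totals = totals_by_puma20(allocated)
```

Here 2020 PUMA X receives 0.21 × (100 + 50) = 31.5 and Y receives 0.79 × 150 + 80 = 198.5 (up to floating-point rounding), so the allocated weights still sum to the original 230. The backward allocation recommended above would reuse the same function with shares keyed by (2020 PUMA, 2010 PUMA) and computed from each 2020 PUMA’s population; the exact crosswalk column for that denominator should be confirmed in the data dictionary.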