PUMA 2010 - 2020 Crosswalk Use

Joshua_Tuttle · June 10, 2024, 3:55pm

Hello!

I am seeking guidance in using the PUMA 2010 - 2020 Crosswalk recently added by IPUMS:

I would like to use the crosswalk to convert each sample in the combined 2022 5-year sample to 2020 PUMA definitions using a probability-based approach. This would require me to reassign respondents from the 2018, 2019, 2020, and 2021 samples to 2020 PUMAs based on the crosswalk.

I have two questions:

Can I perform probability-based reassignment to 2020 PUMAs based on individual person-records, or would it be best to do this using household records?
When performing probability-based reassignment, which columns in the crosswalk should be referenced for the probabilities?

Thank you,
Josh

JonathanSchroeder · June 10, 2024, 8:58pm

To answer your specific questions:

Yes, you can do this with individual person records. You could also do it using household records, but it’d be more accurate to do it at the person level because the proportions in the crosswalk are population proportions (e.g., the percent of a 2010 PUMA’s population in a 2020 PUMA), not household proportions.
The assignment you propose goes from 2010 PUMAs, as identified in 2018-2021, to 2020 PUMAs, so the appropriate proportion to use as an assignment weight is in the pPUMA10_Pop20 column…
- The crosswalk file’s data dictionary sheet shows that pPUMA10_Pop20 is the “Estimated percent of the 2010 PUMA’s 2020 population that lies in the area of intersection”. As such, it’s a reasonable estimate of the probability that a given ~2020 resident of the identified 2010 PUMA was also a resident of the identified 2020 PUMA (which shares this “area of intersection” with the identified 2010 PUMA).
- Make sure to divide the percentage in this column by 100 before using it as an assignment weight. Alternatively, you could compute the same proportion with greater precision by dividing Part_Pop20 by PUMA10_Pop20.
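To make the weight calculation concrete, here is a minimal Python sketch that turns crosswalk rows into 2010-to-2020 assignment weights. Only the column names Part_Pop20, PUMA10_Pop20 and pPUMA10_Pop20 come from the crosswalk as described above; the ID keys (puma10, puma20) and all numbers are hypothetical. In the real file you would also want the state code in the key, since PUMA codes are only unique within a state.

```python
from collections import defaultdict

# Hypothetical crosswalk rows. Part_Pop20, PUMA10_Pop20 and pPUMA10_Pop20 are the
# column names discussed above; "puma10"/"puma20" and the numbers are made up.
crosswalk = [
    {"puma10": "A", "puma20": "X", "Part_Pop20": 21_000, "PUMA10_Pop20": 100_000, "pPUMA10_Pop20": 21.0},
    {"puma10": "A", "puma20": "Y", "Part_Pop20": 79_000, "PUMA10_Pop20": 100_000, "pPUMA10_Pop20": 79.0},
    {"puma10": "B", "puma20": "Y", "Part_Pop20": 50_000, "PUMA10_Pop20": 50_000, "pPUMA10_Pop20": 100.0},
]

def assignment_weights(rows, use_counts=True):
    """Map each 2010 PUMA to {2020 PUMA: share of the 2010 PUMA's 2020 population}."""
    weights = defaultdict(dict)
    for r in rows:
        if use_counts:
            # More precise: population of the intersection / 2010 PUMA's 2020 population
            share = r["Part_Pop20"] / r["PUMA10_Pop20"]
        else:
            # Rounded percentage column, converted from percent to proportion
            share = r["pPUMA10_Pop20"] / 100.0
        weights[r["puma10"]][r["puma20"]] = share
    return dict(weights)

w = assignment_weights(crosswalk)
print(w["A"])
```

With use_counts=False the sketch falls back to the percentage column divided by 100; either way, the shares for each 2010 PUMA should sum to roughly 1.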

Notes:

Where a 2010 PUMA intersects multiple 2020 PUMAs, this assignment strategy will result in each of the 2010 PUMA’s residents being “partially assigned” to multiple 2020 PUMAs. E.g., if a 2010 PUMA intersects two 2020 PUMAs with 2020 population proportions of 0.21 and 0.79, then the idea would be to allocate portions of 0.21 and 0.79 of each of the 2010 PUMA’s residents to the corresponding 2020 PUMAs. Then the 2020 PUMAs will contain “partial individuals” that should sum up to the correct approximate total for that 2020 PUMA (after also applying the correct person weights).
This process may produce substantial errors. E.g., if the population of interest is the foreign-born population, this type of assignment would assume that the 2010 PUMA’s foreign-born residents are distributed among 2020 PUMAs in the same proportion as the total population, but it could be that nearly all of the foreign-born residents are in only one intersecting 2020 PUMA and not in any others. Ideally, one would use additional small-area summary data to compute customized weights for each type of individual. I won’t go into detail here on how to do that… Just wanted to be clear that this approach’s simplicity comes at the cost of greater risk of error.
Given the risks of error, I would recommend switching the direction of your proposed allocation if possible. I.e., if you assigned 2022 respondents to 2010 PUMAs, rather than assigning 2018-2021 respondents to 2020 PUMAs, you’d be limiting the added error risk to only one year (~20% of the sample) rather than 4 years (~80%).
- Not only that, PUMAs are commonly split up over time to adjust for growing populations, so 2020 PUMAs tend to be smaller than 2010 PUMAs. As such, allocating from 2010 to 2020 PUMAs involves more disaggregation, i.e., greater uncertainty and greater risk of error. Allocating backward, from 2020 to 2010 PUMAs, would conversely be more reliable.

Joshua_Tuttle · August 26, 2024, 6:10pm

This is great, thank you. We would like to take the approach you have outlined above. Can you please tell us which columns/data fields to use to make sure we do this properly?

JonathanSchroeder · August 26, 2024, 7:15pm

To allocate from 2020 PUMAs to 2010 PUMAs, you follow the same instructions I shared previously but with the source and target years swapped, as I’ve done in the square brackets in the text below:

JonathanSchroeder:

The assignment [I] propose goes from [2020] PUMAs, as identified in [2022], to [2010] PUMAs, so the appropriate proportion to use as an assignment weight is in the [pPUMA20_Pop20] column…

The crosswalk file’s data dictionary sheet shows that [pPUMA20_Pop20] is the “Estimated percent of the [2020] PUMA’s 2020 population that lies in the area of intersection”. As such, it’s a reasonable estimate of the probability that a given ~2020 resident of the identified [2020] PUMA was also a resident of the identified [2010] PUMA (which shares this “area of intersection” with the identified [2020] PUMA).

Make sure to divide the percentage in this column by 100 before using it as an assignment weight. Alternatively, you could compute the same proportion with greater precision by dividing Part_Pop20 by [PUMA20_Pop20].
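A rough sketch of the reversed allocation, assuming a combined file where each record carries its survey year and a single PUMA code that refers to 2010 PUMAs for 2018-2021 and to 2020 PUMAs for 2022. Only 2022 records are split, using Part_Pop20 / PUMA20_Pop20 as the share. The keys survey_year, puma, puma10 and puma20, the helper names, and the numbers are all hypothetical (in IPUMS multi-year samples the actual survey year is, as far as I know, in MULTYEAR).

```python
# Hypothetical crosswalk rows; Part_Pop20 and PUMA20_Pop20 are the columns named above.
# 2020 PUMA "X" lies entirely in 2010 PUMA "A"; 2020 PUMA "Y" straddles "A" and "B".
crosswalk = [
    {"puma10": "A", "puma20": "X", "Part_Pop20": 21_000, "PUMA20_Pop20": 21_000},
    {"puma10": "A", "puma20": "Y", "Part_Pop20": 79_000, "PUMA20_Pop20": 129_000},
    {"puma10": "B", "puma20": "Y", "Part_Pop20": 50_000, "PUMA20_Pop20": 129_000},
]

def reverse_shares(rows):
    """Map each 2020 PUMA to {2010 PUMA: share of the 2020 PUMA's 2020 population}."""
    shares = {}
    for r in rows:
        shares.setdefault(r["puma20"], {})[r["puma10"]] = r["Part_Pop20"] / r["PUMA20_Pop20"]
    return shares

def assign_2010_puma(rec, rev):
    """Return [(2010 PUMA, allocated person weight), ...] for one person record."""
    if rec["survey_year"] <= 2021:
        # 2018-2021 records are already coded to 2010 PUMAs: keep full weight
        return [(rec["puma"], rec["PERWT"])]
    # 2022 records carry a 2020 PUMA code: split the weight across 2010 PUMAs
    return [(p10, rec["PERWT"] * s) for p10, s in rev[rec["puma"]].items()]

rev = reverse_shares(crosswalk)
print(assign_2010_puma({"survey_year": 2019, "puma": "A", "PERWT": 90.0}, rev))
print(assign_2010_puma({"survey_year": 2022, "puma": "Y", "PERWT": 129.0}, rev))
```

Records from 2018-2021 pass through with their full weight, so only the roughly 20% of the sample from 2022 picks up allocation error.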

Joshua_Tuttle · September 3, 2024, 12:04pm

Thank you, Jonathan.
