2020 CT to 2010 CT using relationship file

Hi,
I have created a 2010 CT to 2020 CT crosswalk using the US Census relationship file for multiple health disparities variables. I would like to compare the 2020 crosswalked data to the original 2010 CT data in ArcGIS. Can I do this using a land-area-weighted average for the ‘new’ 2020 CTs?

For example, if a 2010 CT was split into two new 2020CTs, each could receive a different value from my variables. Can I use the same US census relationship file to calculate the weighted average going from 2020 back to 2010?

It sounds like you’ve used the Census Bureau’s census tract relationship files to allocate data from 2010 census tracts to 2020 census tracts by “area weighting”. That is, you allocate data from a 2010 tract in proportion to the share of its land area located in different 2020 tracts.

Area weighting is conceptually simple, and therefore relatively easy to apply and commonly used, but it is often very inaccurate. For example, imagine a case where a 2010 tract is split between two 2020 tracts, with 95% of the 2010 tract’s area in one 2020 tract and 5% in the other. Area weighting allocates 95% of the 2010 tract’s population to the first 2020 tract. But populations are often not distributed uniformly across land area within tracts, so it’s very possible that half or more of the 2010 tract’s population is in the smaller 2020 tract. This is especially likely because tracts are deliberately designed to have roughly similar population totals, so even though one of these two 2020 tracts is many times larger in area than the other, the small one is likely to have a population similar to the larger one’s.
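To make the 95% / 5% example concrete, here is a minimal sketch in Python. The tract names, the 4,000-person total, and the "actual" even split are all made-up numbers for illustration, not real census data.

```python
def area_weighted_allocation(source_count, area_shares):
    """Split one source tract's count across target tracts by land-area share."""
    return {target: source_count * share for target, share in area_shares.items()}


# Hypothetical 2010 tract with 4,000 residents, split 95% / 5% by land area.
allocated = area_weighted_allocation(4000, {"tract_A": 0.95, "tract_B": 0.05})

# Hypothetical actual distribution: the small tract holds half the residents.
actual = {"tract_A": 2000, "tract_B": 2000}

# Allocation error per 2020 tract (allocated minus actual).
errors = {tract: allocated[tract] - actual[tract] for tract in actual}

print(allocated)
print(errors)
```

Under these assumed numbers, area weighting over-allocates roughly 1,800 people to the large tract and under-allocates the same number to the small one, even though the total is preserved.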

Have you considered using the IPUMS NHGIS geographic crosswalks? As explained on the NHGIS crosswalks page, the NHGIS tract crosswalks include weights based on block-level characteristics within tracts, which are generally much more accurate than area-based weights. There are also separate NHGIS crosswalks for allocating either from 2010 to 2020 tracts or from 2020 to 2010 tracts, so they cover both directions.

I’d also recommend that, instead of using a tract-to-tract crosswalk, you consider using an NHGIS crosswalk from 2010 blocks to 2020 tracts with decennial census data, or if your data of interest aren’t available for blocks, then use a crosswalk from 2010 block groups to 2020 tracts. As explained here, using tract-to-tract crosswalks results in unnecessary errors that can be avoided if you start with base data from smaller units.

One caveat: the NHGIS crosswalk weights are designed for allocating count data only. If you need to crosswalk other types of data (medians, averages, indices), you’d need a different approach. I discussed some alternatives for medians and averages in this post.
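For count data, applying a crosswalk in either direction comes down to multiplying each source count by its weight and summing by target zone. Here is a rough sketch, assuming the crosswalk has been read into rows of (source ID, target ID, weight). The IDs, counts, and weights below are invented for illustration, and the function and variable names are placeholders rather than actual NHGIS field names.

```python
from collections import defaultdict


def apply_crosswalk(source_counts, crosswalk):
    """Allocate source-zone counts to target zones.

    crosswalk: iterable of (source_id, target_id, weight) rows, where weight
    is the share of the source zone's count assigned to the target zone.
    """
    target_counts = defaultdict(float)
    for source_id, target_id, weight in crosswalk:
        if source_id in source_counts:
            target_counts[target_id] += source_counts[source_id] * weight
    return dict(target_counts)


# Hypothetical 2020 tract counts and 2020-to-2010 weights (not real data).
counts_2020 = {"2020_A": 3000, "2020_B": 1000, "2020_C": 2000}
crosswalk_2020_to_2010 = [
    ("2020_A", "2010_X", 1.0),   # 2020_A lies entirely within 2010_X
    ("2020_B", "2010_X", 1.0),   # 2020_B lies entirely within 2010_X
    ("2020_C", "2010_X", 0.25),  # 2020_C straddles two 2010 tracts
    ("2020_C", "2010_Y", 0.75),
]

counts_2010 = apply_crosswalk(counts_2020, crosswalk_2020_to_2010)
print(counts_2010)
```

Because the weights for each source zone sum to 1 here, the overall total is preserved. That sum-by-target logic only makes sense for counts, which is why medians, averages, and indices need a different approach.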
