I’m trying to connect the population variable from 1990 to 2020 at the census block level using 2010 as a reference year. I downloaded the crosswalk file to that end (e.g., Blocks → Blocks: GISJOIN Identifiers for 1990 to 2010), but I am a bit confused about how to use it.
When I looked at the weight variable in the crosswalk data, I noticed that the weights sum to 1 for each 1990 block. To be super concrete, G0900010010101102 in 1990 consists of two blocks in 2010, namely G09000100101011001 (weight = 0.9927) and G09000100101011002 (weight = 0.0073). The weights clearly sum to 1 in this case.
However, when I look at the crosswalk in the reverse direction, G09000100101011002 (a 2010 block) is matched to only one 1990 block (G0900010010101102), with a weight of 0.0073. I’m not sure whether something is missing or whether the 2010 block lies entirely within a small fraction of the 1990 block. I also noticed the converse case, where one 2010 block is matched to two 1990 blocks, each with a weight of 1 (e.g., G09000100101011010 in 2010 is matched to both G0900010010101107 and G0900010010101108, with a weight of 1 in both cases). So when matching 1990 blocks to 2010 blocks (with 2010 as the reference year), do the weights not need to sum to 1 for each 2010 block?
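The pattern described in this question can be reproduced with a short standard-library Python sketch that groups the quoted crosswalk rows two ways: by 1990 block and by 2010 block. It uses only the four rows quoted above (a real crosswalk file has many more rows), so the 2010-side totals here reflect just those rows.

```python
from collections import defaultdict

# Only the rows quoted in this question: (1990 GISJOIN, 2010 GISJOIN, weight).
# The full crosswalk file contains many more rows than this.
CROSSWALK = [
    ("G0900010010101102", "G09000100101011001", 0.9927),
    ("G0900010010101102", "G09000100101011002", 0.0073),
    ("G0900010010101107", "G09000100101011010", 1.0),
    ("G0900010010101108", "G09000100101011010", 1.0),
]


def weight_sums(rows, key_index):
    """Sum the weights of crosswalk rows, grouped by the column at key_index."""
    sums = defaultdict(float)
    for row in rows:
        sums[row[key_index]] += row[2]
    return dict(sums)


by_source = weight_sums(CROSSWALK, 0)  # grouped by 1990 block: each sums to 1
by_target = weight_sums(CROSSWALK, 1)  # grouped by 2010 block: no such constraint

for block, total in sorted(by_source.items()):
    print("1990", block, round(total, 4))
for block, total in sorted(by_target.items()):
    print("2010", block, round(total, 4))
```

Grouping by 1990 block gives totals of 1, while grouping by 2010 block gives 0.0073 for G09000100101011002 and 2 for G09000100101011010, which is exactly the asymmetry described above.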
More generally, I’m curious whether the crosswalk file works in both directions (e.g., 1990 to 2010 and 2010 to 1990). Thank you!
The 1990-to-2010 crosswalks are designed for allocation in only one direction, from 1990 to 2010. That is why the weights sum to 1 for each 1990 unit but not for each 2010 unit. Effective allocation weights cannot, in general, sum to 1 for both years’ units at once. Providing effective weights for allocating from 2010 to 1990 would require additional modeling and computation. NHGIS might add those in the future, but we have no immediate plans.
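One way to see why the weights cannot generally sum to 1 in both directions is a simple counting argument. The notation is introduced here for illustration only: w_{st} is the weight for 1990 block s and 2010 block t (taken as 0 where the blocks do not intersect), S is a set of 1990 blocks, and T is the set of 2010 blocks they intersect, chosen so that no block in S or T has weight going outside the group.

```latex
\begin{aligned}
\sum_{t \in T} w_{st} = 1 \ \ \forall s \in S
  &\;\Longrightarrow\; \sum_{t \in T} \sum_{s \in S} w_{st} = |S| \\
\sum_{s \in S} w_{st} = 1 \ \ \forall t \in T
  &\;\Longrightarrow\; \sum_{t \in T} \sum_{s \in S} w_{st} = |T|
\end{aligned}
```

Both conditions can hold only if |S| = |T|. The merge quoted in the question is a concrete case: with S = {G0900010010101107, G0900010010101108} and T = {G09000100101011010}, each 1990 block sends weight 1 to the single 2010 block, so |S| = 2 while |T| = 1. Forcing that 2010 block’s weights to sum to 1 would leave each 1990 block with only half of its count allocated.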
There is a table that summarizes crosswalk availability here, and it indicates the direction for which each crosswalk is designed. The only crosswalks designed for “backward” allocation (from a later year to an earlier year) are for 2020 to 2010.
Thank you so much for the reply. Can I ask a quick follow-up / clarification question? Using the 1990-to-2010 crosswalk, we can assign the 2010 variables (say, population) to the 1990 geography (i.e., GISJOIN). Put differently, it allows users to compare 1990 and 2010 variables based on the 1990 geography. Is this understanding correct? Thank you!
The 1990 to 2010 crosswalk is designed to standardize 1990 variables onto 2010 census geographies.
The following text on our geographic crosswalks page describes what the weights represent and how to apply the crosswalks to 1990 census block data:
In a block-to-block crosswalk, each record identifies a possible intersection between a single source block and a single target block, along with an interpolation weight (ranging between 0 and 1) identifying approximately what portion of the source zone’s population and housing units were located in the intersection. These weights can be used to estimate how any counts available for source blocks (e.g., females age 75 and over, single-member households, owner-occupied housing units, etc.) are distributed among target blocks.
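Written as formulas (a restatement of the description above, with notation introduced here for illustration): let y_s be a count for source block s and w_{st} the crosswalk weight for the intersection of s with target block t. Then the atom estimate, the target-block estimate, and the preservation of the overall total are:

```latex
a_{st} = w_{st}\, y_s
\qquad
\hat{y}_t = \sum_{s} a_{st} = \sum_{s} w_{st}\, y_s
\qquad
\sum_{t} \hat{y}_t = \sum_{s} y_s \sum_{t} w_{st} = \sum_{s} y_s
```

The last identity relies on the weights for each source block summing to 1. For instance, with the weights quoted earlier in this thread, the atom formed by 1990 block G0900010010101102 and 2010 block G09000100101011002 receives 0.0073 of the 1990 block’s count.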
For example, to interpolate count data from 1990 blocks to 2010 blocks:
- Obtain data of interest for 1990 blocks
  - E.g., using the NHGIS Data Finder, find and download tables for the Block geographic level for the year 1990 (dataset 1990_STF1)
- Join the 1990-block-to-2010-block crosswalk to the 1990 block data of interest
- Multiply the 1990 block counts by the crosswalk’s interpolation weights, producing estimated counts for all 1990-2010 block intersections, or “atoms”
- Sum these atom counts for each 2010 block
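The steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not NHGIS code: the crosswalk column names GJOIN1990, GJOIN2010, and WEIGHT are assumed to match the NHGIS block-to-block crosswalk file (check the header of your download), POP1990 is a hypothetical stand-in for the count column in a 1990_STF1 extract, and the population values are made-up toy numbers. Only the GISJOINs and weights come from this thread.

```python
import csv
import io
from collections import defaultdict

# Hypothetical 1990 block extract. "POP1990" is a placeholder column name and the
# counts are toy values; a real extract uses NHGIS table codes as column names.
BLOCK_DATA_CSV = """GISJOIN,POP1990
G0900010010101102,100
G0900010010101107,40
G0900010010101108,25
"""

# Crosswalk rows quoted in this thread. The header names are an assumption about
# the NHGIS block-to-block crosswalk file layout.
CROSSWALK_CSV = """GJOIN1990,GJOIN2010,WEIGHT
G0900010010101102,G09000100101011001,0.9927
G0900010010101102,G09000100101011002,0.0073
G0900010010101107,G09000100101011010,1
G0900010010101108,G09000100101011010,1
"""


def interpolate(block_csv, crosswalk_csv, count_col):
    """Allocate 1990 block counts to 2010 blocks using crosswalk weights."""
    # Step 1: load the 1990 block counts, keyed by GISJOIN.
    counts = {
        row["GISJOIN"]: float(row[count_col])
        for row in csv.DictReader(io.StringIO(block_csv))
    }
    estimates = defaultdict(float)
    for row in csv.DictReader(io.StringIO(crosswalk_csv)):
        source = row["GJOIN1990"]
        # Step 2: join the crosswalk row to its 1990 block; skip blocks not in the extract.
        if source not in counts:
            continue
        # Step 3: multiply the count by the weight to get the atom estimate.
        atom = counts[source] * float(row["WEIGHT"])
        # Step 4: sum the atom estimates for each 2010 block.
        estimates[row["GJOIN2010"]] += atom
    return dict(estimates)


estimates = interpolate(BLOCK_DATA_CSV, CROSSWALK_CSV, "POP1990")
for block, value in sorted(estimates.items()):
    print(block, round(value, 2))
```

With real files, each `io.StringIO(...)` can be replaced by `open(path, newline="")`. The resulting estimates are on 2010 blocks and can be compared directly with 2010 block data. Because every 1990 block’s weights sum to 1, the total across 2010 blocks equals the 1990 total (165 for these toy numbers).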