NHGIS: Issue Replicating CenBlock Crosswalk 2000->2010; goal to create 2010->2000 xwalk


Good Morning,

Background: I could not find a 2010CenBlock -> 2000CenBlock crosswalk, so I am attempting to use the same method as IPUMS, which has a 2000->2010 xwalk, based on this IPUMS page (https://www.nhgis.org/documentation/t…). Thus, I want to first replicate the NHGIS crosswalk to make sure I understand how.


  1. I am unable to replicate the IPUMS “weight” variable for 2000->2010. Below is some Stata code attempting to do this.

  2. I do not see the mapping from the TDW method (discussed in Schroeder (2007)) to the steps in the above link. For example, what is the Z variable (assuming A = area, Y= population 2010)? This is very likely the cause of Issue 1.

Any help is appreciated.

This is my Stata code:

* Data: Census 2000Tab to 2010Tab Block Relationship File; NHGIS 2000, 2010 pop at block

* id00 : State-County-Tract-Block code for 2000Block

*** Step 1

* create intersect area / source area; note: a0 =IPUMS “parea” variable

gen a0 = (arealand_int / arealand_2000)

* Create Intersect area population assuming uniform pop dist in 2010

gen a0Xp10 = a0*pop10

*** Step 2

* Sum up the estimated intersect area by 2000Block

bys id00 : egen total_a0Xp10 = total(a0Xp10)

*** Step 3

* weight = est intersect pop / total est pop of 2010Block

gen w1 = a0Xp10 / total_a0Xp10

Thank you,



Hi Luke,

Responding to the *second issue* first: how do the steps in the construction of the NHGIS 2000->2010 block crosswalk relate to the equations in the article (Schroeder 2007)?

In the case of the 2000->2010 crosswalk, the “source zones” (s) are 2000 blocks and the “target zones” (t) are 2010 blocks.

In the article, Y is a variable of interest measured on source zones. In the application case, it could be any feature for which we have 2000 block counts (e.g., population, housing units, population by race, etc.), and y[s] is a specific 2000 count for block s (e.g., the 2000 population of a 2000 block).

The final crosswalk does not include values of Y. It includes only weights indicating the portion of y[s] that we expect to lie in each target zone t. I.e., each weight gives us an estimate of (y[st] / y[s]) for a specific source zone s and target zone t.

To use the crosswalk file, we would ultimately multiply y[s] (a 2000 block count) by EST(y[st] / y[s]) (an interpolation weight) to obtain EST(y[st]) (the estimated count of a 2000 characteristic Y within an intersection between 2000 block s and 2010 block t). To get EST(y[t]), an estimate of the 2000 characteristic Y for 2010 block t, we sum EST(y[st]) within t.

In the article, Z is a variable measured on target zones that (we expect) has a spatial distribution similar to Y’s. For the NHGIS crosswalk, Z is the sum of 2010 population and housing units, and z[t] is the sum of 2010 population and housing units in a specific 2010 block t.

In the article, Equation 3 specifies that:

EST(y[st]) = (z[st] / z[s]) * y[s]

From this, the formula for an interpolation weight is:

EST(y[st] / y[s]) = z[st] / z[s]

In application, z[st] is the sum of 2010 population and housing in the area of intersection of 2000 block s and 2010 block t, and z[s] is the sum of 2010 population and housing in 2000 block s.

In the text describing the NHGIS crosswalk construction, Step 1 estimates z[st] values, Step 2 estimates z[s] values, and Step 3 computes all interpolation weights as z[st] / z[s].

The formula for Step 1 comes from Equation 6 in the article:

EST(z[st]) = (A[st] / A[t]) * z[t]

In application, this means that EST(z[st]) is equal to:

z[t] (the sum of 2010 population and housing units in 2010 block t)

* A[st] (the area of intersection between 2000 block s and 2010 block t)

/ A[t] (the area of 2010 block t)

We compute these areas as land areas _except_ when the land area of the 2010 block is 0, in which case we use water areas.

Finally, coming back to the *first issue*…

I see a few errors in Step 1 of your code. It appears to be using this equation:

EST(z[st]) = (A[st] / A[s]) * pop[t]

It should use:

EST(z[st]) = (A[st] / A[t]) * z[t]

Z should be set to the sum of 2010 population and housing units, not 2010 population alone.

The “a0” component should be the area of intersection divided by the _2010_ block’s area, not the 2000 block’s area.

And finally, additional special handling is required in instances where the land area of the 2010 block is zero, in which case water are should be used.

Later, in Step 3, more special handling is needed where EST(z[s]) = 0 (the estimated sum of 2010 population and housing units in 2000 block s is zero), in which case the interpolation weight should be set to A[st] / A[s] (with, again, special handling to use water area if the land area is zero).

Finally, note that the NHGIS crosswalk uses this TDW model only in the 95.7% of cases where the advanced model was not used. For the other 4.3% (the “blocks of interest”), the TDW weight will not generally equal the weight in the crosswalk.