Longitudinal data crosswalking follow-up

Hi there,

I’ve been using NHGIS time-series data at the census tract level from 1970 to the present for a project related to gentrification/displacement dynamics before and after urban greening interventions. For the gentrification-related indicators we need (race/ethnicity, household income, population, housing, educational attainment, below poverty level), the time series data is only available at the census tract level for all the decennial years (so unfortunately we’re not able to work up from smaller geographies).

The crosswalk files have been amazing to have access to, and I’ve been able to analyze 12 of 16 total cases thanks to those crosswalks (specifically using population weights, housing unit weights, and rental unit weights) to smooth census tract data to the after/target year boundaries. For example, to analyze changes following a highway removal that occurred in 1995, I’ve been smoothing 1990 data to 2000 tract shapes; for a highway removal that occurred in 2005, I’ve been smoothing 2000 data to 2010 tract shapes. I am now a bit stuck on the last four cities (San Francisco, Oakland, New York and Portland), as I’ve realized that the source year for my remaining cases is either 1970 or 1980, with the target zones/years a mix of 1990, 2000, and 2010, and NHGIS does not (yet?) offer crosswalks for those source years.

My question is the following: For before/after neighborhood change studies at the census tract level - when crosswalks from older years than currently available are needed (1970/1980 in this case) - what might NHGIS recommend as best practice? In this case:

Would you suggest following the methodology NHGIS has laid out here for producing its existing crosswalks? In other words, figuring out how to use target-density weighting interpolation (as per Schroeder 2007) on tract-level data to get source 1970/1980 tracts to 1990 tract shapes, and then using existing NHGIS crosswalks from there if the comparison “after” year is later than 1990?

Or, might the best approach be smoothing all before/after comparisons to 2010 tract shapes? (In this related thread from 2024, I see that IPUMS has recommended approaches by Lee and Lin 2018, Markley et al 2022, and/or Schroeder 2009.) It looks like most of these would entail smoothing all data up or down to 2010 shapes, rather than smoothing data to a variety of “after” years on a case-by-case basis as I’ve been doing.

Any further ideas/feedback/recommendations/resources would be greatly appreciated.

Big thanks in advance! So appreciate all that IPUMS/NHGIS does.

Leanna

As you’ve outlined, there are many ways to deal with these issues, and I’m glad you’ve already found and reviewed recommendations I’ve provided elsewhere. It’s a good start! You’ve gotten to a point where there aren’t well-established guidelines, and determining the optimal approach for your setting could require lengthy research. As such, I’ll provide a few general responses here without a lot of specifics.

Regarding which target year to use (e.g., to go from 1980 tracts to 1990 tracts or to 2010 tracts, etc.), an advantage of the “case-by-case” approach you’ve used is that tracts from consecutive censuses (e.g., 1980 to 1990) match each other better: there will be fewer changes, and the changes that do occur will involve smaller differences. Overall, standardizing the data will require less interpolation, which will result in fewer, smaller errors. The target-density-weighting (TDW) approach is also well suited to those settings, because it assumes that spatial distributions in the source year resemble distributions in the target year. That assumption weakens the farther apart the two years are.
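
To make the TDW idea concrete, here is a minimal Python sketch of how allocation weights could be computed from tract overlaps. It is an illustration under stated assumptions, not the NHGIS implementation: the function name `tdw_weights`, the input layout, and the toy numbers are all hypothetical, and it assumes you have already computed the overlap areas between source and target tracts in a GIS. Each overlap's share of a source tract is taken to be proportional to overlap area times the target tract's target-year density.

```python
from collections import defaultdict

def tdw_weights(intersections, target_area, target_count):
    """Hypothetical helper: target-density-weighting allocation weights.

    intersections: iterable of (source_id, target_id, overlap_area)
    target_area:   {target_id: total area of the target tract}
    target_count:  {target_id: target-year count, e.g. 1990 population}
    Returns {(source_id, target_id): weight}; weights sum to 1 per source tract.
    """
    est = defaultdict(float)         # expected target-year count in each overlap
    overlap = defaultdict(float)     # overlap area per (source, target) pair
    est_total = defaultdict(float)   # per-source sum of expected counts
    area_total = defaultdict(float)  # per-source sum of overlap areas
    for s, t, area in intersections:
        # Assumes uniform density inside each target tract.
        value = area * target_count[t] / target_area[t]
        est[(s, t)] += value
        overlap[(s, t)] += area
        est_total[s] += value
        area_total[s] += area

    weights = {}
    for (s, t), value in est.items():
        if est_total[s] > 0:
            weights[(s, t)] = value / est_total[s]
        else:
            # Assumption: fall back to plain area weighting when all
            # overlapping target tracts have a zero count.
            weights[(s, t)] = overlap[(s, t)] / area_total[s]
    return weights

# Toy example: source tract A is split evenly by area between targets X and Y,
# but X is twice as dense as Y in the target year.
weights = tdw_weights(
    intersections=[("A", "X", 2.0), ("A", "Y", 2.0)],
    target_area={"X": 4.0, "Y": 2.0},
    target_count={"X": 400.0, "Y": 100.0},
)
```

In the toy example, plain area weighting would split tract A evenly, whereas the density-based weights send a larger share to the denser target tract X.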

I think the main advantages of standardizing to a single target year are that, first, it would give you a standard basis for a broader nationwide analysis if you wanted that (e.g. for a national study of tract-level characteristics, you could use only 2010 tracts), and second, it would allow you to take advantage of existing crosswalks (e.g., NHGIS crosswalks go to 2010 units but not to 2000 or 1990 units).

To extend TDW across a longer period, I defined a technique in Chapter 3 of my dissertation (which you cited from my previous post) called “cascading density weighting” (CDW). We use that technique in our 1990-to-2010 crosswalks & standardized time series, as explained here.

Given my familiarity with these techniques, my own inclination for your pre-1990 settings would be:

  • If bridging between consecutive censuses (1970-1980 or 1980-1990), use TDW
  • If bridging from 1970 to 1990 tracts, use CDW (which would use both 1990 and 1980 tract data to guide the model of 1970 distributions)
  • If bridging from pre-1990 years to post-1990 years, I’d do roughly the same thing you suggested:
    • First use one of the above techniques to bridge to 1990 tracts
    • Then use NHGIS crosswalks to bridge to a later year’s tracts
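
The two-step bridge in the last bullet can be sketched as a composition of two sets of weights. This is a rough illustration only: `compose_crosswalks`, the dictionary layout, and the tract IDs are hypothetical, and multiplying weights assumes that the part of a source tract falling in a given 1990 tract is spread across later tracts in the same proportions as that 1990 tract as a whole. It also assumes both crosswalks use the same kind of weight (e.g., both population-based).

```python
from collections import defaultdict

def compose_crosswalks(first, second):
    """Hypothetical helper: chain two crosswalks through a shared middle year.

    first:  {(source_id, middle_id): weight}, e.g. 1980 -> 1990 tracts
    second: {(middle_id, target_id): weight}, e.g. 1990 -> 2010 tracts
    Returns {(source_id, target_id): weight}.
    """
    # Index the second crosswalk by its middle-year tract for quick lookup.
    by_middle = defaultdict(list)
    for (m, t), w in second.items():
        by_middle[m].append((t, w))

    combined = defaultdict(float)
    for (s, m), w1 in first.items():
        for t, w2 in by_middle[m]:
            # Share of s reaching t via m; summed over all middle tracts.
            combined[(s, t)] += w1 * w2
    return dict(combined)

# Toy example with made-up tract IDs and weights.
first = {("s80", "m90a"): 0.5, ("s80", "m90b"): 0.5}
second = {("m90a", "t10a"): 1.0, ("m90b", "t10a"): 0.25, ("m90b", "t10b"): 0.75}
combined = compose_crosswalks(first, second)
```

Applying the two crosswalks one after the other to the data gives the same result as applying the combined weights once, so either route works in practice.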

Given the many different variables you’re interested in standardizing, I’d also suggest that you begin by creating a crosswalk with just a few allocation weights, similar to the NHGIS crosswalks. E.g., for 1970-to-1980 interpolation, first estimate the proportion of each 1970 tract’s population, households, and housing units in each 1980 tract, as three separate statistics. Then you could use those proportions as weights for many different characteristics (rather than applying TDW to each of your variables of interest separately).
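
As a rough sketch of that last step, the snippet below applies a small crosswalk with several allocation weights to multiple count variables at once. Everything named here is hypothetical (`apply_crosswalk`, the column names `wt_pop` and `wt_hu`, and the toy values), and it only makes sense for counts: rates, medians, and similar measures would need to be rebuilt from allocated counts afterward.

```python
from collections import defaultdict

def apply_crosswalk(source_data, crosswalk, weight_for_variable):
    """Hypothetical helper: allocate source-tract counts to target tracts.

    source_data:         {source_id: {variable: count}}
    crosswalk:           list of dicts with "source", "target", and weight keys
    weight_for_variable: {variable: name of the weight to use for it}
    Returns {target_id: {variable: estimated count}}.
    """
    out = defaultdict(lambda: defaultdict(float))
    for row in crosswalk:
        counts = source_data.get(row["source"])
        if counts is None:
            continue  # no data for this source tract
        for variable, value in counts.items():
            # Pick the weight that best matches the variable's universe.
            weight = row[weight_for_variable[variable]]
            out[row["target"]][variable] += value * weight
    return {target: dict(values) for target, values in out.items()}

# Toy example: one 1970 tract split across two 1980 tracts.
source_data = {"A": {"persons": 1000.0, "housing_units": 400.0}}
crosswalk = [
    {"source": "A", "target": "X", "wt_pop": 0.75, "wt_hu": 0.5},
    {"source": "A", "target": "Y", "wt_pop": 0.25, "wt_hu": 0.5},
]
weight_for_variable = {"persons": "wt_pop", "housing_units": "wt_hu"}
estimates = apply_crosswalk(source_data, crosswalk, weight_for_variable)
```

Keeping the weights in a separate crosswalk like this means the interpolation model is estimated once per weight type and then reused for every variable that shares that universe.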

Final caveat: there are, of course, ways to improve on these models, most notably by using block data where possible to generate more exact interpolation weights. We are working on producing 1980 and 1970 block boundary data and have already published some. (See our 1980 and 1970 block boundaries pages.) We plan to use this data to produce 1980 and 1970 crosswalks in the future (both for blocks and higher levels), but it may be a few years before we begin publishing those.