Best Practices for Converting 1990 Blocks to 2010 Census Tracts and Handling Median Values

Hello,

I’ve been using the Longitudinal Tract Database (LTDB) to convert 1990 Census tracts to 2010 Census tracts for some time now. In the LTDB, I’ve been handling median values by weighting them by base values, multiplying by the interpolation weights, collapsing the data, and then unweighting.

Now, I’m working on converting 1990 Census blocks to 2010 Census tracts using NHGIS crosswalks, and I wanted to confirm best practices for handling median values at the block level.

Here’s the process I’ve been using for tracts:

  1. Weighting the Medians by Base Variables: For each median variable, I choose a base variable (like population). If the median value is missing, I set the base variable to missing as well to avoid incorrect calculations during the unweighting step. In my code, I multiply the median value by the corresponding base value to generate a new weighted variable.

  2. Applying Interpolation Weights: Once the medians are adjusted by their base values, I apply the interpolation weights provided by the crosswalk. This ensures that the data is adjusted to reflect the proportion of each block or tract contributing to the target geography. For each variable (counts, medians, and base values), I multiply the values by the interpolation weight.

  3. Collapsing into 2010 Census Tracts: After applying the interpolation weights, I collapse the data into 2010 census tracts by summing the weighted values. This step aggregates the data from the source zones (1990 tracts in my case) into the target zones (2010 census tracts). I also preserve missingness for any variables that were entirely missing before collapsing.

  4. Unweighting the Median Variables: Finally, I “unweight” the median variables by dividing the collapsed weighted values by the corresponding collapsed base values. This step normalizes the sums by the base values applied in step 1, so each final value is a base-weighted average of the source medians.
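As a rough illustration, the four steps above can be sketched in plain Python. The function name, the tuple layout, and the toy zone IDs and values are all hypothetical; this is not the LTDB or NHGIS file layout, just a minimal sketch assuming one median, one base variable, and one weight per crosswalk record.

```python
from collections import defaultdict

def interpolate_medians(source_rows, crosswalk):
    """Estimate target-zone medians as base-weighted averages of source medians.

    source_rows: {source_id: (median or None, base or None)}  (hypothetical layout)
    crosswalk:   list of (source_id, target_id, weight) tuples
    Returns {target_id: estimate, or None if no source contributed a value}.
    """
    weighted_sum = defaultdict(float)  # sum of median * base * weight
    base_sum = defaultdict(float)      # sum of base * weight
    targets = set()

    for source_id, target_id, weight in crosswalk:
        targets.add(target_id)
        median, base = source_rows.get(source_id, (None, None))
        # Step 1: a missing median also removes its base from the denominator.
        if median is None or base is None:
            continue
        # Steps 1-3: weight by the base, apply the crosswalk weight, sum by target.
        weighted_sum[target_id] += median * base * weight
        base_sum[target_id] += base * weight

    # Step 4: divide by the allocated base; targets with no data stay missing.
    return {
        t: (weighted_sum[t] / base_sum[t] if base_sum.get(t, 0) > 0 else None)
        for t in targets
    }

# Toy example: zone B is split evenly between T1 and T2; zone C has no median.
source_rows = {"A": (30.0, 100), "B": (40.0, 300), "C": (None, 50)}
crosswalk = [("A", "T1", 1.0), ("B", "T1", 0.5), ("B", "T2", 0.5), ("C", "T3", 1.0)]
result = interpolate_medians(source_rows, crosswalk)
print(result["T1"], result["T2"], result["T3"])
```

In this toy case T1 comes out as (30*100 + 40*150) / 250 = 36, T2 as 40, and T3 stays missing because its only source zone had no median.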

I’m wondering if I can follow a similar process with the NHGIS crosswalk for block-to-tract conversions. Specifically:

  • Can I apply the interpolation weights provided by NHGIS in the same way?
  • How should I approach collapsing and unweighting the block-level data?
  • Are there any key differences or considerations when working with block-level data for medians?

Thank you in advance for your insights!

Best,
Aniket

  • Can I apply the interpolation weights provided by NHGIS in the same way?

I see no conceptual issues with your proposal. It should work the same for block-level medians as it does for tract-level medians, and in fact, starting from block-level medians would produce more accurate results. As I understand it, your method effectively assumes that all census responses within a source zone have a characteristic equal to the zone’s median. For example, all people in a source tract would have an age equal to the tract’s median age. This assumption should be more accurate for block-level medians: individual blocks have much smaller populations than tracts, so block-level medians should better correspond to individual-level characteristics than tract-level medians.

  • How should I approach collapsing and unweighting the block-level data?

You could use the same workflow that you describe for tract-to-tract crosswalks and just replace the tract-to-tract crosswalk with an NHGIS block-to-tract crosswalk. The NHGIS website provides these general instructions for using the block crosswalks, and these instructions roughly mirror how you’ve described your process for tract-to-tract allocations. If that’s still not clear, please let me know.

  • Are there any key differences or considerations when working with block-level data for medians?

The biggest issue is that, to my knowledge, the 1990 census summary files didn’t provide medians for blocks, and no source provides “long-form data” (covering income, education, marital status, nativity, etc.) for blocks.

For long-form data, you’d need to start from block groups, block group parts, or census tracts, as explained in this section of the Crosswalks page.

For medians of 1990 short-form characteristics (like median age), you could still use block-level data, but you wouldn’t be able to start directly with a median statistic. Instead, you could use one of the two strategies I suggested in this earlier forum post, which I’ve copied here:

  1. Use means instead of medians, and apply crosswalks separately for each mean’s numerator and denominator. E.g., to compute per capita income, you could estimate “aggregate income” and “total population” separately using the crosswalk weights, and then divide one by the other.
  2. Start with a table of counts broken down by the value of interest, e.g., housing units by home value, and use the crosswalk to estimate each count in the table. Then estimate the median from the resulting frequency distribution, for which there are various online guides.
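
To make the two strategies concrete, here is a minimal sketch in plain Python. Everything in it is illustrative: the function names, the zone IDs, the bin edges, and the counts are made up, and the median step uses simple linear interpolation within the median bin, which is only one of several ways to estimate a median from grouped counts. It also assumes the top bin has a finite upper edge, which real census tables often do not.

```python
def allocate(values_by_source, crosswalk):
    """Sum weight * value into each target zone, column by column.

    values_by_source: {source_id: [v0, v1, ...]}  (hypothetical layout)
    crosswalk:        list of (source_id, target_id, weight) tuples
    """
    out = {}
    for source_id, target_id, weight in crosswalk:
        values = values_by_source.get(source_id)
        if values is None:
            continue
        totals = out.setdefault(target_id, [0.0] * len(values))
        for i, v in enumerate(values):
            totals[i] += v * weight
    return out

def grouped_median(edges, counts):
    """Median from binned counts, interpolating linearly within the median bin.

    edges must have len(counts) + 1 entries. Returns None if the total is zero.
    """
    total = sum(counts)
    if total <= 0:
        return None
    half = total / 2.0
    cumulative = 0.0
    for lower, upper, count in zip(edges[:-1], edges[1:], counts):
        if count > 0 and cumulative + count >= half:
            return lower + (half - cumulative) / count * (upper - lower)
        cumulative += count
    return None

crosswalk = [("s1", "T1", 1.0), ("s2", "T1", 0.5), ("s2", "T2", 0.5)]

# Strategy 1: allocate numerator and denominator separately, then divide.
# Columns here are [aggregate income, total population] (toy values).
sums = allocate({"s1": [2_000_000, 100], "s2": [9_000_000, 300]}, crosswalk)
per_capita = {t: num / den for t, (num, den) in sums.items() if den > 0}

# Strategy 2: allocate each count in the table, then take a grouped median.
# Columns here are persons by age bin, with assumed bin edges in years.
age_edges = [0, 18, 40, 65, 100]
age_counts = allocate({"s1": [10, 20, 10, 0], "s2": [0, 10, 20, 10]}, crosswalk)
median_age = {t: grouped_median(age_edges, c) for t, c in age_counts.items()}

print(per_capita["T1"], per_capita["T2"])
print(median_age["T1"], median_age["T2"])
```

With these toy inputs, T1 gets a per capita income of 6,500,000 / 250 = 26,000, and its allocated age counts are [10, 25, 20, 5], so the median falls in the 18 to 40 bin at about 35.6. The same `allocate` helper serves both strategies because each one only needs crosswalk-weighted sums before the final division or median step.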