Harmonizing 2000 census block groups to 2010 boundaries and then locating them in 2010 census tracts

Hi,

I just wanted to double-check that my workflow makes sense:

I’m working on a project in which I need census block group data on race and Hispanic ethnicity to create tract-level segregation indices so I can look at changing segregation patterns over time. I am using data from 2000, 2009-2013, and 2015-2019. My workflow is as follows:

  1. obtain 2000 block group data that is harmonized to 2010 boundaries using the time series tables from Data Finder
  2. merge harmonized 2000 data (which are in 2010 boundaries) with 2009-2013 and 2015-2019 block group data
  3. use the GISJOIN variable to identify respective census tract of each set of block groups
  4. create segregation indices for each census tract based on block group data
  5. retain segregation estimates for tract data and merge with master tract level dataset
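Steps 2 and 3 might look something like the minimal sketch below, assuming each period's extract has already been loaded into a dictionary keyed by block group GISJOIN. The dictionary shape, the period labels, and the variable names `group_a`/`group_b` are hypothetical placeholders, not NHGIS column names. The sketch stacks the periods into long format and reports any GISJOIN that is missing from one or more periods, which is a cheap way to catch block groups that don't line up across years.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: period label -> {block group GISJOIN -> {variable -> count}}.
# Real NHGIS extracts are CSVs with their own column names; these dicts are placeholders.
Table = Dict[str, Dict[str, int]]


def merge_periods(periods: Dict[str, Table]) -> Tuple[List[dict], Dict[str, List[str]]]:
    """Stack per-period block group tables into long-format rows and report
    any GISJOIN that is missing from one or more periods."""
    all_ids = set()
    for table in periods.values():
        all_ids.update(table)

    rows: List[dict] = []
    unmatched: Dict[str, List[str]] = {}
    for gisjoin in sorted(all_ids):
        missing = [label for label, table in periods.items() if gisjoin not in table]
        if missing:
            unmatched[gisjoin] = missing
        for label, table in periods.items():
            if gisjoin in table:
                rows.append({"gisjoin": gisjoin, "period": label, **table[gisjoin]})
    return rows, unmatched


# Made-up counts for two block groups; the second is absent from 2015-2019.
periods = {
    "2000 (2010 boundaries)": {
        "G01000100201001": {"group_a": 500, "group_b": 300},
        "G01000100201002": {"group_a": 200, "group_b": 100},
    },
    "2009-2013": {
        "G01000100201001": {"group_a": 480, "group_b": 320},
        "G01000100201002": {"group_a": 210, "group_b": 90},
    },
    "2015-2019": {
        "G01000100201001": {"group_a": 450, "group_b": 350},
    },
}

rows, unmatched = merge_periods(periods)
print(len(rows))  # 5
print(unmatched)  # {'G01000100201002': ['2015-2019']}
```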

I think this works, but wanted to make sure I didn’t miss anything. Thanks.

Best,
Kasey

Yes, in your step 3, you can generally just drop the last digit of the block group’s GISJOIN to get the corresponding tract GISJOIN.
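As a small sketch of that step: the function below assumes the usual NHGIS 2010 layout, "G" + state (2) + "0" + county (3) + "0" + tract (6) + block group (1), for 15 characters in total. That layout and the function name are assumptions for illustration; the length check is there only to flag IDs that don't fit the assumed pattern.

```python
def bg_to_tract_gisjoin(bg_gisjoin: str) -> str:
    """Return the 2010 tract GISJOIN for a 2010 block group GISJOIN.

    Assumed layout: "G" + state (2) + "0" + county (3) + "0" + tract (6)
    + block group (1), i.e. 15 characters. The tract GISJOIN is the same
    string without the final block group digit.
    """
    if not bg_gisjoin.startswith("G") or len(bg_gisjoin) != 15:
        raise ValueError(f"unexpected block group GISJOIN: {bg_gisjoin!r}")
    return bg_gisjoin[:-1]


print(bg_to_tract_gisjoin("G01000100201001"))  # G0100010020100
```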

(I’m glad you figured that out! I’d initially started drafting a reply to the first version of your post, which suggested using NHGIS crosswalks to get BG-to-tract associations, but your revised post gets it right: the crosswalks are for bridging from one year’s units to another year’s, so they wouldn’t be helpful for bridging from 2010 BGs to 2010 tracts, even in cases like this where the data represent 2000 characteristics. A 2010 BG-tract crosswalk is also unnecessary because the associations between those units are baked into their codes!)

There are a couple of remaining issues, though:

  1. The 2010 block groups & tracts match up in nearly all cases with the 2013 and 2019 versions, but there are a few exceptions. This section on the crosswalks page provides more info.
  2. I wonder how well segregation indices apply to sets of block groups within tracts. Tracts contain only a few block groups at most, and, I believe, sometimes contain only one or two block groups. Perhaps you’d want to consider measuring segregation among blocks rather than among block groups? Also, in metro areas with extreme segregation, individual tracts often have highly homogeneous populations, such that there may be little within-tract segregation even in cases where there is a great deal of between-tract segregation… just something to consider as you proceed.
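To make the concern about tracts with only one or two block groups concrete, here is a sketch using the index of dissimilarity, D = 0.5 × Σ|a_i/A − b_i/B|, as one example of a segregation index (swapped in for illustration; the thread doesn't specify which index is being used). With a single block group, D is 0 by construction; with two, it rests on a single split. The counts are made up.

```python
from typing import Sequence, Tuple


def dissimilarity(units: Sequence[Tuple[int, int]]) -> float:
    """Index of dissimilarity D = 0.5 * sum_i |a_i/A - b_i/B|.

    Each element of `units` is (count of group a, count of group b) for one
    sub-unit, e.g. one block group within a tract.
    """
    total_a = sum(a for a, _ in units)
    total_b = sum(b for _, b in units)
    if total_a == 0 or total_b == 0:
        raise ValueError("D is undefined when either group is absent from the area")
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in units)


# A tract with one block group: D is always 0, whatever the counts.
print(dissimilarity([(500, 300)]))  # 0.0
# A tract with two block groups: D reflects just one split.
print(round(dissimilarity([(400, 50), (100, 250)]), 3))  # 0.633
```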

Thanks for the thoughtful feedback, as usual, Jonathan.

Re: remaining issues: I would be interested in using block data to calculate the segregation indices so that I have more variation, but no 2000-2010 time series tables are available at the block level for race or Hispanic ethnicity. So, I’d still need to harmonize 2000 block data into 2010 block boundaries using the block-to-block crosswalk so that all block data are in the same boundaries. Then I can calculate the segregation indices for each tract, right?
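A minimal sketch of how a weighted crosswalk is typically applied: each 2010 block receives weight × count from every 2000 block linked to it, summed over those links. It assumes the crosswalk has been read into (2000 block ID, 2010 block ID, weight) tuples; the actual column names and weighting details should be taken from the NHGIS crosswalk file and its documentation. The block IDs "A", "B", "X", "Y" are placeholders, and you would repeat this for each race/ethnicity count.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def apportion(counts_2000: Dict[str, float],
              crosswalk: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Allocate 2000 block counts to 2010 blocks.

    `crosswalk` holds (2000 block ID, 2010 block ID, weight) records, an
    assumed in-memory form of the crosswalk file. Each 2010 block receives
    weight * count from every 2000 block linked to it.
    """
    out: Dict[str, float] = defaultdict(float)
    for src, dst, weight in crosswalk:
        out[dst] += weight * counts_2000.get(src, 0.0)
    return dict(out)


# Placeholder IDs: 2000 block "A" is split 75/25 between 2010 blocks "X" and "Y";
# 2000 block "B" falls entirely in "Y".
counts_2000 = {"A": 100.0, "B": 50.0}
crosswalk = [("A", "X", 0.75), ("A", "Y", 0.25), ("B", "Y", 1.0)]
print(apportion(counts_2000, crosswalk))  # {'X': 75.0, 'Y': 75.0}
```

When the weights from each 2000 block sum to 1, the total count is preserved (150 in and 150 out here), which is a useful sanity check after applying a real crosswalk.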

This step isn’t necessary using block group data because the crosswalk has already been done by NHGIS and can be requested in the time series data, correct?

I’ve struggled with this a bit: my models require each tract to have its own unique segregation score, not one score for the entire metro area. This is why I’m looking at block and block group data to measure within-tract segregation. Between tract segregation would be the same as within metro segregation and provide me one score for all tracts within a metro area, right? (That assumes I have tracts nested in metro areas, though I could group tracts by counties or PUMAs to get more variation in tract-level estimates.)

Thanks again for guidance on this. Always very helpful.

Best,
Kasey

So, I’d still need to harmonize 2000 block data into 2010 block boundaries…

Yes, that’s right. There are ways you could associate 2000 blocks with 2010 tracts, so you could measure segregation using 2000 blocks (rather than 2010 blocks), but I think the simpler strategy, and a comparably effective one, would be to harmonize to 2010 blocks first.

This step isn’t necessary using block group data because the crosswalk has already been done by NHGIS and can be requested in the time series data, correct?

Yes.

Between tract segregation would be the same as within metro segregation and provide me one score for all tracts within a metro area, right?

Ultimately, this requires some careful thinking, which I haven’t done yet myself, so I don’t have an easy answer! But, in general, it seems to me that the metro area is the proper unit of analysis for most segregation indices; that’s the area over which segregation most obviously operates. No tract is an island, removed from the segregation forces occurring in other parts of its metro area. I’m not sure what it would mean to say that, within a single metro area, some tracts are very segregated and others are not… especially in a case where most tracts had nearly homogeneous racial composition. I’d suggest giving more thought to exactly which hypotheses you’re investigating and whether measuring segregation by tract could really tell you what you’re looking for. Maybe there’s a study in the literature that has done something like this before?
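To illustrate the contrast being discussed: with tracts as the sub-units, an index yields one score for the whole metro; with block groups as the sub-units inside each tract, you get one score per tract, and the two can tell very different stories. This sketch uses the index of dissimilarity as an example (an assumption; other indices would behave differently), and the two-tract "metro" and its counts are made up.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]  # (group a count, group b count) for one sub-unit


def dissimilarity(units: Sequence[Pair]) -> float:
    """Index of dissimilarity D = 0.5 * sum_i |a_i/A - b_i/B| over sub-units."""
    total_a = sum(a for a, _ in units)
    total_b = sum(b for _, b in units)
    if total_a == 0 or total_b == 0:
        raise ValueError("D is undefined when either group is absent from the area")
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in units)


# Made-up metro: tract label -> block group (a, b) counts within that tract.
metro: Dict[str, List[Pair]] = {
    "T1": [(450, 10), (450, 12)],
    "T2": [(20, 400), (18, 410)],
}

# Between-tract (metro-level) index: tracts are the sub-units -> one score per metro.
tract_totals = [(sum(a for a, _ in bgs), sum(b for _, b in bgs)) for bgs in metro.values()]
metro_d = dissimilarity(tract_totals)

# Within-tract indices: block groups are the sub-units -> one score per tract.
within_d = {tract: dissimilarity(bgs) for tract, bgs in metro.items()}

print(round(metro_d, 3))  # 0.933
print({t: round(d, 3) for t, d in within_d.items()})  # {'T1': 0.045, 'T2': 0.032}
```

Here the metro is highly segregated between tracts, yet each tract's within-tract score is near zero, so a per-tract score of this kind would not capture the segregation the tract's residents actually experience.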

OK, that’s my limit… I’ll leave the rest to you!

Thanks, Jonathan! Yeah, the unit of analysis for segregation, and how I would model it, is something I’ve been toying with for quite a while. Thanks for your advice on that and on the data management.

Best,
Kasey