Measuring Distance from Central City for Tracts

Hello,

I just realized there was a user forum for IPUMS, great!

My ultimate goal is to measure the distance in miles between each census tract in a metro area and a single “center” of that metro area. I mean “center” in the ecological model sense (Central Business District more than CBSA centroid or center of population). If I can get the lat/long for each census tract and the lat/long for the “center” of a metro, this doesn’t seem like that bad of a task. I think I have found sources for the lat/long of each tract and it seems like the NHGIS Place Point files will be a good measure of the “center” of a place.

I have two questions: 1. Do the NHGIS Place Point files have a lat/long for each “place point”? 2. If so, can I import only GISJOIN, Place Point Name and lat/long into a spreadsheet from the Place Point files available for download on the NHGIS Data Finder?

I am a GIS novice and work in Stata. Please correct any fundamental misunderstanding of the purpose of the Place Point files or pitfalls I am missing in this idea. Thank you for any suggestions you might have!

John

First, your general plan makes sense. You could definitely use NHGIS tract data and place points to measure distances from tracts to CBSA centers.

To answer your specific questions:

  1. The Place Point files do not have “lat/long” values per se. They instead identify point locations using a “projected coordinate system,” which describes each location as a number of meters removed from a reference meridian and latitude. All NHGIS shapefiles use the same system.

(Note also that units of latitude and longitude correspond to widely varying distances on the Earth, so to measure distances accurately, you would need to either compute spherical distance from lat/long values or measure standard Euclidean distances using the projected coordinate system of NHGIS shapefiles, which is generally accurate within the contiguous U.S.)

  2. NHGIS provides the point files only in shapefile format, so it’s not possible to directly load them into a spreadsheet. You would need a utility that reads shapefiles. Stata has some utilities for doing that, but I’m not familiar with them, so I don’t know how easy it would be to extract coordinates or measure distances using Stata.
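
For readers who want to see the two distance options from the note above side by side, here is a minimal sketch in Python (chosen only for illustration; nothing in this thread depends on it). The function names and the mean Earth radius constant are assumptions of the sketch, not anything NHGIS prescribes.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch
KM_PER_MILE = 1.609344


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle (spherical) distance in miles between two lat/long points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    km = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    return km / KM_PER_MILE


def projected_miles(x1, y1, x2, y2):
    """Euclidean distance in miles between two points given in projected meters."""
    meters = math.hypot(x2 - x1, y2 - y1)
    return meters / 1000 / KM_PER_MILE
```

The first function would apply to lat/long values such as those in the Census centers of population file; the second would apply to coordinates read straight from a shapefile in a meter-based projected coordinate system.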

A couple extra comments:

  1. Most CBSAs include many “places” (cities, suburbs, etc.), so it would be up to you to decide which place in each CBSA you want to use to represent the central place.

  2. Another resource you should consider using is NHGIS’s centers of population for tracts. Sometimes, a tract’s population is concentrated near an outer edge of a tract, so for most applications, the optimal measure of a tract’s distance to the CBSA center would be from the tract’s center of population, rather than from the tract’s geographic centroid.

Thanks Jonathan,

I really appreciate your thorough and prompt response.

From a quick search, it looks like applying a lat/long to a point is fairly easily done in ArcGIS. I’ll look into these Stata utilities, but maybe I’ll need to apply a lat/long in another program and then try to load it with Stata. I found a Stata command that appears to support spherical measurement between lat/long points.

You make a good point about choosing among “places” in a metropolitan area. This could get pretty tricky, but at this point, I think I am really just interested in using a single “principal” city. I know this term has a specific meaning in Census speak, but I am not sure how else to say it. For example, in Denver-Aurora-Lakewood, CO, I’d want to use Denver city over Aurora. For most metros, I could just choose the largest place by population. This wouldn’t be ideal in places like the Twin Cities, where choosing by size would select Minneapolis, but the point I am really thinking about is probably closer to the 280 and 94 interchange. I don’t really have a systematic way to adjudicate these types of issues on a national scale.

For tract lat/longs, I agree that tract center of population is probably better than using the geographic centroid. I had thought about the centers of population from the Census Bureau. It looks like the NHGIS versions are pretty comparable, but maybe easier to use with other NHGIS data. I guess a benefit of using the CB’s files over the NHGIS files is that I’ll have lat/long and won’t need to compute it from the GIS point location.

Thanks again, I’ll dig into this today and reply to the post if I have any luck.

John

John, that all sounds correct and reasonable. Just to confirm: yes, the NHGIS version of the centers of population is derived directly from the Census version, so they should agree completely, and you’re right that if you just want the lat/long, the Census file gives you those. The main advantage of the NHGIS version would be if you wanted to map the centers or to do your distance measures using GIS, in which case, the NHGIS version is in a ready-to-use shapefile… I guess I’m biased toward that option, but totally understand why you’d just use the Census lat/longs!

Apologies for the long message. I got this plan to work, but encountered one major stumbling block.

Implicit in my design was assigning place points to metropolitan areas. The Place Point GISJOIN is keyed to the “place” geographic level. It took me a while to figure out (really to remember, as I have been in this situation in the past) that Census places are not necessarily contiguous with higher-level Census geographies. Places can and do cross county and state boundaries. It was pretty hard to find files that assigned county or metropolitan area codes to place codes. I found a file that assigned state, county and ANSI place codes, but I ended up using this file, which contains state, county, cbsa, and place codes (assigning all at the place level). I am sure this is not ideal because the lists and assignments are from before 2010 (the year for which I used the Place Points and Tract Centers of Population). I used the county and place codes to re-make GISJOIN at the place level and match to the Place Point file. A number of small CDPs did not match between the files. I just let this go as my interest is in larger metros.
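
A rough sketch of this rebuild-and-match step, in Python for illustration only. It assumes the place-level GISJOIN follows the pattern "G" + 2-digit state FIPS + "0" + 5-digit place code, which is worth checking against a few rows of the Place Point file before relying on it. The function names, field names, and sample rows are all hypothetical.

```python
def place_gisjoin(state_fips, place_code):
    """Build a place-level GISJOIN string.

    ASSUMPTION: pattern is "G" + 2-digit state FIPS + "0" + 5-digit place
    code. Verify against the actual Place Point file.
    """
    return "G" + str(state_fips).zfill(2) + "0" + str(place_code).zfill(5)


def match_places(crosswalk_rows, point_ids):
    """Split crosswalk rows into those found / not found in the point file's GISJOINs."""
    matched, unmatched = [], []
    for row in crosswalk_rows:
        gisjoin = place_gisjoin(row["state"], row["place"])
        target = matched if gisjoin in point_ids else unmatched
        target.append({**row, "gisjoin": gisjoin})
    return matched, unmatched


# Placeholder rows, not taken from any file mentioned in this thread.
crosswalk = [
    {"state": "08", "place": "20000", "cbsa": "11111"},
    {"state": "08", "place": "99999", "cbsa": "11111"},
]
point_ids = {"G08020000"}  # GISJOINs present in the (hypothetical) point file

matched, unmatched = match_places(crosswalk, point_ids)
```

Keeping the unmatched rows in their own list makes it easy to confirm that only small places, such as the CDPs mentioned above, fail to match.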

I couldn’t figure out how to download population tabulations from NHGIS with GISJOIN included. I have done this in the past and found some old files that had GISJOIN, but I couldn’t remember how I had done it. Using an NHGIS file I had downloaded for a previous project with only place, total population and GISJOIN, I merged in place total population.

With state, county and cbsa codes now matched to the place points, I assigned places to metros (using state and county metro definitions). I selected the place point for the largest place in a metro. I then merged in the Tract Center of Population on metro area. Then I used the Stata geodist command to measure spherical distance from each tract center of population to the lat/long from Place Points (that I calculated with ArcGIS). After adjusting the distance measurements for metro size (both area and population), tract distance from city center (my selected place point) appeared in my models with the expected magnitude and sign, yay!
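
The select-then-merge logic described here can be sketched roughly as follows, again in Python for illustration rather than as the Stata workflow actually used. All field names and sample values are made up; the distance step itself would follow afterwards (with geodist in Stata, for example).

```python
def largest_place_per_cbsa(places):
    """Return {cbsa: place record}, keeping the most populous place in each CBSA.

    Ties keep whichever place was seen first.
    """
    best = {}
    for place in places:
        current = best.get(place["cbsa"])
        if current is None or place["pop"] > current["pop"]:
            best[place["cbsa"]] = place
    return best


def attach_centers(tracts, centers):
    """Add the chosen center's coordinates to each tract, merging on CBSA.

    Tracts whose CBSA has no chosen center are dropped.
    """
    merged = []
    for tract in tracts:
        center = centers.get(tract["cbsa"])
        if center is not None:
            merged.append({**tract,
                           "center_lat": center["lat"],
                           "center_lon": center["lon"]})
    return merged


# Made-up records for illustration only.
places = [
    {"cbsa": "A", "name": "Big City", "pop": 600000, "lat": 39.7, "lon": -105.0},
    {"cbsa": "A", "name": "Suburb", "pop": 300000, "lat": 39.7, "lon": -104.8},
    {"cbsa": "B", "name": "Other City", "pop": 50000, "lat": 41.0, "lon": -100.0},
]
tracts = [
    {"tract": "T1", "cbsa": "A", "lat": 39.8, "lon": -105.1},
    {"tract": "T2", "cbsa": "C", "lat": 35.0, "lon": -90.0},
]

centers = largest_place_per_cbsa(places)
tracts_with_center = attach_centers(tracts, centers)
```

Choosing by population alone reproduces the limitation raised earlier in the thread for multi-center metros such as the Twin Cities, so any manual overrides would need to be applied to the selected centers before merging.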

I ended up not using the Stata GIS utilities and I am sure there are other, easier ways to do this whole process.

I couldn’t have gotten this done without your comments Jonathan, I really appreciate it.

John

John, I’m glad it all worked out in the end!

Regarding GISJOINs in NHGIS table files… We unfortunately do not provide GISJOINs in our fixed width files, but we do provide them in our comma-separated values (CSV) files. (You can choose which format you’d like to download on the last page before you submit a request.) I’m guessing your newer files are in fixed width format and your older files were CSVs.

Adding GISJOINs to the fixed width files has been somewhat high on our priority list for around a half year, and we’re hoping to get to it within the next few months. Thanks for reporting this stumbling block… It helps remind us of the potential value of this feature!