Matching MSAs using the NHGIS data

I want to download a .shp file that will identify the metropolitan statistical area boundaries that match the “metarea” variable from the IPUMS USA Census and ACS data, which uses the 1999 OMB delineations. I want to use the geographical boundaries to map certain values (such as the share of immigrants and income levels) at the MSA level. I just downloaded the 2000 TIGER shapefiles for the MSA/CMSAs through the NHGIS website, but I’m not sure which variable is the “ID” variable. I read part of the online manual, which seemed to indicate that the variable “extent” is the ID variable, and that seemed to work when I ran the shp2dta command in Stata. I see values for the MSAs, but I don’t see a variable that tells me which MSA each value refers to. Is there a file that will show me which MSA is identified by each MSACMSA code produced from the .shp file? Or better yet, is there an easier way to get boundaries for the MSAs that will match the metarea variable from the IPUMS USA data?

To help with your specific goal, IPUMS doesn’t currently have many resources beyond those you’ve identified.

First, note that the NHGIS shapefile that matches up best with the IPUMS USA METAREA variable for 2000 samples is the Primary Metropolitan Statistical Area shapefile (i.e., CMSA–PMSA, not MSA/CMSA). PMSAs nest within Consolidated Metropolitan Statistical Areas, and the detailed METAREA codes generally identify PMSAs, not CMSAs.

Second, at this time, we have no crosswalk between the METAREA codes in IPUMS USA (which are unique to IPUMS) and the official MSA/PMSA codes, as used in the NHGIS shapefile. As explained in the METAREA description, the METAREA coding system is based on the official 1990 system with some adjustments. I think the best way to match at this time is to review the metro area names from each source and make matches based on that.
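As a rough illustration of that name-based matching, here is a minimal Python sketch using only the standard library. The function names, the list of suffixes stripped, the 0.6 similarity cutoff, and the example labels are all assumptions made for illustration; they are not taken from IPUMS or NHGIS files, and any fuzzy match should still be checked by hand.

```python
import difflib
import re

def normalize(name):
    """Lowercase a metro name and drop suffixes and punctuation that differ between sources."""
    name = name.lower()
    # Assumed list of area-type suffixes; extend it to fit the labels you actually have.
    name = re.sub(r"\b(pmsa|cmsa|msa|necma)\b", " ", name)
    name = re.sub(r"[^a-z ]", " ", name)  # commas, slashes, hyphens, etc.
    return " ".join(name.split())

def match_names(ipums_names, shapefile_names, cutoff=0.6):
    """Map each IPUMS label to the closest shapefile label, or None if nothing clears the cutoff."""
    lookup = {normalize(n): n for n in shapefile_names}
    matches = {}
    for name in ipums_names:
        best = difflib.get_close_matches(normalize(name), list(lookup), n=1, cutoff=cutoff)
        matches[name] = lookup[best[0]] if best else None
    return matches

# Hypothetical labels, for illustration only.
ipums_labels = ["Akron, OH", "Boston, MA-NH", "Zzyzx, CA"]
shapefile_labels = ["Akron, OH PMSA", "Boston, MA--NH PMSA"]

matches = match_names(ipums_labels, shapefile_labels)
for ipums_name, shp_name in matches.items():
    print(ipums_name, "->", shp_name)
```

Labels that come back as None, and any pair that matched only approximately, are the ones to review manually.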

Lastly, you should know that for many metro areas, the METAREA codes omit large portions of the metro area population. See the “User Caution” note in the METAREA description and the summary of Incompletely Identified Metro Areas.

Given these issues, the IPUMS USA MET2013 variable might suit your aims better. It uses 2013 MSA definitions, which may be a problem for some analyses of 2000 data, but it is available for 2000 microdata samples and has the advantage of using official MSA codes, so it’s easy to match to NHGIS 2013 shapefiles, and it better represents actual MSA populations (with a maximum mismatch tolerance of 15%).

@zrutledge @JonathanSchroeder Did you ever find a solution to this problem? I also want to estimate the number of immigrants within MSAs in the year 1980, but I want to use 2013 delineations used in the met2013 definition.

I intend to use the met2013 variable primarily for calculating immigrant shares between 2000 and 2018, but I want to instrument them using the 1980 share of immigrants, and in 1980 the MSA definitions were a bit different. A professor has suggested that I simply look at the county composition of the 2013 delineations and assign the same counties to MSAs in 1980 too. To what extent would this strategy work?

You can generate 1980 data for 2013 MSAs by building up from county data if you use summary data for 1980, which you can get from IPUMS NHGIS, rather than using microdata from IPUMS USA.

(Because of the limited geographic info available in public use microdata, IPUMS USA doesn’t provide county codes for most counties in the 1980 sample. See the COUNTYFIP description for more info. There are also some metro areas that we can’t identify in microdata, but it’s a much smaller percentage than for counties.)

From NHGIS, you can get 1980 census summary data for all 1980 counties. The downside is that you’d only be able to use the pre-defined cross-tabulations that the Census Bureau chose to publish, rather than having the complete responses available in microdata. But if you only need total counts and shares of immigrants (i.e., foreign-born population), then you can get that from summary tables, and then aggregate the 1980 county data up to 2013 MSA definitions.

Be aware that there have been some changes in county boundaries since 1980, which could complicate associations between 1980 counties and 2013 MSAs, but I think that would affect only a few cases.
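The aggregation step described above could be sketched roughly as follows in Python. This assumes you have already built two inputs yourself: a list of 1980 county records (county code, total population, foreign-born population) from an NHGIS summary table, and a county-to-MSA lookup based on the 2013 delineations. The function name, the codes, and the counts below are made up for illustration.

```python
from collections import defaultdict

def aggregate_to_msa(county_rows, county_to_msa):
    """Sum county totals and foreign-born counts within each MSA and return the shares."""
    totals = defaultdict(lambda: [0, 0])  # msa -> [total population, foreign-born]
    unmatched = []
    for county, total_pop, foreign_born in county_rows:
        msa = county_to_msa.get(county)
        if msa is None:
            # Either a non-metro county or a county whose code or boundary changed after 1980.
            unmatched.append(county)
            continue
        totals[msa][0] += total_pop
        totals[msa][1] += foreign_born
    shares = {msa: fb / pop for msa, (pop, fb) in totals.items() if pop > 0}
    return shares, unmatched

# Hypothetical county codes, MSA codes, and counts, for illustration only.
county_rows_1980 = [
    ("99001", 1000, 100),
    ("99003", 3000, 300),
    ("99005", 500, 5),
]
county_to_msa_2013 = {"99001": "M1", "99003": "M1"}

shares, unmatched = aggregate_to_msa(county_rows_1980, county_to_msa_2013)
print(shares)
print(unmatched)
```

The unmatched list is worth inspecting: it mixes counties that are simply outside any 2013 MSA with counties affected by boundary or code changes, and only the latter need a manual fix.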

Hello Umair,
Unfortunately, I never did find a solution to this problem. Sorry I cannot be of further assistance. I think the only solution is to do some type of manual matching as you suggested, but the IPUMS data do not have county identifiers, so you have to use a PUMA-to-county crosswalk to do that. That’s the only solution I know of. You can get PUMA-to-county crosswalks from the Geocorr applications at MCDC. I’ve used that method in other applications, and it is imperfect at best. Good luck!
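For anyone trying the crosswalk route, here is a minimal Python sketch of the allocation step. It assumes a crosswalk in which each PUMA-county pair carries an allocation factor (the share of the PUMA assigned to that county) and the factors for each PUMA sum to 1. The function name, codes, and numbers are hypothetical.

```python
from collections import defaultdict

def allocate_to_counties(puma_counts, crosswalk):
    """Distribute PUMA-level weighted counts to counties using allocation factors."""
    county_counts = defaultdict(float)
    for puma, county, factor in crosswalk:
        county_counts[county] += puma_counts.get(puma, 0.0) * factor
    return dict(county_counts)

# Hypothetical weighted immigrant counts by PUMA, for illustration only.
puma_counts = {"P1": 1000.0, "P2": 200.0}

# Hypothetical crosswalk rows: (PUMA, county, share of the PUMA allocated to that county).
crosswalk = [
    ("P1", "C1", 0.75),
    ("P1", "C2", 0.25),
    ("P2", "C2", 1.0),
]

county_counts = allocate_to_counties(puma_counts, crosswalk)
print(county_counts)
```

This spreads each PUMA’s count across counties in proportion to the factor, which implicitly assumes the group being counted is distributed like the population used to build the factors. That assumption is one reason the method is imperfect.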