What I’m hoping is an easy question.
Does anyone have a CBSA to Census Region crosswalk (either the four main regions, i.e. Northeast, Midwest, South, and West, or the divisions within them)?
I’d rather use this than sort the CBSAs manually. Since some CBSAs cross state lines, a state to region crosswalk is less helpful, BUT it would still be helpful if someone has one.
I’m not aware of any such crosswalk. For metros that span multiple states, you’d need to make a judgement call on which region to assign each one to. For example, you could use the state containing most of the metro’s population, or the state containing its primary city. You can use the variable REGION to assign a census region to each observation in the data. Then it is fairly straightforward to calculate the fraction of the metro’s population in each region. For example, you could use this code in Stata:
collapse (sum) perwt, by(region met2013)
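This would give all combinations of region and metro area in the data, along with the (weighted) population of each metro residing in each region. Dividing each of those counts by the metro’s total population gives the fraction of the metro in each region.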
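For anyone working outside Stata, here is a minimal Python sketch of the same idea: sum the person weights by metro and region, then divide by each metro's total. The function name `region_shares` and the sample records are made up for illustration; only the variable names `met2013`, `region`, and `perwt` come from the post above.

```python
from collections import defaultdict


def region_shares(records):
    """Return {(metro, region): share of the metro's weighted population}.

    `records` is an iterable of (met2013, region, perwt) tuples, one per
    person record. Assumes every metro has a positive total weight.
    """
    totals = defaultdict(float)   # weighted population per metro
    by_pair = defaultdict(float)  # weighted population per (metro, region)
    for metro, region, weight in records:
        by_pair[(metro, region)] += weight
        totals[metro] += weight
    return {pair: pop / totals[pair[0]] for pair, pop in by_pair.items()}


# Made-up person records for illustration, not real IPUMS data.
sample = [
    ("A", "Midwest", 60.0),
    ("A", "Midwest", 20.0),
    ("A", "South", 20.0),
    ("B", "West", 50.0),
]
shares = region_shares(sample)
for (metro, region), share in sorted(shares.items()):
    print(metro, region, round(share, 3))
```

A metro that lies entirely in one region ends up with a single share of 1.0, so only the metros with more than one entry need a judgement call.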
Matthew’s response assumes you’re working with IPUMS USA. Note that IPUMS USA cannot identify all CBSAs; namely, it identifies no micropolitan areas, and it identifies metro areas somewhat imperfectly and incompletely, given the limited geographic info available in public use microdata.
You can construct a complete crosswalk for all CBSAs using decennial census data from IPUMS NHGIS. I’d download the Race table from the 2020 Census (the 2020 PL94-171 Redistricting Data dataset) at the county level. This file will contain a single record per county, with separate columns identifying all units that contain the county, among them a REGION code (“REGIONA”) and a CBSA code (“CBSAA”). You could then summarize the 2020 county populations by region and CBSA to determine the fraction of each CBSA’s population that resides in each region.
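The county-level summary described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the population column name `POP` is a placeholder (the real NHGIS column code is listed in the extract's codebook), counties outside any CBSA are assumed to have a blank `CBSAA`, and the function name `build_crosswalk` and the sample rows are made up. Each CBSA is assigned to the region holding the largest share of its population, which is one of the judgement calls suggested earlier in the thread.

```python
import csv
import io
from collections import defaultdict


def build_crosswalk(rows, pop_col="POP"):
    """Map each CBSA code to (majority region code, that region's share).

    `rows` is an iterable of dicts with keys "CBSAA", "REGIONA", and
    `pop_col`, one per county. `pop_col` is a placeholder name.
    """
    pops = defaultdict(lambda: defaultdict(int))  # cbsa -> region -> population
    for row in rows:
        cbsa = row["CBSAA"].strip()
        if not cbsa:  # assumed: blank CBSAA means the county is in no CBSA
            continue
        pops[cbsa][row["REGIONA"].strip()] += int(row[pop_col])

    crosswalk = {}
    for cbsa, by_region in pops.items():
        total = sum(by_region.values())
        region = max(by_region, key=by_region.get)  # region with most people
        share = by_region[region] / total if total else 0.0
        crosswalk[cbsa] = (region, share)
    return crosswalk


# Made-up county rows for illustration, not real NHGIS data.
SAMPLE = """CBSAA,REGIONA,POP
99991,2,700
99991,3,300
99992,4,500
,1,250
"""
crosswalk = build_crosswalk(csv.DictReader(io.StringIO(SAMPLE)))
for cbsa, (region, share) in sorted(crosswalk.items()):
    print(cbsa, region, round(share, 3))
```

With a real extract, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle on the downloaded CSV. Any CBSA with a share below 1.0 spans more than one region and is worth checking by hand.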
I believe the CBSA codes in the 2020 census data identify the September 2018 version of the CBSAs. (I found text in the 2020 data’s Technical Documentation that states that, “All legal boundaries used for the 2020 Census are those reported to the Census Bureau to be in effect as of January 1, 2020. The statistical area boundaries also reflect a January 1, 2020, date for delineation.”)
Update: This post prompted me to investigate which version of the CBSAs is used in the 2020 Redistricting Data. It seems that this info is not yet publicly documented anywhere!
Based on the text I quoted in my last message, I assumed the CBSAs were the September 2018 version. But I’ve now learned from Census Bureau staff that the Redistricting Data use March 2020 CBSAs (as are used in the 2020 5-year ACS summary files and 2020 TIGER/Line files). They plan to update the documentation for the 2020 Redistricting Data to clarify this.