I am working on a project where I would like to get annual average earnings by occupational group, commuting zone, and year. I am planning to use the ACS microdata to do this. I am interested in the years 2008-2020.
My question is, what steps should I take to aggregate the data in this way? Specifically:
1. What is the best way to aggregate PUMAs into commuting zones? That is, how can I know which PUMAs belong to which commuting zones?
2. Do I need to modify the survey weights if I am aggregating PUMAs to commuting zones?
I was able to find one or two other posts on this forum about this, but they are a few years old, and, if I understand them correctly, the correct steps may depend on the exact time period.
Many PUMAs span multiple commuting zones, so in those cases it is impossible to tell which commuting zone a particular PUMA/household is located in. You might include only PUMAs that lie fully within a single commuting zone, though this will obviously affect your estimates for that zone. Alternatively, you might consider only commuting zones that are coterminous with all of their corresponding PUMAs.
Official USDA ERS commuting zones stop at the 2000 definitions. Chris Fowler and others at Penn State created updated 2010 versions using the ERS methodology here. Both of these files provide only county-to-commuting zone crosswalks. A 1990-vintage PUMA-to-commuting zone crosswalk is available on Geocorr, but you would then need a crosswalk from 1990 PUMAs to 2000- and 2010-based PUMAs.
My recommendation is to use a county-to-commuting zone crosswalk and identify household commuting zones with the IPUMS variable COUNTY. Note that COUNTY is identified only when the county is coterminous with a single PUMA or when it contains multiple PUMAs, none of which extend into other counties. You will want to look out, however, for cases where a county is not identified but the entire PUMA still lies within one commuting zone. Note also that 2008-2011 respondents are coded using 2000-based PUMAs, while respondents from 2012 onward are coded using Census 2010-based PUMAs.
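To make the county-first approach concrete, here is a minimal Python sketch. It assumes you have already loaded a county-to-commuting zone crosswalk into a dict keyed by 5-digit county FIPS, plus a PUMA-to-county relationship file (for example from Geocorr) keyed by (PUMA vintage, state FIPS, PUMA). It also assumes the county code in your extract is a 3-digit county FIPS with 0 meaning "not identified"; please check the IPUMS codebook for the exact coding. All function names and toy codes are hypothetical. A PUMA is treated as lying fully inside one commuting zone when every county it overlaps maps to that same zone, which is one way to catch the "county not identified, but the whole PUMA is within the CZ" cases.

```python
from typing import Dict, Iterable, Optional, Tuple

PumaKey = Tuple[int, int, int]  # (PUMA vintage, state FIPS, PUMA code)


def puma_vintage(year: int) -> int:
    """ACS 2008-2011 samples use 2000-based PUMAs; 2012 onward use 2010-based PUMAs."""
    return 2000 if year <= 2011 else 2010


def build_puma_single_cz(
    puma_counties: Dict[PumaKey, Iterable[int]],
    county_to_cz: Dict[int, int],
) -> Dict[PumaKey, int]:
    """Keep only PUMAs whose overlapping counties all map to one commuting zone."""
    result: Dict[PumaKey, int] = {}
    for key, counties in puma_counties.items():
        czs = {county_to_cz.get(c) for c in counties}
        # The PUMA lies inside a single CZ if every county it touches is in that CZ.
        if len(czs) == 1 and None not in czs:
            result[key] = czs.pop()
    return result


def assign_cz(
    statefip: int,
    countyfip: int,
    puma: int,
    year: int,
    county_to_cz: Dict[int, int],
    puma_single_cz: Dict[PumaKey, int],
) -> Optional[int]:
    """Return a CZ id for one ACS record, or None if it cannot be placed."""
    if countyfip != 0:
        # County identified: build the 5-digit FIPS code and look it up directly.
        return county_to_cz.get(statefip * 1000 + countyfip)
    # County not identified: fall back to PUMAs known to sit inside one CZ,
    # using the PUMA vintage that matches the sample year.
    return puma_single_cz.get((puma_vintage(year), statefip, puma))


# Toy inputs with made-up codes (state 99 is not a real state FIPS code).
county_to_cz = {99001: 10, 99002: 10, 99003: 20}
puma_counties = {
    (2010, 99, 101): [99001, 99002],  # both counties in CZ 10
    (2010, 99, 102): [99002, 99003],  # straddles CZ 10 and CZ 20
}
puma_single_cz = build_puma_single_cz(puma_counties, county_to_cz)

print(assign_cz(99, 1, 101, 2015, county_to_cz, puma_single_cz))  # 10 (county identified)
print(assign_cz(99, 0, 101, 2015, county_to_cz, puma_single_cz))  # 10 (PUMA fully in CZ 10)
print(assign_cz(99, 0, 102, 2015, county_to_cz, puma_single_cz))  # None (PUMA straddles CZs)
```

Records that come back as None are the ones that would truncate a commuting zone, so it may be worth counting them by zone before deciding which zones to keep.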
There is no need to modify the person weights when aggregating your data and calculating mean earnings. Note again that, when identifying commuting zones using COUNTY, some commuting zones will be truncated because some of their counties are not identified in the data.
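Since the person weights are used as-is, the mean for each cell is simply sum(PERWT * earnings) / sum(PERWT). Below is a minimal plain-Python sketch, assuming each record has already been assigned an occupation group and a commuting zone (None when it could not be placed) and that the earnings variable has been cleaned of IPUMS missing/N/A codes beforehand. The record layout and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record layout: (occ_group, cz, year, earnings, PERWT); cz is None if unplaced.
Record = Tuple[str, Optional[int], int, float, float]


def weighted_mean_earnings(records: Iterable[Record]) -> Dict[Tuple[str, int, int], float]:
    """PERWT-weighted mean earnings for each (occupation group, CZ, year) cell."""
    num: Dict[Tuple[str, int, int], float] = defaultdict(float)  # sum of PERWT * earnings
    den: Dict[Tuple[str, int, int], float] = defaultdict(float)  # sum of PERWT
    for occ, cz, year, earnings, perwt in records:
        if cz is None:
            continue  # could not be assigned to a commuting zone; left out
        key = (occ, cz, year)
        num[key] += perwt * earnings
        den[key] += perwt
    return {key: num[key] / den[key] for key in num if den[key] > 0}


# Toy records with made-up values.
records = [
    ("sales", 10, 2015, 40000.0, 50.0),
    ("sales", 10, 2015, 60000.0, 150.0),
    ("sales", 20, 2015, 30000.0, 80.0),
    ("sales", None, 2015, 99000.0, 10.0),
]
print(weighted_mean_earnings(records))
# {('sales', 10, 2015): 55000.0, ('sales', 20, 2015): 30000.0}
```

The unplaced record is dropped rather than reweighted, which matches the advice above: the weights stay as they are, and the cost shows up as truncated commuting zones rather than as altered weights.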
The first question is, I think, a fairly difficult task, as @Ivan_Strahof's response to point 1 suggests. He gives some great resources. In addition to his, I will add another resource that I think could help you, from the urban economist David Dorn.
See the section “[E] Local Labor Market Geography” where he has “provide[d] a probabilistic matching of sub-state geographic units in U.S. Census Public Use Files to [Commuting Zones].”
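Roughly speaking, a probabilistic match works differently from the county approach: instead of dropping PUMAs that straddle commuting zones, each PUMA is split across zones by an allocation factor (approximately the share of the PUMA's population living in each zone), and a person's weight is multiplied by that factor. The sketch below assumes the crosswalk has been loaded as a dict from (state FIPS, PUMA) to a list of (CZ, allocation factor) pairs whose factors sum to 1 for each PUMA, and that you use the crosswalk matching the PUMA vintage of each sample year. The layout and names are assumptions, so please check the actual variable names in his files.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical layout: (state FIPS, PUMA) -> [(CZ id, allocation factor), ...]
Crosswalk = Dict[Tuple[int, int], List[Tuple[int, float]]]
# (occ_group, state FIPS, PUMA, year, earnings, PERWT)
Person = Tuple[str, int, int, int, float, float]


def allocate_weights(
    persons: Iterable[Person], crosswalk: Crosswalk
) -> Iterator[Tuple[str, int, int, float, float]]:
    """Yield one row per (person, CZ) pair with weight PERWT * allocation factor."""
    for occ, statefip, puma, year, earnings, perwt in persons:
        for cz, afactor in crosswalk.get((statefip, puma), []):
            yield occ, cz, year, earnings, perwt * afactor


def weighted_means(rows: Iterable[Tuple[str, int, int, float, float]]) -> Dict[Tuple[str, int, int], float]:
    """Weighted mean earnings by (occupation group, CZ, year)."""
    num: Dict[Tuple[str, int, int], float] = defaultdict(float)
    den: Dict[Tuple[str, int, int], float] = defaultdict(float)
    for occ, cz, year, earnings, w in rows:
        num[(occ, cz, year)] += w * earnings
        den[(occ, cz, year)] += w
    return {k: num[k] / den[k] for k in num if den[k] > 0}


# Toy inputs (made-up codes): PUMA 102 is split 25/75 between CZ 10 and CZ 20.
crosswalk: Crosswalk = {(99, 101): [(10, 1.0)], (99, 102): [(10, 0.25), (20, 0.75)]}
persons = [
    ("sales", 99, 101, 2009, 40000.0, 100.0),
    ("sales", 99, 102, 2009, 80000.0, 200.0),
]
rows = list(allocate_weights(persons, crosswalk))
means = weighted_means(rows)
print(sum(row[-1] for row in rows))  # 300.0, same as the original total PERWT
print(means[("sales", 20, 2009)])    # 80000.0
```

Because each person's allocated weights add back up to the original PERWT, no records are dropped and totals are preserved; the trade-off is that people in split PUMAs contribute fractionally to more than one zone.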