I am analyzing the effect of immigration on native-born workers in the construction sector, restricting the sample to individuals with occ2010 codes 6210 through 6765. I want to obtain the average income and the unemployment rate for these native-born construction workers in each Metropolitan Statistical Area (MSA) in each year, so that I can use them as dependent variables in regressions.

Do I need to use the perwt variable to obtain MSA-level averages that are representative of all native-born construction workers in each MSA? More specifically, if I use perwt to compute a weighted average income (or unemployment rate) for natives in each MSA in each year, will the result be representative of native construction workers in that MSA, or could it be representative of some other group (for example, workers in non-construction occupations, or construction workers at the national rather than the MSA level)? I am currently using the simple average and need to know whether the weighted average would give more reliable estimates for construction workers in each MSA in each year.
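To make the comparison concrete, here is a minimal sketch of the two statistics I am choosing between. The records and the field names (msa, year, income, unemployed) are made up for illustration and are not the actual extract variables; only perwt is the real weight variable. The sketch assumes the rows have already been restricted to native-born construction workers, and to labor-force participants for the unemployment rate.

```python
from collections import defaultdict

# Hypothetical records: one row per sampled native-born construction worker.
# All values are invented for illustration only.
records = [
    {"msa": "A", "year": 2010, "perwt": 100.0, "income": 30000.0, "unemployed": 0},
    {"msa": "A", "year": 2010, "perwt": 300.0, "income": 50000.0, "unemployed": 1},
    {"msa": "B", "year": 2010, "perwt": 200.0, "income": 40000.0, "unemployed": 0},
]


def msa_year_means(rows, value_key, weight_key=None):
    """Return {(msa, year): mean of value_key}.

    If weight_key is given, each row counts in proportion to that weight;
    otherwise every row counts equally (the simple average).
    """
    num = defaultdict(float)  # running sum of weight * value per cell
    den = defaultdict(float)  # running sum of weights per cell
    for r in rows:
        w = r[weight_key] if weight_key else 1.0
        key = (r["msa"], r["year"])
        num[key] += w * r[value_key]
        den[key] += w
    return {k: num[k] / den[k] for k in num if den[k] > 0}


simple_income = msa_year_means(records, "income")
weighted_income = msa_year_means(records, "income", "perwt")
# The weighted mean of a 0/1 indicator is the weighted unemployment rate.
weighted_unemp = msa_year_means(records, "unemployed", "perwt")
```

In this toy data the two rows for MSA "A" carry different weights, so the simple and weighted averages differ. Because the weights are summed within each MSA-year cell of the restricted sample, the weighted figure describes only the workers in that cell.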

I am quite interested in how you obtained MSA-level data. I learned from the video that there are no full-count data at the MSA level, only PUMA data.

IPUMS NHGIS offers U.S. census and American Community Survey data in the form of summary tables aggregated at various geographic levels, including MSA.

To capture metropolitan areas over time, you will need to use slightly different geographic units depending on the year. For 1950 through 1980, the unit is the Standard Metropolitan Statistical Area [1950-1980]. For 1990 and 2000, there are two: the Primary Metropolitan Statistical Area [1990-2000] and the Metropolitan Statistical Area/Consolidated Metropolitan Statistical Area [1990-2000]. For years after 2000, the unit is the Core Based (Micropolitan/Metropolitan) Statistical Area [2003-Present]. Note that IPUMS NHGIS does not currently provide pre-1980 data at the MSA level.
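If you are assembling a panel across years, the year-to-unit correspondence described above can be kept in a small lookup. This is only a sketch: the function name is hypothetical, the year ranges are taken from the bracketed labels above, and it does not check whether data are actually available for a given year.

```python
def metro_units_for_year(year):
    """Return the metropolitan geographic unit name(s) that apply to a year.

    Ranges follow the bracketed labels in the unit names; years that fall
    outside every range return an empty list.
    """
    if 1950 <= year <= 1980:
        return ["Standard Metropolitan Statistical Area"]
    if 1990 <= year <= 2000:
        return [
            "Primary Metropolitan Statistical Area",
            "Metropolitan Statistical Area/Consolidated Metropolitan Statistical Area",
        ]
    if year >= 2003:
        return ["Core Based (Micropolitan/Metropolitan) Statistical Area"]
    return []


# Example: list the units needed for a set of data years.
for y in (1980, 1990, 2000, 2014):
    print(y, metro_units_for_year(y))
```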

The U.S. census is conducted every 10 years, and the ACS has been conducted yearly beginning in 2001. So, for the time period you are interested in, you will likely want to draw data from the 1970, 1980, 1990, and 2000 decennial censuses, and from the ACS through 2014. The summary tables from IPUMS NHGIS cover a variety of topics, including nativity, birthplace, education, and income.

You can select these geographic levels, the year(s) of data, and the table topics in the IPUMS NHGIS data finder/extract system and browse what is available that meets your criteria. This brief video tutorial shows how to use the data finder/extract system.