How should I reconstruct rent variables to fit newest MSA delineations?

Dear IPUMS Team,

I am a researcher studying housing markets, and I greatly appreciate the support you provide to the research community.

For the project I am currently working on, I am creating a panel dataset of MSAs fixed at the most recent MSA delineations published by the OMB.

This means that for some MSAs that lost or gained counties, I need to account for this change by “reconstructing” rent variables to ensure that the geographic boundaries associated with that variable are consistent over time.

For example, if counties A, B, and C were not designated as an MSA in 2016 but became an MSA in 2021, I have to find a way to impute the gross rent for that area in 2016. The ACS would not give me a direct estimate because each ACS product is based on the most recent delineations in the release year, and no retrospective updates are provided.

What is the best practice for this imputation? Should I take a weighted average of the rents of these counties? If so, what should the weights be?

An alternative approach would be to use the microdata samples from IPUMS USA and take the weighted median over everyone living inside an MSA boundary. However, the smallest identifiable geography in the microdata is the PUMA, which does not align neatly with MSA boundaries, so grouping microdata by MSA does not seem to be a good idea. Please let me know if I am mistaken about any of this.

Thank you again for your time and for the valuable resources you make available.

Best,

Richard Yun

You may approach this using either the microdata on IPUMS USA or the summary data tables on IPUMS NHGIS:

IPUMS USA

It’s possible to use 1-year samples and a simple allocation model to handle cases where PUMAs straddle MSA boundaries. This approach, which has been suggested in other forum posts (e.g., allocating microdata from PUMAs to cities or between vintages of PUMAs), distributes households across intersecting units in proportion to their population shares. For this, you would need a crosswalk between PUMAs and 2023 MSA delineations. IPUMS provides crosswalks between 2020 PUMAs and 2023 MSAs, but not yet for 2010 PUMAs, which you’d need to analyze years prior to 2022. To create such a crosswalk, you can combine:

From there, you could estimate the overlap between PUMAs and MSAs using tract-level counts of rental units, which would allow you to compute allocation weights specifically for rental housing rather than total population. These are available in 5-year summary ACS data tables on IPUMS NHGIS. One special case to be aware of is Connecticut, where counties were replaced in 2022 by planning regions. This complicates efforts to match 2010 PUMAs to 2023 MSAs, since the newer MSAs are based on these updated regions (see this forum post for a general approach to this issue).
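As a rough illustration of the allocation weights described above, the following sketch computes, for each PUMA, the share of its rental units that fall inside each MSA. It assumes you have already assembled a tract-level list giving each tract's 2010 PUMA, its 2023 MSA (or none), and its count of renter-occupied units; the function name, the record layout, and the sample values are all hypothetical.

```python
from collections import defaultdict

def allocation_factors(tracts):
    """Return {(puma, msa): share of the PUMA's renter units located in msa}.

    tracts: iterable of (puma, msa_or_None, renter_units) tuples.
    This record layout is an assumption made for illustration.
    """
    puma_total = defaultdict(float)   # all renter units in each PUMA
    overlap = defaultdict(float)      # renter units in each PUMA-MSA intersection
    for puma, msa, units in tracts:
        puma_total[puma] += units
        if msa is not None:           # tracts outside any MSA count only toward the total
            overlap[(puma, msa)] += units
    return {
        (puma, msa): units / puma_total[puma]
        for (puma, msa), units in overlap.items()
        if puma_total[puma] > 0
    }

# Hypothetical tract records: (2010 PUMA, 2023 MSA or None, renter-occupied units)
tracts = [
    ("P1", "M1", 300),
    ("P1", "M1", 100),
    ("P1", None, 100),
    ("P2", "M1", 200),
    ("P2", "M2", 600),
]
factors = allocation_factors(tracts)
```

Each factor can then be multiplied by a household's weight to allocate that household fractionally to the MSAs its PUMA intersects.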

Even with careful geographic allocation, sample size is a concern. Many mid-size MSAs may have fewer than 100,000 rental units, so a 1% microdata sample would include fewer than about 1,000 renter households for such an MSA, and annual rent estimates may not be statistically reliable due to high sampling error. For this reason, it may be a good idea to pool data over several years and analyze (non-overlapping) 3- or 5-year periods.
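One way to picture the pooled estimate is a weighted median in which each household's weight is scaled by its PUMA-to-MSA allocation factor. The sketch below is only an illustration under several assumptions: there is one record per renter household, rents have already been converted to constant dollars before pooling years, and the allocation factors, function names, and sample records are hypothetical.

```python
def weighted_median(values, weights):
    """Smallest value at which cumulative weight reaches half the total weight."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    total = sum(w for _, w in pairs)
    if total == 0:
        raise ValueError("no positive weights")
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= total / 2:
            return v

def msa_median_rent(households, factors, msa):
    """households: (year, puma, gross_rent, household_weight) tuples, pooled over years.
    factors: {(puma, msa): allocation share}, assumed to be computed elsewhere."""
    rents, weights = [], []
    for _year, puma, rent, hhwt in households:
        share = factors.get((puma, msa), 0.0)
        if share > 0:
            rents.append(rent)
            weights.append(hhwt * share)   # fractional allocation to the MSA
    return weighted_median(rents, weights)

# Hypothetical pooled records and allocation factors
households = [
    (2016, "P1", 900, 100),
    (2017, "P1", 1000, 100),
    (2017, "P2", 1200, 200),
    (2018, "P2", 1500, 100),
]
factors = {("P1", "M1"): 0.8, ("P2", "M1"): 0.25}
median_m1 = msa_median_rent(households, factors, "M1")
```

This gives a point estimate only; it says nothing about sampling error, which would need to be assessed separately.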

IPUMS NHGIS

You will need to use the 5-year summary data, since 1-year ACS summary tables are only available for geographic areas (in your case, counties) with populations of at least 65,000. While 5-year data expands geographic coverage, it is not suitable for constructing annual panel datasets, because consecutive 5-year samples share four of their five years of data (about 80% overlap).

You can access relevant rent distribution data by filtering by County as your geographic level and Rent and Renter Costs as your topics filter in the data finder tool. This should allow you to locate table B25063 Gross Rent, which provides the number of rental units in each of 26 rent intervals. You can then sum counts across counties to approximate the gross rent distribution for each MSA. While this does not give you the mean or median rent, this more detailed data about the distribution can be combined with an interpolation method that you might use to approximate the statistic that you’re looking to calculate. You will also want to keep in mind that your standard errors may be relatively high for smaller counties; this guide to variance estimation using the summary data includes worked examples that may be helpful.
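To make the interpolation idea concrete, here is a minimal sketch that sums binned counts across the counties of an MSA and then estimates the median by linear interpolation within the bin that contains it. Linear interpolation is just one possible choice, and the bin edges and counts below are simplified, made-up values rather than the actual B25063 intervals. The sketch also assumes that units with no cash rent have been excluded and that the open-ended top interval has been given an arbitrary upper bound.

```python
def sum_bins(county_counts):
    """Add up per-bin counts across counties (one list of counts per county)."""
    return [sum(col) for col in zip(*county_counts)]

def interpolated_median(edges, counts):
    """Linear-interpolation median from binned counts.

    edges: bin boundaries, with len(edges) == len(counts) + 1.
    The upper bound of the last bin is an assumption when that bin is open-ended.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no units in any bin")
    half = total / 2
    cum = 0
    for lo, hi, n in zip(edges, edges[1:], counts):
        if n > 0 and cum + n >= half:
            # assume units are spread evenly within the bin
            return lo + (half - cum) / n * (hi - lo)
        cum += n

# Hypothetical, simplified rent bins (dollars) and counts for two counties
edges = [0, 500, 1000, 1500, 2000]
county_a = [10, 40, 30, 20]
county_b = [20, 60, 10, 10]

msa_counts = sum_bins([county_a, county_b])
msa_median = interpolated_median(edges, msa_counts)
```

If the median lands in the open-ended top interval, the result depends entirely on the assumed upper bound and should be treated with caution.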