You may approach this using either the microdata on IPUMS USA or the summary data tables on IPUMS NHGIS:
With the IPUMS USA microdata, you can use 1-year samples and a simple allocation model to handle cases where PUMAs straddle MSA boundaries. This approach has been suggested in other forum posts (e.g., allocating microdata from PUMAs to cities or between vintages of PUMAs): households are distributed across the intersecting units in proportion to each unit's share of the population. For this, you would need a crosswalk between PUMAs and the 2023 MSA delineations. IPUMS provides crosswalks between 2020 PUMAs and 2023 MSAs, but not yet for 2010 PUMAs, which you’d need in order to analyze years prior to 2022. To create such a crosswalk, you can combine:
- The 2010 census tract-to-PUMA relationship file, showing which tracts (and counties) make up each PUMA;
- The 2023 CBSA delineation file, showing which counties make up each 2023 MSA.
From there, you could estimate the overlap between PUMAs and MSAs using tract-level counts of rental units, which would allow you to compute allocation weights specifically for rental housing rather than for total population. Tract-level counts of rental units are available in the 5-year ACS summary tables on IPUMS NHGIS. One special case to be aware of is Connecticut, where counties were replaced in 2022 by planning regions. This complicates efforts to match 2010 PUMAs to 2023 MSAs, since the newer MSAs are defined in terms of these planning regions (see this forum post for a general approach to this issue).
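The allocation step described above could be sketched roughly as follows. This is a minimal illustration in plain Python, not an IPUMS tool: the function name, the input layout, and the placeholder codes (PUMA_A, C1, MSA_X) are assumptions made for the example, and the rental unit counts are invented.

```python
from collections import defaultdict


def puma_to_msa_weights(tracts, county_to_msa):
    """Share of each PUMA's rental units that falls in each MSA.

    tracts: iterable of (puma, county, rental_units), one row per tract
            (hypothetical layout, e.g. built from the tract-to-PUMA file
            joined to tract-level rental unit counts).
    county_to_msa: dict mapping county code -> MSA code; counties missing
            from the dict are treated as outside any MSA (key None).
    """
    overlap = defaultdict(float)     # rental units per (PUMA, MSA) pair
    puma_total = defaultdict(float)  # rental units per PUMA
    for puma, county, units in tracts:
        msa = county_to_msa.get(county)
        overlap[(puma, msa)] += units
        puma_total[puma] += units
    return {
        (puma, msa): units / puma_total[puma]
        for (puma, msa), units in overlap.items()
        if puma_total[puma] > 0
    }


# Invented example: PUMA_A straddles the MSA boundary, PUMA_B lies outside it.
tracts = [
    ("PUMA_A", "C1", 300),
    ("PUMA_A", "C1", 100),
    ("PUMA_A", "C2", 100),
    ("PUMA_B", "C2", 250),
]
county_to_msa = {"C1": "MSA_X"}  # C2 is not part of any MSA in this example

weights = puma_to_msa_weights(tracts, county_to_msa)
print(weights[("PUMA_A", "MSA_X")])  # 400 of PUMA_A's 500 rental units
```

One way to apply the result would be to multiply each renter household's weight (HHWT in IPUMS USA) by the allocation weight for its PUMA and the MSA of interest before computing rent statistics. This assumes renters are spread within a PUMA in the same way as the tract-level rental unit counts.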
Even with careful geographic allocation, sample size is a concern. Many mid-size MSAs may have fewer than 100,000 rental units, so a 1% microdata sample would include fewer than roughly 1,000 of them, and annual rent estimates may not be statistically reliable due to high sampling error. For this reason, it may be a good idea to pool data over several years and analyze (non-overlapping) 3- or 5-year periods.
If you use the summary data tables on IPUMS NHGIS instead, you will need the 5-year data, since 1-year ACS summary tables are only available for geographic areas (in your case, counties) with populations of at least 65,000. While the 5-year data expand geographic coverage, they are not suitable for constructing annual panel datasets: consecutive 5-year samples share four of their five years of data, so about 80% of each sample overlaps with the previous or next one.
You can access the relevant rent distribution data in the data finder tool by selecting County as your geographic level and Rent and Renter Costs as your topic filter. This should allow you to locate table B25063 Gross Rent, which provides the number of rental units in each of 26 rent intervals. You can then sum the counts across counties to approximate the gross rent distribution for each MSA. The table does not give you the mean or median rent directly, but you can apply an interpolation method to the interval counts to approximate the statistic you’re looking to calculate. You will also want to keep in mind that your standard errors may be relatively high for smaller counties; this guide to variance estimation using the summary data includes worked examples that may be helpful.
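As a rough sketch of that last step, the example below sums interval counts across counties and approximates the median by linear interpolation within the interval that contains it. The function name, the four illustrative intervals (not the actual B25063 intervals), and the counts are assumptions made for the example. Linear interpolation treats rents as evenly spread within each interval, which is only one of several possible choices.

```python
def interpolated_median(bins):
    """Approximate a median from binned counts by linear interpolation.

    bins: list of (lower, upper, count) sorted by lower bound; the top
          interval may be open-ended, written with upper=None.
    Returns None if the median falls in the open-ended top interval.
    """
    total = sum(count for _, _, count in bins)
    if total == 0:
        raise ValueError("no units in any interval")
    half = total / 2
    cumulative = 0
    for lower, upper, count in bins:
        if count > 0 and cumulative + count >= half:
            if upper is None:
                return None
            # Assume units are spread evenly between lower and upper.
            return lower + (half - cumulative) / count * (upper - lower)
        cumulative += count
    return None


# Illustrative intervals and invented counts for two counties in one MSA.
edges = [(0, 500), (500, 1000), (1000, 1500), (1500, None)]
county_counts = {
    "County 1": [10, 40, 30, 20],
    "County 2": [10, 20, 50, 20],
}

# Sum counts across counties, interval by interval.
msa_counts = [sum(col) for col in zip(*county_counts.values())]
msa_bins = [(lo, hi, n) for (lo, hi), n in zip(edges, msa_counts)]

median_rent = interpolated_median(msa_bins)
print(msa_counts, median_rent)
```

With the real table you would replace the illustrative intervals with the table's own interval bounds and leave out any units without cash rent before interpolating.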