Fulton County, GA in the 2022 5-year ACS

Hi there,

I’m working with ACS 2022 5-year data for Georgia and was looking at county-level characteristics when I noticed that the weighted population for Fulton County was 198,004, which is far too low. I also noticed that the only MULTYEAR value that Fulton County observations have in this 5-year sample is 2022. I looked at the spreadsheet of identifiable counties and saw that Fulton (COUNTYFIPS=121) is only available for 2022. How do I handle this with a 5-year dataset?

Here is the output breakdown I am referring to, along with the code used to create it:

Thank you!

The smallest geographic area that is explicitly identified in the public use microdata from the ACS is the PUMA (public use microdata area). PUMAs are population-based areas of around 100,000 residents. IPUMS geographers are sometimes able to infer the county of residence based on the PUMA that the household resided in (i.e., when the boundaries of one or more PUMAs perfectly match those of a particular county). Because PUMA definitions change over time, the counties identifiable by IPUMS based on PUMA also change over time. This spreadsheet lists all identified counties in IPUMS USA data from 1950 onward. You can see that Fulton County, Georgia, was not identifiable in ACS data from 2012-2021, but it is identifiable in ACS data from 2022 onward.
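The identifiability rule described above can be illustrated with a small sketch. It assumes a hypothetical crosswalk listing, for each PUMA, the county or counties it overlaps, and treats a county as identifiable only when every PUMA touching it lies entirely inside it, so the county is an exact union of PUMAs. The PUMA codes and crosswalk are invented for illustration, not real Census Bureau definitions, and IPUMS's actual procedure may differ in its details.

```python
from __future__ import annotations

from collections import defaultdict


def identifiable_counties(crosswalk: dict[str, set[str]]) -> set[str]:
    """Return counties whose PUMAs all fall entirely within that county.

    `crosswalk` is a hypothetical mapping of PUMA code -> set of county codes
    the PUMA overlaps. A county counts as identifiable from PUMA alone when
    every PUMA that overlaps it overlaps no other county.
    """
    # Invert the crosswalk: county -> list of PUMAs that touch it.
    pumas_by_county: dict[str, list[str]] = defaultdict(list)
    for puma, counties in crosswalk.items():
        for county in counties:
            pumas_by_county[county].append(puma)

    # Keep only counties where each touching PUMA belongs to that county alone.
    return {
        county
        for county, pumas in pumas_by_county.items()
        if all(crosswalk[p] == {county} for p in pumas)
    }


# Toy crosswalk (illustrative codes only): county "121" is split exactly into
# two PUMAs, while PUMA "00300" straddles counties "089" and "063".
crosswalk = {
    "00100": {"121"},
    "00200": {"121"},
    "00300": {"089", "063"},
    "00400": {"089"},
}
print(sorted(identifiable_counties(crosswalk)))  # ['121']
```

Because PUMA definitions change over time, the same check run against crosswalks for different sample years can give different answers, which is why a county can be identifiable in some years of a multi-year file and not others.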

In 5-year ACS data, the sampling weights are adjusted so that estimates represent an average over the 5-year period. Fulton County is identifiable in only one of the five years included in the 2018-2022 ACS PUMS files, so this weight adjustment produces an artificially low population estimate for the county. This is why you see a weighted population estimate of about 200,000, roughly one-fifth of the county's true population in 2022. If Fulton County were identified in each year from 2018 to 2022, and the single-year population counts were averaged over the five-year span, you would see a population count of roughly one million. But because the only population value for Fulton County in the 5-year file comes from 2022, that single year's value is effectively averaged across five years. You can see which counties are identified in each individual year of the 5-year ACS by tabulating counties by MULTYEAR (MULTYEAR gives the single year, within the 5-year sample, that a particular observation comes from).
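The county-by-MULTYEAR tabulation can be sketched in a few lines. This sketch assumes person records read as plain dictionaries (for example, from a CSV extract) with IPUMS USA-style variable names STATEFIP, COUNTYFIP, MULTYEAR, and PERWT, and that COUNTYFIP is 0 when the county is not identified; adjust the names if your extract differs. The toy records and weights are made up purely to keep the example small.

```python
from collections import defaultdict


def county_by_multyear(records):
    """Weighted person counts keyed by (STATEFIP, COUNTYFIP), then MULTYEAR.

    Assumes each record is a dict with keys STATEFIP, COUNTYFIP, MULTYEAR,
    and PERWT. Records with COUNTYFIP == 0 (county not identified) are skipped.
    """
    table = defaultdict(lambda: defaultdict(float))
    for r in records:
        if r["COUNTYFIP"] == 0:
            continue
        table[(r["STATEFIP"], r["COUNTYFIP"])][r["MULTYEAR"]] += r["PERWT"]
    # Convert to plain dicts with years in order for readable output.
    return {county: dict(sorted(years.items())) for county, years in table.items()}


# Toy records with made-up weights. County 1 is an arbitrary code identified
# in several years; county 121 is identified only in 2022.
records = [
    {"STATEFIP": 13, "COUNTYFIP": 1, "MULTYEAR": 2018, "PERWT": 25.0},
    {"STATEFIP": 13, "COUNTYFIP": 1, "MULTYEAR": 2022, "PERWT": 25.0},
    {"STATEFIP": 13, "COUNTYFIP": 0, "MULTYEAR": 2018, "PERWT": 25.0},
    {"STATEFIP": 13, "COUNTYFIP": 121, "MULTYEAR": 2022, "PERWT": 25.0},
    {"STATEFIP": 13, "COUNTYFIP": 121, "MULTYEAR": 2022, "PERWT": 25.0},
]
print(county_by_multyear(records))
# {(13, 1): {2018: 25.0, 2022: 25.0}, (13, 121): {2022: 50.0}}
```

A county that shows up under only one MULTYEAR, as Fulton County does in the 2018-2022 file, is the warning sign. As a rough arithmetic check on the explanation above, 198,004 × 5 = 990,020, close to the "roughly one million" single-year figure; this only confirms the one-fifth reasoning and is not a recommended way to correct the estimate.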

If you want to use microdata (person-level data) from IPUMS USA to study Fulton County, I would recommend using only the 1-year ACS samples in which Fulton County is identified. In this case, that would be the 2022 1-year ACS (or earlier periods such as the 2006-2011 ACS). I would not recommend using 5-year ACS files in which the county you are interested in is identified in only some of the corresponding single years of data.

Depending on your data needs, you may find IPUMS NHGIS useful. IPUMS NHGIS provides data from the U.S. census, American Community Survey, and other data sources aggregated at various geographic levels, such as county. Using NHGIS data, you can download data tables that give the population of each county in the 2018-2022 ACS or other characteristics of counties and other geographic areas (such as states, PUMAs, block groups, and school districts). While IPUMS USA data are microdata, where every observation is a person or household, IPUMS NHGIS data are summary data, where every observation is a particular geographic area, such as Fulton County, GA. Summary data do not allow for person-level analyses, but can provide more precise information about a wider range of geographic areas. For new IPUMS NHGIS users, I recommend our user guide and this short video tutorial on the data finder.

Thank you, this explains the results I was getting. However, it makes me nervous about other research I’ve done that generates area-level estimates of various demographic characteristics for counties, cities, and metropolitan areas with 5-year datasets. Unless I check that each area was available in all 5 years of the sample, I may have artificially low or high estimates without being aware of it, and may have used them in my research or writing. Has IPUMS considered changing the data so that, in 5-year samples, geographic identifiers are only provided for areas that are identifiable in every year? Or adjusting the weights differently? Or would you suggest always checking the availability list and excluding any areas that are not identifiable in all 5 years?

I would recommend that you confirm that all counties you are studying are identified in all of the years of data you are analyzing. Your suggestion to only provide codes for areas that are identifiable in each year of multi-year data files is well taken. While we have a note about geography in multi-year samples in the description of the PUMA variable, we do not provide this type of documentation for variables that are based on PUMAs, such as county. We plan to improve our documentation to provide this information in more locations so users are more likely to be aware that not all geographic areas will be identified in multi-year ACS files that use multiple PUMA definitions.
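The check recommended above can be automated before any area-level estimates are produced. This is a minimal sketch assuming dictionary records with IPUMS USA-style keys STATEFIP, COUNTYFIP, and MULTYEAR, with COUNTYFIP equal to 0 for unidentified counties. It also assumes every sample year of the multi-year file is present in the records passed in, since the full set of years is taken from the data itself. The same logic could be pointed at other PUMA-based geography codes by changing the key; which variables those are in your extract is worth confirming in the IPUMS documentation.

```python
from collections import defaultdict


def partially_identified_counties(records):
    """Return counties that are missing from at least one MULTYEAR.

    Assumes dict records with keys STATEFIP, COUNTYFIP, and MULTYEAR, where
    COUNTYFIP == 0 means the county is not identified. The result maps
    (STATEFIP, COUNTYFIP) to the sorted list of sample years in which that
    county does not appear.
    """
    all_years = {r["MULTYEAR"] for r in records}
    years_seen = defaultdict(set)
    for r in records:
        if r["COUNTYFIP"] != 0:
            years_seen[(r["STATEFIP"], r["COUNTYFIP"])].add(r["MULTYEAR"])
    return {
        county: sorted(all_years - seen)
        for county, seen in years_seen.items()
        if seen != all_years
    }


# Toy data: arbitrary county 1 is identified in every year 2018-2022, while
# county 121 is identified only in 2022 (its earlier records carry COUNTYFIP 0).
records = [{"STATEFIP": 13, "COUNTYFIP": 1, "MULTYEAR": y} for y in range(2018, 2023)]
records += [{"STATEFIP": 13, "COUNTYFIP": 0, "MULTYEAR": y} for y in range(2018, 2022)]
records.append({"STATEFIP": 13, "COUNTYFIP": 121, "MULTYEAR": 2022})

print(partially_identified_counties(records))
# {(13, 121): [2018, 2019, 2020, 2021]}
```

Any county returned by this check is a candidate for the 1-year files or for NHGIS summary tables rather than the 5-year microdata.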