Complete Self-Reported Ancestry by either county or congressional district (IPUMS NHGIS)

I’m trying to get self-reported ancestry data at either the county level or congressional district level for the years 1980, 1990, 2000, 2010, and 2020, but I can’t find a dataset that contains all of this information. IPUMS USA doesn’t record information for most US counties (I’m looking at the 1980 5% sample).
IPUMS NHGIS has the ancestry data at the levels I want, but it’s incomplete: it has nothing on ancestry from South America and occasionally doesn’t even record European ancestry.
I’ve seen maps that illustrate self-reported ancestry by county but can’t find the raw data itself.

As a result of the confidentiality protections implemented in the data by the Census Bureau, the public use microdata samples from 1950-onwards that IPUMS USA harmonizes do not provide the county of residence for all respondents. On the other hand, data tables on IPUMS NHGIS are able to provide statistics for all counties and congressional districts since they aggregate individual records by geographic location.

County and congressional district level data on reported ancestry are available in the decennial census and American Community Survey (ACS) datasets on IPUMS NHGIS from 1980-2020. From 1980-2000, the ancestry question is available through the main decennial census files. After 2000, this question moved to the ACS, which was designed to be an annual long-form supplement to the Census.

Data from the ACS are released in 1-year and 5-year summary datasets; as you might expect, the 5-year file provides estimates using five years of ACS data. The 1-year data are available only for areas with at least 65,000 residents. Most counties are smaller than that, so to get data for all counties after 2000, you should use the ACS 5-year data. To find these datasets, select 5-year ranges as your year filter for county data (i.e., 2008-2012 for 2010 and 2018-2022 for 2020), though you can use the 1-year ACS data for congressional district level tables.

For county-level data tables on ancestry, see table NTPA15 in the 1980_STF4Pa dataset (125 ancestry categories reported), NPA21 in 1990_STF4a (106 ancestries), NPCT018B in 2000_SF3a (71 ancestries), B04003 in 2008_2012_ACS5b (107 ancestries), and B04006 in 2018_2022_ACS5b (108 ancestries).
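If you are pulling these tables programmatically, it may help to keep the year-to-table mapping in one place. The following is only a minimal Python sketch: the dataset and table codes and the 65,000 threshold are the ones given above, while the dictionary layout and the function names (`county_ancestry_source`, `acs_product_for`) are hypothetical helpers and not part of any IPUMS tool.

```python
# Dataset and table codes as listed in the reply above. The structure and
# function names are illustrative assumptions, not an IPUMS API.
COUNTY_ANCESTRY_TABLES = {
    1980: ("1980_STF4Pa", "NTPA15"),
    1990: ("1990_STF4a", "NPA21"),
    2000: ("2000_SF3a", "NPCT018B"),
    2010: ("2008_2012_ACS5b", "B04003"),  # 5-year ACS range standing in for 2010
    2020: ("2018_2022_ACS5b", "B04006"),  # 5-year ACS range standing in for 2020
}

# ACS 1-year estimates are published only for areas with at least this many residents.
ACS_1YEAR_MIN_POPULATION = 65_000


def county_ancestry_source(year):
    """Return the (dataset, table) pair for county-level ancestry in a census year."""
    if year not in COUNTY_ANCESTRY_TABLES:
        raise ValueError(f"No county ancestry table listed for {year}")
    return COUNTY_ANCESTRY_TABLES[year]


def acs_product_for(population):
    """Return which ACS product can cover an area of the given population."""
    if population >= ACS_1YEAR_MIN_POPULATION:
        return "1-year"
    return "5-year"
```

For example, `county_ancestry_source(2010)` gives the 2008-2012 5-year dataset and table B04003, and `acs_product_for(20_000)` indicates that a county of that size is covered only by 5-year data.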

While the aggregated data allows the Census Bureau to release statistics for all counties/districts, there are still instances when reported categories need to be aggregated themselves due to small samples that risk compromising respondent confidentiality. This is why you might not see reported ancestries for each South American country, but instead an aggregate “all other Central and South American and/or Caribbean” group. For example, in the 1980 data I am seeing separate Brazilian and Guyanese ancestry groups, and then an Other Caribbean, Central and South American group. Ancestry is also a complicated topic and not necessarily tied to country of origin; an Argentinian in the US might report their ancestry as German or Italian even if they were born in Argentina (the topic filter Nativity and Place of Birth on IPUMS NHGIS can provide more tables on this). You will also want to decide whether to analyze respondents based on their first reported ancestry or on all of their reported ancestries, since different tables are available for each.

Thanks for the response. I’ve seen the files you mentioned, but they appear to be missing large demographic groups. Only 1980-1990 have data on African (Afro) Americans, and none I’ve looked at have anything on Mexicans, despite these being two of the largest ancestries in the US.

The Census Bureau tallies ancestry data somewhat differently from data on Hispanic origin and race. Appendix B (Definitions of Subject Characteristics) in the 2000 SF3 Technical Documentation directly discusses this process:

Also, the [ancestry] question was intended to provide data for groups that were not included in the Hispanic origin and race questions. Official Hispanic origin data come from long-form questionnaire Item 5, and official race data come from long-form questionnaire Item 6. Therefore, although data on all groups are collected, the ancestry data shown in these tabulations are for non-Hispanic and nonrace groups. Hispanic and race groups are included in the “Other groups” category for the ancestry tables in these tabulations.

You can also refer to the technical documentation for the other decennial summary files and the ACS Subject Definitions for more information about each dataset.

For data on people of Mexican origin or descent, you will want to select Hispanic Origin from the topics menu. This is a separate topic due to the large number of data tables pertaining to this demographic group.

Your question on the ancestry of African Americans is more complicated; many Black Americans do not have detailed knowledge of their ethnic ancestry due to the history of forced migration and enslavement. As a result, respondents in this group may provide only a broad origin (e.g., Sub-Saharan African), report African-American as their ancestry, or report no ancestry at all. For example, see this Census Bureau post which notes that “over half of all Black respondents wrote in African American as their detailed identity”. Also, note that there may be systematic differences (e.g., in immigrant status) between Black Americans who provide more detailed ancestry responses and those who provide less detailed ones.

With that said, there are tables that provide reported ancestries originating in Africa in 2000, 2008-2012, and 2018-2022. For 2000, the table NPCT016C, Population that Reported First Ancestry by Selected Detailed Ancestry (from the 2000_SF3a dataset), provides counts for a number of ethnicities originating in Africa. These ethnicities are also provided in the ACS 5-year tables that I mentioned in my previous response.