Hi there,
I am fairly new to using IPUMS and am seeking help with creating a table of the ten US cities with the greatest foreign-born populations in 2020. I tried to accomplish this using the ACS 5-year sample, but noticed that several large cities like Dallas, San Jose, and San Diego were missing. I then tried to use the best-matching PUMAs by using the “large_placePUMA2020_match_summ” Excel file provided by IPUMS. However, when I tabulated the total populations of cities using these PUMAs, they were smaller by ~10% than they actually are according to the 2020 census. I also tried using the NHGIS data to accomplish this, but nativity is not available for the 2020 data. Is there an alternative way to create this table? Like I said, I am a beginner so there is likely something that I may have overlooked. Thanks for your help!
Since your goal of finding the ten US cities with the largest foreign-born populations in 2020 does not require individual-level microdata, IPUMS NHGIS is a great choice for getting this data. You were unable to find the data because the Census Bureau didn’t produce a standard 2020 1-year ACS summary file, due to irregularities in sampling during the height of the COVID pandemic (see this IPUMS user guide for more information and links to Census Bureau documentation). However, you can find tables by nativity for 1-year ACS files (e.g. 2019, 2021) as well as for 5-year files (e.g. 2016-2020) by selecting place as your geographic filter in the Data Finder tool. Note that “place” is the Census Bureau’s general term for all city-like entities and includes both legally incorporated places (e.g. cities) and unincorporated centers of population that the Census Bureau groups into census designated places (CDPs).
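Once you have downloaded a place-level nativity table, ranking the places is a short step. Here is a minimal sketch in Python, assuming the extract has been saved as a CSV with one row per place. The column names `PLACE` and `FOREIGN_BORN` are hypothetical stand-ins (actual NHGIS extracts use coded column names that are described in the codebook that comes with the extract), and the sample rows are placeholders rather than real estimates.

```python
import csv
import io


def top_places(csv_file, n=10, name_col="PLACE", count_col="FOREIGN_BORN"):
    """Return the n places with the largest foreign-born counts.

    name_col and count_col are hypothetical column names; replace them
    with the codes listed in the codebook for your own extract.
    """
    rows = list(csv.DictReader(csv_file))
    # Sort by the foreign-born count, largest first.
    rows.sort(key=lambda r: int(r[count_col]), reverse=True)
    return [(r[name_col], int(r[count_col])) for r in rows[:n]]


# Placeholder data standing in for a downloaded extract (not real estimates).
sample = io.StringIO(
    "PLACE,FOREIGN_BORN\n"
    "Place A,1200\n"
    "Place B,5400\n"
    "Place C,300\n"
)
ranking = top_places(sample, n=2)
print(ranking)
```

With a real extract you would pass an open file handle instead of the `io.StringIO` sample and leave `n` at its default of 10.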
Unlike the tables in IPUMS NHGIS, the ACS Public Use Microdata File that IPUMS USA harmonizes does not report the city that a respondent resided in; STATE and PUMA are the only geographic identifiers provided. Where PUMA and city boundaries closely align, IPUMS USA identifies the CITY that PUMA residents resided in. More specifically, the protocol assigns all respondents in a given PUMA to a city if the majority of that PUMA’s population resided within the city limits. This results in errors of omission (residents of a city who are not identified as residents) and errors of commission (non-residents who are identified as residents). If the sum of errors that results from this protocol (CITYERR) is 10% or greater, then the city will not be identified and all respondents in the intersecting PUMAs will have CITY = 0000.
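To make the matching protocol concrete, here is a rough sketch in Python. It is an illustration under stated assumptions, not the code IPUMS uses: the function name, the input format, and the population figures are all hypothetical, and I am assuming that the omission error is measured as a share of the city's population and the commission error as a share of the matched PUMAs' population.

```python
def match_city(pumas, city_pop, threshold=10.0):
    """Sketch of the majority-assignment rule (hypothetical helper).

    pumas: list of (puma_total_pop, pop_inside_city) tuples.
    city_pop: total population of the city.
    """
    # Assign a PUMA to the city when most of its residents live inside the city limits.
    matched = [(total, inside) for total, inside in pumas if inside > total / 2]
    matched_total = sum(total for total, _ in matched)
    matched_inside = sum(inside for _, inside in matched)

    # Omission: city residents who live outside the matched PUMAs
    # (assumed denominator: city population).
    omission = 100.0 * (city_pop - matched_inside) / city_pop
    # Commission: matched-PUMA residents who live outside the city
    # (assumed denominator: matched PUMA population).
    commission = 100.0 * (matched_total - matched_inside) / matched_total if matched_total else 0.0

    total_error = omission + commission
    return {
        "omission": omission,
        "commission": commission,
        "total_error": total_error,
        # The city is identified only when the summed error is below 10%.
        "identified": total_error < threshold,
    }


# Hypothetical city: two PUMAs mostly inside it, one PUMA mostly outside.
pumas = [(100000, 95000), (120000, 118000), (110000, 30000)]
result = match_city(pumas, city_pop=243000)
print(result)
```

In this made-up case the third PUMA is not assigned to the city, so its 30,000 city residents count as omissions, and the summed error is high enough that the city would not be identified.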
To take an example, the city of Dallas has a commission error of 17.82% and an omission error of 11.27% for a total error of 29.09% in the 2020 ACS (see the “large_placePUMA2010_match_summary” Excel file in the CITY comparability tab). As a result, it is not identified in the data. While you might use the 10 best-matching PUMAs where a majority of respondents resided within Dallas city boundaries to identify Dallas residents, we do not advise this since 29.09% of the residents of these 10 PUMAs did not actually reside within the city of Dallas. You might instead consider analyzing the Dallas-Fort Worth-Arlington metro area, which is identified in MET2013 since the sum of errors for the metro (MET2013ERR) was only 1.85% in 2020. You should also be aware that the 2022 ACS introduced new (2020 census-based) PUMA boundaries. Whenever new PUMA boundaries are introduced, the other geographic information that can be inferred from these units changes as well, including which cities can be identified using CITY and the omission/commission errors for cities that remain identifiable.
Hi Ivan,
Thank you so much!! Your response was a huge help. I was able to accomplish my task and gain a better understanding of IPUMS data.