Repeated Census Tracts Within County

Kelsey_O_Hollaren · January 4, 2024, 2:37am

Hi all,

I’m constructing a gentrification index which requires comparing variable levels between census places and census tracts. My understanding is that census tracts are meant to be unique within counties (i.e., there shouldn’t be repeats).

However, my dataset, which I downloaded from NHGIS and which covers the entire US, has several hundred observations with duplicate county FIPS codes and tract codes but different values for the economic and demographic variables I’m interested in.

Has anyone else encountered this issue?

I’m, of course, happy to provide more specifics if that would be helpful.

Cheers!

David_Van_Riper · January 4, 2024, 2:14pm

Hi Kelsey,

Census tracts definitely should be unique within counties! Can you provide more details about the data you downloaded (e.g., year and/or dataset) so that we can help answer your question?

Sincerely,
Dave Van Riper
IPUMS

Kelsey_O_Hollaren · January 4, 2024, 3:11pm

Absolutely!

Here’s the “Data Summary” info as listed in the codebook:

Time series layout: Time varies by column
Geographic level: Census Tract (by State–County)
Geographic integration: Nominal
Measurement times: 2000, 2006-2010, 2007-2011, 2008-2012, 2009-2013, 2010-2014, 2011-2015, 2012-2016, 2013-2017, 2014-2018, 2015-2019, 2016-2020, 2017-2021, 2018-2022

!WARNING! In a “Time varies by column” layout, each row provides statistics from multiple censuses for areas that had a matching code across time. For the Census Tract geographic level, matching codes may refer to distinctly different areas in different censuses. We strongly recommend checking GIS files to determine the geographic consistency of your areas of interest for your period of interest.

Aside from 2000, each is a five-year ACS estimate. I also looked at how the repeats were distributed. Of the 420 duplicates in total, 234 are in Connecticut, 76 are in New Mexico, and 110 are in Puerto Rico. Perhaps there’s some clue in that?

David_Van_Riper · January 4, 2024, 3:34pm

Dear Kelsey,

I just downloaded a nominally integrated census tract dataset (total population table) for all the same years you did. I grouped the dataset by NHGISCODE and generated a count (to see what duplicates I could find). I didn’t actually find any duplicates when I did that analysis.
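A duplicate check of this kind (group by NHGISCODE, then count) can be sketched in Python as follows. This is only an illustration using the standard library: the sample rows, the `VALUE` column, and the helper name `duplicate_codes` are made up for the example and are not taken from an actual extract.

```python
import csv
import io
from collections import Counter


def duplicate_codes(rows, key="NHGISCODE"):
    """Return {code: count} for every code that appears more than once."""
    counts = Counter(row[key] for row in rows)
    return {code: n for code, n in counts.items() if n > 1}


# Tiny made-up sample standing in for an extract; codes and values are
# placeholders, and VALUE is a hypothetical column name.
sample = io.StringIO(
    "NHGISCODE,VALUE\n"
    "G0900010010100,4000\n"
    "G0900010010200,3500\n"
    "G0900010010100,4100\n"
)
rows = list(csv.DictReader(sample))
dups = duplicate_codes(rows)
print(dups)
```

An empty result means every code is unique; any entry in the result identifies a code that should be inspected.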

Can you provide a bit more detail about how you identified the duplicates (e.g., what fields in your extract you used to identify the duplicates)?

Sincerely,
Dave

Kelsey_O_Hollaren · January 4, 2024, 3:59pm

Hi Dave,

Sure thing:

I’m using the following variables in my analysis:

AV0AA - Total Population
CV4AA - Persons by Hispanic or Latino Origin [2] by Race (White non-Hispanic)
B69AC - Persons 25 Years and Over by Educational Attainment w/ BA or higher
B79AA - Median Household Income in Previous Year
CL66A - Persons below Poverty Level in Previous Year

Below I’m also including an example of how the data look in Stata for . While not shown in this picture, not all observations have missing values for years between 2000 and the five-year 2022 estimates.

David_Van_Riper · January 4, 2024, 5:07pm

Dear Kelsey,

We just tracked the problem down. Our fixed-width files have incorrect county FIPS codes in CT, and character encoding in New Mexico and Puerto Rico (county names with accents or tildes) seems to be causing issues when loading fixed-width files into Stata. A side effect of this is to put a bad value in CV4AA2000, which then affects the rest of the data for those records. It looks as if the issue arises when Stata imports the .dat file using the widths in the .do file.
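One common way that accented characters can break fixed-width parsing is a mismatch between character widths and byte widths. The sketch below illustrates that general mechanism in Python; it is an assumption about what may be going on, not a confirmed description of how Stata reads these files. The two-field layout and the widths are hypothetical.

```python
# Hypothetical layout: a 10-wide name field followed by a 5-wide value field.
NAME_WIDTH, VALUE_WIDTH = 10, 5


def make_record(name, value):
    """Build a record padded to fixed CHARACTER widths."""
    return name.ljust(NAME_WIDTH) + str(value).rjust(VALUE_WIDTH)


def parse_by_bytes(record_bytes):
    """Slice a record using the same widths, but counted in BYTES."""
    name = record_bytes[:NAME_WIDTH]
    value = record_bytes[NAME_WIDTH:NAME_WIDTH + VALUE_WIDTH]
    return name, value


plain = make_record("Dona Ana", 12345).encode("utf-8")
accented = make_record("Doña Ana", 12345).encode("utf-8")

# "ñ" takes two bytes in UTF-8, so the accented record is one byte longer
# and every later field is shifted by one position when sliced by bytes.
plain_value = parse_by_bytes(plain)[1]
accented_value = parse_by_bytes(accented)[1]
print(plain_value, accented_value)
```

In this toy case the value field of the accented record loses its last digit, which is the kind of shift that could put a bad value into the first variable after a name field and disturb the columns that follow.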

We’re going to look into this further, but if you re-submit your extract as a CSV file, you’ll get a dataset that can be read in using import delimited in Stata. It will read the variables in correctly and eliminate the duplicates that are introduced when reading in a fixed width file using the .do file.
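For readers working outside Stata, the same idea (read the delimited file rather than slicing by column widths) can be sketched in Python. This is a minimal illustration under stated assumptions: the column names `COUNTY`, `COUNTYFIPS`, `TRACT`, and `VALUE` and the two rows are placeholders, and UTF-8 is assumed as the file encoding.

```python
import csv
import io

# Made-up two-row extract; column names and values are placeholders.
raw = (
    "COUNTY,COUNTYFIPS,TRACT,VALUE\n"
    "Doña Ana,013,000100,12345\n"
    "Bayamón,021,030101,678\n"
).encode("utf-8")

# Decode with an explicit encoding; fields are split on commas, so the
# byte length of an accented name cannot shift the columns after it.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
records = list(csv.DictReader(text))

# csv returns strings, so leading zeros in the codes are preserved.
keys = [(r["COUNTYFIPS"], r["TRACT"]) for r in records]
values = [r["VALUE"] for r in records]
print(keys, values)
```

Keeping the codes as strings matters when they are later combined into a county-plus-tract identifier and checked for uniqueness.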

Yours,
Dave

Kelsey_O_Hollaren · January 4, 2024, 5:21pm

Quick work!

Thanks, Dave. I’ll redownload as CSV. That should be sufficient for this project; however, would you mind letting me know when the issue with the fixed-width .do file is fixed? Just for future reference.

Thanks!
Kelsey

David_Van_Riper · January 4, 2024, 5:31pm

Yes, I will update this thread when we have the fixed-width .do file corrected.
