Matching Time Series Data and GIS Files

kanglin_chen · November 21, 2024, 9:51pm

Hi! Connie from Harvard. I’m working with NHGIS data from 1970 to 2020 for longitudinal comparison. I’ve been trying to map Time Series Tables using GIS Files, and I have some questions about the historical data:

In the GIS Files, the basis for the 1970, 1980, and 1990 data is either “2000 TIGER/Line” or “2008 TIGER/Line.” Meanwhile, in the Time Series Tables, we can choose between “Nominal” or “Standardized to 2010.” I’m not sure which options to choose for mapping.

Here’s my understanding:

1. If I select “Nominal” in the Time Series Tables and use “2000 TIGER/Line” or “2008 TIGER/Line” in the GIS Files, this means the “2000 TIGER/Line” or “2008 TIGER/Line” shapefiles approximate the historical boundaries. In this case, I would need to download the “2000 TIGER/Line” or “2008 TIGER/Line” shapefile corresponding to each year (1970, 1980, 1990), as the approximated boundaries differ for these years. Is that correct?
2. If I select “Standardized to 2010” in the Time Series Tables for data between 1990 and 2010, does this imply that all historical data have been adjusted to align with the 2010 TIGER/Line boundaries, ignoring any boundary changes over time, and that the historical data within each GIS unit may not match the GIS unit but are simply filled in? If so, would it be acceptable to download just one “2010 TIGER/Line” shapefile from the GIS Files, assuming all historical data use the same GIS boundaries?

Could you confirm if my understanding of these two points is correct? Additionally, what are your suggestions for choosing between ‘Nominal,’ ‘Standardized to 2010,’ ‘2000 TIGER/Line,’ or ‘2008 TIGER/Line’ for longitudinal analysis from 1970-2020?

JonathanSchroeder · November 21, 2024, 11:20pm

I think your general understanding is correct, but I’ll do my best to explain the relevant concepts to make sure…

First, I’ll explain the “basis” of GIS files. Then I’ll explain how the GIS files correspond to the two types of time series tables.

The “basis” of the GIS files doesn’t describe the year when the boundaries were in use. It describes which release of the Census Bureau’s TIGER/Line files NHGIS has used as the geometric basis for the boundaries.

The TIGER/Line files include geometric definitions of many features, e.g., county lines, streets, rivers, coastlines, etc. The Census Bureau regularly releases new versions of TIGER/Line files, and each new version includes various accuracy improvements in the representations of features. As such, the geometry for a single feature (a street, a river, a county line) often differs from one TIGER/Line release to another as the representations are improved.

When two NHGIS GIS files differ only in their basis, that indicates that they provide different geometric representations for the same features. For example, NHGIS provides four shapefiles representing the boundaries of 2000 census tracts, each with a different TIGER/Line basis:

The “YEAR” column indicates that all four of these shapefiles represent 2000 census tracts. You could use any of them to map and analyze data for 2000 census tracts.

In the file with a 2000 TIGER/Line+ basis, the tract boundary geometry corresponds to features (streets, rivers, county lines, etc.) as they were represented in the 2000 TIGER/Line file. The file with a 2008 TIGER/Line+ basis also represents 2000 census tracts, but the boundary geometry corresponds to features (streets, rivers, county lines, etc.) as represented in the 2008 TIGER/Line file.

Now, about time series tables…

You can learn more about the two types of geographic integration NHGIS uses in time series tables through links in the NHGIS Data Finder (click on the “Nominal” or “Standardized to 2010” links) or in the Geographic Integration section of the Time Series Tables page.

Importantly, the nominally integrated tables don’t report data for static geographic units across time:

Nominally integrated tables link geographic units across time according to their names and codes, disregarding any changes in unit boundaries. The identified geographic units match those from each census source, so the spatial definitions and total number of units may vary from one time to another (e.g., a city may annex land, a tract may be split in two, a new county may be created, etc.). The tables include data for a particular geographic unit only at times when the unit’s name or code was in use, resulting in truncated time series for some areas.

This means that 1970 data in a nominally integrated table correspond to 1970 census boundaries, and 2020 data correspond to 2020 census boundaries, etc.

To map the 1970 data in a nominally integrated table, you need to get a GIS file for 1970 geographic units (1970 counties or 1970 tracts, etc.). You could use the version with a 2000 TIGER/Line+ basis or a 2008 TIGER/Line+ basis. Regardless of the basis you choose, to map the 2000 data in a nominally integrated table, you’d still need to get a different GIS file, one that represents 2000 census geographic units.

Only the time series tables that are “Standardized to 2010” provide estimates for static geographic units across time. To map data in these tables, you need to get a GIS file that represents 2010 census units. NHGIS has GIS files for 2010 census units with either a 2010 TIGER/Line+ basis or a 2020 TIGER/Line+ basis. You could choose either of these.
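As a rough summary of the pairing rules above, here is a minimal Python sketch. The helper name `gis_unit_year` and the labels `"nominal"` and `"standardized_2010"` are made up for illustration; they are not part of any NHGIS tool. The sketch assumes decennial census years and says nothing about the TIGER/Line basis, which is a separate choice.

```python
def gis_unit_year(integration: str, data_year: int) -> int:
    """Return the census year of the geographic units a GIS file should represent.

    `integration` is a hypothetical label: "nominal" or "standardized_2010".
    """
    if integration == "nominal":
        # Nominal tables report each year's data for that year's own units,
        # so 1970 data need a GIS file of 1970 units, 2000 data need 2000 units.
        return data_year
    if integration == "standardized_2010":
        # Standardized tables report every year's data for 2010 units.
        return 2010
    raise ValueError(f"unknown integration type: {integration!r}")


for year in (1990, 2000, 2010, 2020):
    print(year, gis_unit_year("nominal", year), gis_unit_year("standardized_2010", year))
```

With nominal tables you end up needing one GIS file per data year; with standardized tables a single file of 2010 units covers every year.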

kanglin_chen · November 22, 2024, 10:47pm

Hi Jonathan,

Thank you for your reply; it was very helpful! I still have some follow-up questions. I’ll use housing data as an example:

If I choose nominally integrated data, such as 1970 housing data, and use 1970 GIS data with a basis of 2000, we are retaining the original 1970 housing data, while the 1970 GIS data has been approximated based on the 2000 TIGER/Line to align with the historical 1970 situation. However, if I choose “standardized to 2010” housing data and use the 2010 TIGER/Line for mapping, the 1970 housing data has been interpolated to align with the 2010 situation. In this case, the former requires GIS data processing, while the latter involves housing data processing. Is that correct?
Are the data (Decennial Years and Non-Decennial Years) one-year data? Are the data under ‘Source Tables’ and ‘Time Series Tables’ both one-year data (except for ACS 5-year or ACS 3-year data)? I’m confused about whether the Time Series Tables and the Nominal data type are one-year data. If so, what processing did NHGIS do to make them ‘over time’?

E.g. 1: NHGIS defines ‘Time Series Tables’ as: ‘A table is comprised of one or more related time series, each of which describes a single summary statistic…’ (https://www.nhgis.org/time-series-tables). What does ‘one or more related time series’ mean in this context?
E.g. 2: How do we interpret ‘Nominally integrated tables link geographic units across time according to their names and codes’?

If “Source Tables” and “Time Series Tables” are both one-year data, what are the key differences between them? I see that Source Tables usually have more data about different variables.
Could you please share information about when census data has been sample-based versus a full census since 1790, particularly after 1960? I’d like to understand the overall situation for all data (if they differ) and, specifically, how this applies to housing data.

JonathanSchroeder · November 26, 2024, 2:59am

If I choose nominally integrated data, such as 1970 housing data, and use 1970 GIS data with a basis of 2000, we are retaining the original 1970 housing data, while the 1970 GIS data has been approximated based on the 2000 TIGER/Line to align with the historical 1970 situation. However, if I choose “standardized to 2010” housing data and use the 2010 TIGER/Line for mapping, the 1970 housing data has been interpolated to align with the 2010 situation. In this case, the former requires GIS data processing, while the latter involves housing data processing. Is that correct?

Your first two sentences are mostly correct. One caveat: the NHGIS tables that are “standardized to 2010” don’t yet include any 1970 data; they include only 1990-2020 data.

Also note, in case it’s not clear: your selections for time series tables and for GIS files are independent choices for independent files. NHGIS doesn’t adjust the data in your selected tables to align with boundaries in your selected GIS files. If you select 1970 data in a nominally integrated time series table, you’ll get the same table data regardless of the year or basis of the GIS files you choose to get.

I’m not sure how to interpret your third statement, that “the former requires GIS data processing, while the latter involves housing data processing.” Are you describing the processing that you think NHGIS uses to construct the data? Or are you describing the processing that a data user must apply to the files that they get from NHGIS? NHGIS uses no GIS data processing to construct nominally integrated time series tables, but we do use GIS data processing to construct the geographic crosswalks that we use to generate standardized time series tables.
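To give a feel for what a geographic crosswalk does, here is a simplified Python sketch of weighted allocation from source units to target units. It is only an illustration of the general idea: the function `apply_crosswalk`, the unit IDs, the weights, and the counts are all made up, and the actual NHGIS crosswalks and interpolation models are more detailed than this.

```python
from collections import defaultdict


def apply_crosswalk(source_counts, crosswalk):
    """Allocate counts from source units to target units.

    crosswalk: iterable of (source_id, target_id, weight), where weight is the
    assumed share of the source unit's count that belongs to the target unit.
    """
    target_counts = defaultdict(float)
    for source_id, target_id, weight in crosswalk:
        target_counts[target_id] += source_counts.get(source_id, 0) * weight
    return dict(target_counts)


# Made-up example: 1990 unit "A" lies wholly inside 2010 unit "X",
# while 1990 unit "B" is split evenly between 2010 units "X" and "Y".
counts_1990 = {"A": 100, "B": 50}
crosswalk_1990_to_2010 = [("A", "X", 1.0), ("B", "X", 0.5), ("B", "Y", 0.5)]

estimates_for_2010_units = apply_crosswalk(counts_1990, crosswalk_1990_to_2010)
print(estimates_for_2010_units)
```

The output values are estimates for 2010 units, which is why standardized tables can be mapped with a single GIS file of 2010 units.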

Are the data (Decennial Years and Non-Decennial Years) one-year data? Are the data under ‘Source Tables’ and ‘Time Series Tables’ both one-year data (except for ACS 5-year or ACS 3-year data)? I’m confused about whether the Time Series Tables and the Nominal data type are one-year data. If so, what processing did NHGIS do to make them ‘over time’?

Time series tables link together data from multiple times. When you get time series tables through the NHGIS Data Finder, you may choose which years you will download. If a time series table includes data from 2000, 2010, and 2020, you may choose to get data for one, two or three of those years. If you choose to get only one year, then the output data will be “one-year data”. If you choose to get two or three of the years, then the output data file(s) will contain data for multiple years.

You can also choose one of three layouts when you request time series tables. If you choose “Time Varies by File”, each file will contain data for only one year. But if you choose “Time Varies by Column” or “Time Varies by Row”, then each file will contain data for all of your selected years.
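The three layouts can be pictured with a toy example in Python. The unit IDs, the `persons` field, the column names such as `persons_2000`, and the values are all invented for illustration; real NHGIS files use their own column codes.

```python
# Toy data: one record per unit per year (values are made up).
records = [
    {"unit": "A", "year": 2000, "persons": 10},
    {"unit": "A", "year": 2010, "persons": 12},
    {"unit": "B", "year": 2000, "persons": 20},
    {"unit": "B", "year": 2010, "persons": 21},
]

# "Time Varies by Row": one row per unit per year, all years in one file.
by_row = records

# "Time Varies by Column": one row per unit, one column per year, one file.
by_column = {}
for r in records:
    by_column.setdefault(r["unit"], {})[f"persons_{r['year']}"] = r["persons"]

# "Time Varies by File": a separate file (here, a separate list) per year.
by_file = {}
for r in records:
    by_file.setdefault(r["year"], []).append({"unit": r["unit"], "persons": r["persons"]})

print(by_column)
print(by_file)
```

The underlying values are the same in all three; only the arrangement differs.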

E.g. 1: NHGIS defines ‘Time Series Tables’ as: ‘A table is comprised of one or more related time series, each of which describes a single summary statistic…’ (Time Series Tables | IPUMS NHGIS). What does ‘one or more related time series’ mean in this context?

As explained in this example, a Persons by Sex time series table includes two time series: one gives counts of males over time, and one gives counts of females over time. I recommend that you try selecting and downloading a table, making sure that you select multiple years with your data request, and then when you see the output files, I hope it will be clearer. It might also help if you try getting two or three different layouts, as described above.

E.g. 2: How do we interpret ‘Nominally integrated tables link geographic units across time according to their names and codes’?

For example, the state of West Virginia split off from the state of Virginia between 1860 and 1870. If you get a state-level nominally integrated table for the years 1860 and 1870, there will be data for “Virginia” in both years, but there will be data for “West Virginia” only in 1870, not 1860. The 1860 data for Virginia include the characteristics of the area that became West Virginia. The table is “nominally integrated” because it links together the 1860 and 1870 data for Virginia according to the state name (“Virginia”) even though that name represented a different area at each time.
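The same idea as a small Python sketch: linking by name alone, with a gap wherever a name was not in use. The counts below are placeholders chosen for readability, not actual census figures, and `nominal_series` is a hypothetical helper.

```python
# Placeholder counts keyed by year, then by state name (NOT real census data).
tables = {
    1860: {"Virginia": 100},
    1870: {"Virginia": 70, "West Virginia": 30},
}


def nominal_series(tables_by_year):
    """Link units across years by name only, ignoring boundary changes."""
    years = sorted(tables_by_year)
    names = sorted({name for table in tables_by_year.values() for name in table})
    # A name with no data in a given year gets None (a truncated time series).
    return {name: {year: tables_by_year[year].get(name) for year in years} for name in names}


series = nominal_series(tables)
print(series["Virginia"])       # same name, different area in each year
print(series["West Virginia"])  # no value for 1860
```

Note that the two “Virginia” values describe different areas, so a change between them partly reflects the boundary change rather than change on the ground.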

If “Source Tables” and “Time Series Tables” are both one-year data, what are the key differences between them? I see that Source Tables usually have more data about different variables.

Source tables contain data from a census source with minimal alteration by NHGIS. Time series tables link data across time as described above. NHGIS researchers standardized the categories in time series tables through “attribute integration”, and we linked geographic areas across time, either by nominal integration or by geographic standardization. There are many fewer subjects and geographic levels in time series tables than in source tables because of the substantial effort required for us to integrate data from source tables into time series tables.

Could you please share information about when census data has been sample-based versus a full census since 1790, particularly after 1960? I’d like to understand the overall situation for all data (if they differ) and, specifically, how this applies to housing data.

We provide information about the sources of NHGIS tables, including some information about which are full-count vs. sample-based, through our Overview of Datasets page. You can find out more through NHGIS source documentation on the Tabular Data Sources page. It might also help to review our FAQs about the ACS, which includes some explanation of how the ACS relates to earlier censuses. You may find other useful resources through other sites.

If you have more questions about any of this, I recommend you consider viewing one or two of the webinars available through the User Guide page. One webinar gives an overview of NHGIS and another introduces time series tables.

kanglin_chen · November 26, 2024, 2:17pm

Those are very helpful! Thank you, Jonathan!

kanglin_chen · December 28, 2024, 7:05pm

Hi Jonathan, I have a follow-up question! My research team obtained Total Housing Units and Tenure and Mortgage Status data (time series tables) since 1970 from NHGIS and calculated the rates for owner-occupied housing units. Please find the screenshot below showing how we obtained the total number of housing units and the total number of owner-occupied housing units.

However, I noticed that the results differ from the Census reports and data. For example, you can check the “United States and regions” file, which includes the homeownership rates provided by Census, here. The differences are quite large: for example, the NHGIS national homeownership rate we calculated is 57.9%, while the Census national figure is 64.2%. I noticed that the Census figure is closer to the median of the county-level homeownership rates for the country provided by NHGIS (though not exactly the same).

I also noticed that different Census reports have slightly different rates for the same year, such as in the “Homeownership Rates” file provided here. The reports do not explain how Census obtained the national figures, though.

I was wondering if you could explain the differences. Is it possible that the Census may use summary statistics of certain administrative levels (e.g., the average or median of region-level, state-level, or county-level statistics) to derive national data instead of calculating it directly from the total number of owner-occupied housing units and total housing units in the U.S.? Alternatively, could the Census be using the total number of occupied housing units (excluding vacant housing units) as the denominator? Could I trust the results we calculated based on the IPUMS/NHGIS data?

Also, I selected “Nation” for national-level data calculations. I assume the “Nation” data includes territories (e.g., Hawaii, Alaska, Puerto Rico, etc.) beyond the contiguous U.S. after these areas became U.S. territories. Is that correct?

JonathanSchroeder · December 30, 2024, 6:18pm

Homeownership rates are typically computed using a denominator of occupied housing units (or of households or householders, which are both numerically equivalent to occupied housing units). Using that standard with NHGIS time series tables, I compute a 1970 rate of 62.9%. That’s closer to–though still not equal to–the 64.2% in the report you shared. I expect the main reason for the difference is that the report uses a different source, the Current Population Survey/Housing Vacancy Survey (CPS/HVS). NHGIS time series tables use data from the decennial Census of Population and Housing or from the American Community Survey (ACS). You can find an explanation of differences between the CPS/HVS and the ACS through the Census Bureau’s CPS/HVS Methodology page. I would expect similar differences to occur between the CPS/HVS and decennial census data. I believe both sources to be valid, but there will still be differences due to the different collection methods.
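To show how much the choice of denominator matters, here is a short Python sketch. The counts are round, made-up numbers for illustration only (not NHGIS or CPS/HVS figures), and the function names are hypothetical.

```python
def homeownership_rate(owner_occupied: int, renter_occupied: int) -> float:
    """Standard rate: owner-occupied units divided by all occupied units."""
    return owner_occupied / (owner_occupied + renter_occupied)


def owner_share_of_all_units(owner_occupied: int, renter_occupied: int, vacant: int) -> float:
    """Owner-occupied units divided by ALL housing units, including vacant ones."""
    return owner_occupied / (owner_occupied + renter_occupied + vacant)


# Made-up counts, chosen only to make the arithmetic easy to follow.
owners, renters, vacant_units = 60, 40, 20

print(homeownership_rate(owners, renters))                      # occupied-unit denominator
print(owner_share_of_all_units(owners, renters, vacant_units))  # total-unit denominator
```

Including vacant units in the denominator always lowers the result, which is consistent with a rate based on total housing units coming out lower than the published homeownership rate.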

The “Nation” level in U.S. Census data includes all states and the District of Columbia. It omits Puerto Rico and other Island Areas.

kanglin_chen · February 16, 2025, 3:17am

Thank you Jonathan, that is very helpful!
