Matching Time Series Data and GIS Files

Hi! Connie from Harvard. I’m working with NHGIS data from 1970 to 2020 for longitudinal comparison. I’ve been trying to map Time Series Tables using GIS Files, and I have some questions about the historical data:

In the GIS Files, the basis for the 1970, 1980, and 1990 data is either “2000 TIGER/Line” or “2008 TIGER/Line.” Meanwhile, in the Time Series Tables, we can choose between “Nominal” or “Standardized to 2010.” I’m not sure which options to choose for mapping.

Here’s my understanding:

  1. If I select “Nominal” in the Time Series Tables and use “2000 TIGER/Line” or “2008 TIGER/Line” in the GIS Files, this means the “2000 TIGER/Line” or “2008 TIGER/Line” shapefiles approximate the historical boundaries. In this case, I would need to download the “2000 TIGER/Line” or “2008 TIGER/Line” shapefiles corresponding to each year (1970, 1980, 1990), as the boundaries differ for these years due to the approximation. Is that correct?
  2. If I select ‘Standardized to 2010’ in the Time Series Tables for data between 1990-2010, does this imply that all historical data has been adjusted to align with the 2010 TIGER/Line boundaries, ignoring any boundary changes over time, and that the historical data within each GIS unit may not match with the GIS unit but is simply filled in? If so, would it be acceptable to download just one ‘2010 TIGER/Line’ shapefile from the GIS Files, assuming all historical data uses the same GIS boundaries?

Could you confirm if my understanding of these two points is correct? Additionally, what are your suggestions for choosing between ‘Nominal,’ ‘Standardized to 2010,’ ‘2000 TIGER/Line,’ or ‘2008 TIGER/Line’ for longitudinal analysis from 1970-2020?

I think your general understanding is correct, but I’ll do my best to explain the relevant concepts to make sure…

First, I’ll explain the “basis” of GIS files. Then I’ll explain how the GIS files correspond to the two types of time series tables.

The “basis” of the GIS files doesn’t describe the year when the boundaries were in use. It describes which release of the Census Bureau’s TIGER/Line files NHGIS has used as the geometric basis for the boundaries.

The TIGER/Line files include geometric definitions of many features, e.g., county lines, streets, rivers, coastlines, etc. The Census Bureau regularly releases new versions of TIGER/Line files, and each new version includes various accuracy improvements in the representations of features. As such, the geometry for a single feature (a street, a river, a county line) often differs from one TIGER/Line release to another as the representations are improved.

When two NHGIS GIS files differ only in their basis, that indicates that they provide different geometric representations for the same features. For example, NHGIS provides four shapefiles representing the boundaries of 2000 census tracts, each with a different TIGER/Line basis:

The “YEAR” column indicates that all four of these shapefiles represent 2000 census tracts. You could use any of them to map and analyze data for 2000 census tracts.

In the file with a 2000 TIGER/Line+ basis, the tract boundary geometry corresponds to features (streets, rivers, county lines, etc.) as they were represented in the 2000 TIGER/Line file. The file with a 2008 TIGER/Line+ basis also represents 2000 census tracts, but the boundary geometry corresponds to features (streets, rivers, county lines, etc.) as represented in the 2008 TIGER/Line file.

Now, about time series tables…

You can learn more about the two types of geographic integration NHGIS uses in time series tables through links in the NHGIS Data Finder (click on the “Nominal” or “Standardized to 2010” links) or in the Geographic Integration section of the Time Series Tables page.

Importantly, the nominally integrated tables don’t report data for static geographic units across time:

Nominally integrated tables link geographic units across time according to their names and codes, disregarding any changes in unit boundaries. The identified geographic units match those from each census source, so the spatial definitions and total number of units may vary from one time to another (e.g., a city may annex land, a tract may be split in two, a new county may be created, etc.). The tables include data for a particular geographic unit only at times when the unit’s name or code was in use, resulting in truncated time series for some areas.

This means that 1970 data in a nominally integrated table correspond to 1970 census boundaries, and 2020 data correspond to 2020 census boundaries, etc.

To map the 1970 data in a nominally integrated table, you need to get a GIS file for 1970 geographic units (1970 counties or 1970 tracts, etc.). You could use the version with a 2000 TIGER/Line+ basis or a 2008 TIGER/Line+ basis. Regardless of the basis you choose, to map the 2000 data in a nominally integrated table, you’d still need to get a different GIS file, one that represents 2000 census geographic units.

Only the time series tables that are “Standardized to 2010” provide estimates for static geographic units across time. To map data in these tables, you need to get a GIS file that represents 2010 census units. NHGIS has GIS files for 2010 census units with either a 2010 TIGER/Line+ basis or a 2020 TIGER/Line+ basis. You could choose either of these.
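To make the rule concrete, here is a minimal Python sketch of which GIS file years a request needs under each integration type. The function names and the labels "nominal" and "standardized_2010" are made up for illustration; they are not NHGIS identifiers. The TIGER/Line basis is left out on purpose, since it is a separate choice that doesn't change which census year's units you need.

```python
def gis_year_for(data_year: int, integration: str) -> int:
    """Return the census year of the GIS file needed to map one year of table data."""
    if integration == "nominal":
        # Nominal tables keep each census's own units, so the boundary
        # year must match the data year.
        return data_year
    if integration == "standardized_2010":
        # Standardized tables report every year for 2010 census units.
        return 2010
    raise ValueError(f"unknown integration type: {integration!r}")


def gis_files_needed(data_years, integration):
    """Return the sorted, de-duplicated GIS file years for a set of data years."""
    return sorted({gis_year_for(year, integration) for year in data_years})


nominal_files = gis_files_needed([1970, 1980, 1990, 2000], "nominal")
standardized_files = gis_files_needed([1990, 2000, 2010, 2020], "standardized_2010")
print(nominal_files)       # one GIS file per census year
print(standardized_files)  # a single 2010 GIS file
```

With nominal data you end up with one boundary file per year; with standardized data a single 2010 file covers every year.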


Hi Jonathan,

Thank you for your reply; it was very helpful! I still have some follow-up questions. I’ll use housing data as an example:

  1. If I choose nominally integrated data, such as 1970 housing data, and use 1970 GIS data with a basis of 2000, we are retaining the original 1970 housing data, while the 1970 GIS data has been approximated based on the 2000 TIGER/Line to align with the historical 1970 situation. However, if I choose “standardized to 2010” housing data and use the 2010 TIGER/Line for mapping, the 1970 housing data has been interpolated to align with the 2010 situation. In this case, the former requires GIS data processing, while the latter involves housing data processing. Is that correct?

  2. Are the data (Decennial Years and Non-Decennial Years) one-year data? Are the data under ‘Source Tables’ and ‘Time Series Tables’ both one-year data (except for ACS 5-year or ACS 3-year data)? I’m confused about whether the Time Series Tables and the Nominal data type are one-year data. If so, what processing did NHGIS do to make them ‘over time’?

  • E.g. 1: NHGIS defines ‘Time Series Tables’ as: ‘A table is comprised of one or more related time series, each of which describes a single summary statistic…’ (https://www.nhgis.org/time-series-tables). What does ‘one or more related time series’ mean in this context?

  • E.g. 2: How do we interpret ‘Nominally integrated tables link geographic units across time according to their names and codes’?

  3. If “Source Tables” and “Time Series Tables” are both one-year data, what are the key differences between them? I see that Source Tables usually have more data about different variables.

  4. Could you please share information about when census data has been sample-based versus a full census since 1790, particularly after 1960? I’d like to understand the overall situation for all data (if they differ) and, specifically, how this applies to housing data.

  1. If I choose nominally integrated data, such as 1970 housing data, and use 1970 GIS data with a basis of 2000, we are retaining the original 1970 housing data, while the 1970 GIS data has been approximated based on the 2000 TIGER/Line to align with the historical 1970 situation. However, if I choose “standardized to 2010” housing data and use the 2010 TIGER/Line for mapping, the 1970 housing data has been interpolated to align with the 2010 situation. In this case, the former requires GIS data processing, while the latter involves housing data processing. Is that correct?

Your first two sentences are mostly correct. One caveat: the NHGIS tables that are “standardized to 2010” don’t yet include any 1970 data; they include only 1990-2020 data.

Also note, in case it’s not clear: your selections for time series tables and for GIS files are independent choices for independent files. NHGIS doesn’t adjust the data in your selected tables to align with boundaries in your selected GIS files. If you select 1970 data in a nominally integrated time series table, you’ll get the same table data regardless of the year or basis of the GIS files you choose to get.

I’m not sure how to interpret your third statement, that “the former requires GIS data processing, while the latter involves housing data processing.” Are you describing the processing that you think NHGIS uses to construct the data? Or are you describing the processing that a data user must apply to the files that they get from NHGIS? NHGIS uses no GIS data processing to construct nominally integrated time series tables, but we do use GIS data processing to construct the geographic crosswalks that we use to generate standardized time series tables.
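As a rough illustration of what a geographic crosswalk does in general, here is a simplified Python sketch of weighted allocation from source units to target units. This is not NHGIS's actual procedure; the unit IDs, counts, and weights are all made up, and the function name is hypothetical.

```python
from collections import defaultdict


def apply_crosswalk(source_counts, crosswalk):
    """Allocate counts from source units to target units using weights.

    crosswalk is a list of (source_id, target_id, weight) records, where
    weight is the share of the source unit's count assigned to the target.
    """
    target_counts = defaultdict(float)
    for source_id, target_id, weight in crosswalk:
        target_counts[target_id] += source_counts.get(source_id, 0) * weight
    return dict(target_counts)


# Hypothetical housing-unit counts for two earlier-census units.
source_counts = {"A": 100, "B": 40}

# Hypothetical weights: unit A is split evenly between 2010 units X and Y,
# and unit B falls entirely inside Y.
crosswalk = [("A", "X", 0.5), ("A", "Y", 0.5), ("B", "Y", 1.0)]

standardized = apply_crosswalk(source_counts, crosswalk)
print(standardized)
```

The weights for each source unit sum to 1, so the total count is preserved: it is only redistributed onto the target boundaries.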

  2. Are the data (Decennial Years and Non-Decennial Years) one-year data? Are the data under ‘Source Tables’ and ‘Time Series Tables’ both one-year data (except for ACS 5-year or ACS 3-year data)? I’m confused about whether the Time Series Tables and the Nominal data type are one-year data. If so, what processing did NHGIS do to make them ‘over time’?

Time series tables link together data from multiple times. When you get time series tables through the NHGIS Data Finder, you may choose which years you will download. If a time series table includes data from 2000, 2010, and 2020, you may choose to get data for one, two or three of those years. If you choose to get only one year, then the output data will be “one-year data”. If you choose to get two or three of the years, then the output data file(s) will contain data for multiple years.

You can also choose one of three layouts when you request time series tables. If you choose “Time Varies by File”, each file will contain data for only one year. But if you choose “Time Varies by Column” or “Time Varies by Row”, then each file will contain data for all of your selected years.
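To show how the layouts differ, here is a small Python sketch that converts a toy "Time Varies by Column" record into "Time Varies by Row" records. The column names (`unit`, `pop_2000`, etc.) and the values are invented for illustration and won't match the names in real NHGIS files.

```python
# "Time Varies by Column": one row per unit, one column per year (toy data).
by_column = [
    {"unit": "A", "pop_2000": 10, "pop_2010": 12, "pop_2020": 15},
]


def to_rows(records, prefix="pop_"):
    """Reshape by-column records into one record per unit and year."""
    rows = []
    for record in records:
        for key, value in record.items():
            if key.startswith(prefix):
                year = int(key[len(prefix):])
                rows.append({"unit": record["unit"], "year": year, "pop": value})
    return rows


# "Time Varies by Row": one row per unit per year.
by_row = to_rows(by_column)
for row in by_row:
    print(row)
```

Both layouts hold the same numbers. By-column is usually convenient for joining to a single GIS file, while by-row tends to suit plotting and modeling over time.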

E.g. 1: NHGIS defines ‘Time Series Tables’ as: ‘A table is comprised of one or more related time series, each of which describes a single summary statistic…’ (Time Series Tables | IPUMS NHGIS). What does ‘one or more related time series’ mean in this context?

As explained in this example, a Persons by Sex time series table includes two time series: one gives counts of males over time, and one gives counts of females over time. I recommend that you try selecting and downloading a table, making sure that you select multiple years with your data request, and then when you see the output files, I hope it will be clearer. It might also help if you try getting two or three different layouts, as described above.

E.g. 2: How do we interpret ‘Nominally integrated tables link geographic units across time according to their names and codes’?

For example, the state of West Virginia split off from the state of Virginia between 1860 and 1870. If you get a state-level nominally integrated table for the years 1860 and 1870, there will be data for “Virginia” in both years, but there will be data for “West Virginia” only in 1870, not 1860. The 1860 data for Virginia includes the characteristics of the area that became West Virginia. The table is “nominally integrated” because it links together the 1860 and 1870 data for Virginia according to the state name (“Virginia”) even though that name represented a different area at each time.
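Here is a minimal Python sketch of that idea: linking units across years purely by name. The counts are placeholders, not real census figures, and the function name is hypothetical.

```python
# Placeholder values per state name for each census year (not real counts).
tables = {
    1860: {"Virginia": 100},
    1870: {"Virginia": 70, "West Virginia": 30},
}


def link_by_name(tables):
    """Build one time series per unit name, ignoring boundary changes."""
    years = sorted(tables)
    names = sorted({name for table in tables.values() for name in table})
    # A year with no unit of that name yields None (a truncated series).
    return {name: {year: tables[year].get(name) for year in years} for name in names}


linked = link_by_name(tables)
print(linked["Virginia"])
print(linked["West Virginia"])
```

The "Virginia" series has a value in both years even though the name covers a different area each time, and "West Virginia" has no 1860 value because no unit by that name existed then.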

  3. If “Source Tables” and “Time Series Tables” are both one-year data, what are the key differences between them? I see that Source Tables usually have more data about different variables.

Source tables contain data from a census source with minimal alteration by NHGIS. Time series tables link data across time as described above. NHGIS researchers standardized the categories in time series tables through “attribute integration”, and we linked geographic areas across time, either by nominal integration or by geographic standardization. There are many fewer subjects and geographic levels in time series tables than in source tables because of the substantial effort required for us to integrate data from source tables into time series tables.

  4. Could you please share information about when census data has been sample-based versus a full census since 1790, particularly after 1960? I’d like to understand the overall situation for all data (if they differ) and, specifically, how this applies to housing data.

We provide information about the sources of NHGIS tables, including some information about which are full-count vs. sample-based, through our Overview of Datasets page. You can find out more through NHGIS source documentation on the Tabular Data Sources page. It might also help to review our FAQs about the ACS, which includes some explanation of how the ACS relates to earlier censuses. You may find other useful resources through other sites.

If you have more questions about any of this, I recommend you consider viewing one or two of the webinars available through the User Guide page. One webinar gives an overview of NHGIS and another introduces time series tables.


Those are very helpful! Thank you, Jonathan!