I am working with historical tract data from the 1960s to the present. I have noticed that the number of characters in the GISJOIN field varies (either 12 or 14 characters). My understanding is that the GISJOIN field is derived from FIPS codes, and that each FIPS code consists of 11 characters (2 state, 3 county, 6 tract). It appears that in the historical data files some tract codes contain only four characters, while others contain six. Is this due to variation in the way FIPS codes were assigned historically, or to some other reason?
Your interpretation is correct. In NHGIS data for 1970, 1980, and 1990, tract codes may have either 4 or 6 digits, consistent with how the Census Bureau reported the codes in our source files from those years. In 1960 and earlier data, tract codes may also include a prefix or suffix, which our data files provide in separate columns and also include in the full GISJOIN. The combined 1960 tract codes range from 4 to 8 characters in length and may include letters as well as numbers.
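Because the state and county portions of a tract GISJOIN have a fixed width, the variable-length tract code can be recovered by slicing off everything after them. The sketch below assumes the layout "G" + 3-character NHGIS state code + 4-character NHGIS county code + tract code; the TractGisjoin type, the split_tract_gisjoin function, and the sample IDs are hypothetical illustrations, not part of any NHGIS tool.

```python
from typing import NamedTuple


class TractGisjoin(NamedTuple):
    """Components of a tract GISJOIN (hypothetical helper type)."""
    state: str   # 3-character NHGIS state code, e.g. "270"
    county: str  # 4-character NHGIS county code, e.g. "0530"
    tract: str   # variable-length tract code (4 to 8 characters in 1960)


def split_tract_gisjoin(gisjoin: str) -> TractGisjoin:
    """Split a tract GISJOIN, assuming 'G' + state(3) + county(4) + tract."""
    # The shortest tract code is 4 characters, so 12 is the minimum length.
    if not gisjoin.startswith("G") or len(gisjoin) < 12:
        raise ValueError(f"unexpected GISJOIN: {gisjoin!r}")
    # State and county are fixed-width; the rest is the tract code,
    # whatever its length.
    return TractGisjoin(gisjoin[1:4], gisjoin[4:8], gisjoin[8:])


# Hypothetical IDs: one with a 4-digit tract code, one with a 6-digit code.
for gj in ["G27005300001", "G2700530000100"]:
    parts = split_tract_gisjoin(gj)
    print(len(gj), parts.tract)  # prints "12 0001", then "14 000100"
```

Since the tract code is simply whatever follows the county portion, the same slicing also handles longer 1960 codes that include a prefix or suffix.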
Technically, tract codes are not “FIPS codes.” The “Federal Information Processing Standard” (FIPS) applies only to “states and statistically equivalent entities, counties and statistically equivalent entities, named populated and related location entities (such as, places and county subdivisions), and American Indian and Alaska Native areas.” Tract codes are instead defined directly by the Census Bureau or its local partners.
The GISJOIN IDs for tracts combine the tract codes with NHGIS state and county codes, which are based on FIPS codes but have an extra digit appended to distinguish historical entities. For entities that existed in recent decades, the added digit is 0; for historical areas, we use a nonzero digit. For example, the state of Minnesota has FIPS code 27 and NHGIS code 270. Minnesota Territory, which appears in 1850 census data, predates the FIPS system, so NHGIS assigns it the code 275.
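The code construction can be sketched in a few lines of Python. This is an illustration, not NHGIS's own code: nhgis_code and tract_gisjoin are hypothetical names, county 053 is an arbitrary example, and the sketch assumes a tract GISJOIN is a leading "G" followed by the NHGIS state code, the NHGIS county code, and the tract code as reported. Under those assumptions, a 4-digit tract code yields a 12-character GISJOIN and a 6-digit code yields 14 characters.

```python
def nhgis_code(fips: str, historical_digit: int = 0) -> str:
    """Append the NHGIS distinguishing digit to a FIPS state or county code.

    The digit is 0 for entities from recent decades and nonzero for
    historical areas.
    """
    if not fips.isdigit():
        raise ValueError(f"FIPS code must be numeric: {fips!r}")
    if not 0 <= historical_digit <= 9:
        raise ValueError("historical_digit must be a single digit")
    return f"{fips}{historical_digit}"


def tract_gisjoin(state_fips: str, county_fips: str, tract: str,
                  state_digit: int = 0, county_digit: int = 0) -> str:
    """Build a tract GISJOIN (hypothetical helper, assumed layout)."""
    # Assumed layout: "G" + NHGIS state (3) + NHGIS county (4) + tract code.
    return ("G" + nhgis_code(state_fips, state_digit)
            + nhgis_code(county_fips, county_digit) + tract)


print(nhgis_code("27"))                           # 270 (Minnesota)
print(nhgis_code("27", 5))                        # 275 (Minnesota Territory)
print(len(tract_gisjoin("27", "053", "0001")))    # 12
print(len(tract_gisjoin("27", "053", "000100")))  # 14
```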