Why are there nodata values for many 1920 census tracts?



I’m a data librarian helping a graduate student who is working with tract-level summary data from NHGIS from the 1920 census for Kings County (Brooklyn) New York. The tract shapefile from 1920 contains 965 tracts for the county. Of these tracts, 357 are coded with a nodata value in the GISJOIN field, while the other 608 have regular identifiers. When I look at a data table like 1920_tpop_nyc it has records for 585 tracts. Essentially there are a large number of tracts for which no data exists.

My question is - if tract boundaries were delineated for the entire county, why isn’t data tabulated for all of them?

I know that the Census Bureau didn’t introduce tracts as an official geography until 1940. From 1910 to 1930 a number of social scientists drew tract boundaries and tabulated data for them on an unofficial basis for a handful of cities. My assumption is that they simply didn’t finish tabulating data for all these tracts, or the data was lost. I am hoping someone could confirm my assumption, and ideally point me to some documentation that either supports this or provides another explanation.



The tract data for 1920 New York City came from Statistical Sources for Demographic Studies of Greater New York, 1920. We based our census tract polygons on the maps in the volume. We made a polygon for each feature on the map. We then merged the population data, which was also derived from that volume, with the polygons. Any polygon that had no match in the population data received a code of “nodata” in its GISJOIN. We assumed that those tracts had no persons living in them.

After closer examination of the tract-level tabulations in that volume, we have since realized that the “nodata” tracts actually contained persons. The print volume shows that the Bureau sometimes combined data for multiple tracts. When we typed in the data, we only coded it for the first tract in the listing. An example will illustrate this issue:

NHGIS has a Kings County census tract coded as 305. It has a population total of 11,355 persons. Census tract 305 is adjacent to a “nodata” tract. That “nodata” tract was originally coded as census tract 307. We changed its code to “nodata” when it had no match in the population data.

If you look at the “Statistical Sources” volume (page 835), you can see an entry called 305+307. When they published the totals for tract 305, they combined the populations of tracts 305 and 307 and assigned the sum to 305.

When we typed in the data, we called that entry “tract 305” and didn’t capture the “307” part of it.


Thus, the population data for tract 305 includes those living in tract 307, but the polygon for tract 305 only includes the area covered by tract 305. Tract 307 is assigned a “nodata” value.

We are aware of this issue and plan to adjust the 1920 tract boundaries in New York City to fix the errors. We don’t have a timeline for that work yet, however.

If the graduate student would like to tackle this work, I recommend downloading the 1920 census tracts conflated to the 2008 TIGER/Line files. Those tract boundaries carry the original census tract IDs (i.e., the IDs before we set them to “nodata”). Working from the “Statistical Sources” volume, the student could merge the appropriate tracts together (e.g., merge tracts 305 and 307 to create the correct tract 305 footprint) so that the polygons match the published data.
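A minimal sketch of the bookkeeping behind that merge, assuming the combined entries are transcribed from the volume as strings in the form "305+307" and that the published total belongs to the first tract listed (as it does for 305). The function name `build_dissolve_key` and the standalone entry "301" are hypothetical, included only for illustration.

```python
def build_dissolve_key(entries):
    """Map each tract ID named in a published entry to the ID its data is reported under."""
    key = {}
    for entry in entries:
        # "305+307" -> ["305", "307"]; a plain "301" -> ["301"]
        members = [part.strip() for part in entry.split("+") if part.strip()]
        if not members:
            continue  # skip blank lines in the transcription
        primary = members[0]  # assumption: data were keyed to the first tract listed
        for tract in members:
            key[tract] = primary
    return key


# "305+307" is the entry on page 835; "301" is a made-up standalone entry.
entries = ["305+307", "301"]
key = build_dissolve_key(entries)
print(key["307"])  # the tract ID whose published total covers tract 307
```

With that key attached to the conflated polygons as a new column (matched on the original tract ID), a dissolve on the column in any GIS package, for example `GeoDataFrame.dissolve(by=...)` in geopandas, should produce one footprint per published entry.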


Great! Thanks very much for this detailed response. I’ve looked at the document and see what you mean. I’ll pass the info and this solution along to the student.
