My goal is to acquire decennial census data on a number of population attributes over time (1870-2020) for WA and OR at the state and county levels. Attributes include race, nativity, parents’ nativity, age distribution, sex distribution, literacy levels, urban/rural residency, etc. Not all questions were asked in every census, and response options have evolved. I realize I won’t get perfect longitudinal “straight lines” of data, but I’ll take what I can get and spend my time merging/collapsing/parsing to create stable longitudinal categories. This will be hard enough with the natural “drift” in available data. But the multiplicity of tables on the same subject is daunting.
I have downloaded a boatload of what I thought were equivalent tables, but both in this data set and in Social Explorer, I am having the hardest time finding the right comparable tables year over year. I’m looking for the Rosetta stone that tells me Sex Table nnn in 1900 is equivalent to Sex Table mmm in 1930, etc., with the same level of granularity and, ideally, table column IDs that line up.
Does this exist in whole or part?
Thank you in advance. DP
IPUMS USA has integrated variables that allow you to construct tables across censuses using consistent codes. A couple of caveats:
IPUMS USA provides microdata, so to summarize for states and counties, you need to sum microdata records by state or county, applying weights to construct whole-population estimates from each sample. For 1940 and earlier years, you could also use IPUMS USA’s full-count census microdata, which does not require weighting.
For years after 1940, IPUMS USA can currently only provide “public use samples” of microdata, which restrict the degree of geographic detail, making it possible to identify only a subset of counties in WA & OR.
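As a rough illustration of the first caveat, here is a minimal sketch of summing weighted microdata records by state, county, and sex. It assumes an extract has already been loaded as a list of records; the column names follow IPUMS USA conventions (YEAR, STATEFIP, COUNTYFIP, SEX, PERWT), but treat them as assumptions about your particular extract. The function name `weighted_counts` and the toy records are invented for illustration.

```python
from collections import defaultdict


def weighted_counts(records,
                    group_keys=("YEAR", "STATEFIP", "COUNTYFIP", "SEX"),
                    weight_key="PERWT"):
    """Sum person weights within each group to estimate population counts."""
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[k] for k in group_keys)
        totals[key] += float(rec[weight_key])
    return dict(totals)


# Toy records with made-up values; a real extract has one row per sampled person.
# STATEFIP 53 is Washington and 41 is Oregon.
sample = [
    {"YEAR": 1900, "STATEFIP": 53, "COUNTYFIP": 33, "SEX": 1, "PERWT": 100.0},
    {"YEAR": 1900, "STATEFIP": 53, "COUNTYFIP": 33, "SEX": 1, "PERWT": 100.0},
    {"YEAR": 1900, "STATEFIP": 53, "COUNTYFIP": 33, "SEX": 2, "PERWT": 100.0},
    {"YEAR": 1900, "STATEFIP": 41, "COUNTYFIP": 51, "SEX": 2, "PERWT": 100.0},
]

estimates = weighted_counts(sample)
for key in sorted(estimates):
    print(key, estimates[key])
```

With full-count data, every weight would effectively be 1, so the same grouping reduces to counting records.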
IPUMS NHGIS provides census summary tables for states and counties, like those you’ve already been working with. And yes, these vary a great deal from census to census. NHGIS has also constructed “time series tables” that match up consistently defined counts across time for a wide range of subjects, but most of these tables go back no farther than 1970. There are two time series tables that go back farther: Total Population back to 1790 and Persons by Sex back to 1820.
Beyond that, I don’t know of another source that has linked comparable summary data all the way back to 1870.
Thank you for this guidance. Permit me to ask one more newbie follow-up.
I really am only interested in summary tables, and I am not technically sophisticated enough to aggregate and weight the person- or household-level data.
Here is my question that could help me navigate your rich data set:
In NHGIS, is there a way to search for the associated data set based on the section, topic, or table names in the actual published (printed) census materials? For example, if I see in the 1910 US Census for WA state a table of Population by Race, with columns for “White, Colored, Chinese, Japanese” and the title “Table N: ” on p. XX, is there some way I can interrogate your data set to find that same data set based on the printed table title?
I see your downloaded data sets have alphanumeric identifiers, but I have not been able to crack the code on them within or across decennial censuses.
Thank you for your guidance.
At this time, there’s no way to search in the NHGIS web interface for specific column or table names. It’s possible to query NHGIS metadata using our API, and we’re presently working on improvements to the ipumsr package to simplify that type of search within R. (The ipumsr extensions may be available sometime in the coming months.) But I’m afraid those options may entail more “technical sophistication” than you’d like to engage with!
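To give a sense of what a metadata search of that kind might involve, here is a minimal sketch of a keyword search over table descriptions. It assumes the table metadata has already been retrieved (for example, through the API) into a list of records. The field names `dataset`, `name`, and `description`, the function `search_tables`, and the sample catalog are all hypothetical and do not reflect the actual API response format.

```python
def search_tables(tables, keywords):
    """Return tables whose description contains every keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [
        t for t in tables
        if all(k in t["description"].lower() for k in wanted)
    ]


# Hypothetical metadata records, invented for illustration only.
catalog = [
    {"dataset": "1910_example", "name": "T1", "description": "Population by Race"},
    {"dataset": "1910_example", "name": "T2", "description": "Population by Sex"},
    {"dataset": "1930_example", "name": "T1", "description": "Race by Nativity"},
]

hits = search_tables(catalog, ["race"])
for h in hits:
    print(h["dataset"], h["name"], h["description"])
```

A search like this only matches on our own table descriptions, which may be worded differently from the titles in the printed volumes.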
One resource that may be somewhat helpful for you is the documentation for our sources of the historical data. These list which specific published tables our data are derived from, but they don’t provide a mapping between those tables and ours.