Hello,
I’m doing a longitudinal analysis from 1970 to 2016 with compositional variables, such as percent Hispanic at the county level. For the Hispanic indicator, the time series table recommends NT24 for 1970, which includes the following four categories:
C11001: Any of five Spanish categories of the question on “origin or descent”
C11002: Puerto-Rican birth or parentage
C11003: Spanish language
C11004: Not of “Spanish language” but of Spanish surname (in 5 Southwestern states only)
Since I am trying to calculate percent Hispanic for each county, is it correct to sum C11001 through C11004 as the numerator and to use the total population count from NT126 as the denominator? Or would you use only C11001 and C11002 in the numerator?
Thanks!
The counts in NT24 are not mutually exclusive, so summing them would result in double-counting persons who fall into multiple Spanish Origin categories. The NHGIS time series table uses only the C11001 count, and that’s what I’d recommend generally, too. As explained in the time series table notes:
For the [1970] 5% sample, there was also a question allowing respondents to self-identify an “origin or descent” of Mexican, Puerto Rican, Cuban, Central or South American, or “Other Spanish”, which closely resembles the standard question used in later censuses.
For “Hispanic or Latino” time series in this table and elsewhere, NHGIS uses 1970 counts based on the 5% sample’s self-identification question for optimal comparability with later censuses.
This makes deriving a suitable “total population” denominator for computing percent Hispanic a little more complicated. Counts from the 5% sample won’t add up exactly to total populations from the 100% count (as given in NT126). Instead, to get an appropriate base population for the 5% sample, you need to sum all counts from a table that’s derived from the 5% sample and that has a total population universe. Based on the source documentation for the 1970 Count files, one table that meets these criteria is NT25, “Citizenship by Age”. Bottom line: for a suitable denominator, I recommend you sum all the counts from NT25.
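To make the arithmetic concrete, here is a minimal sketch in Python of the 1970 calculation for a single county. C11001 is the NT24 code quoted above; the NT25 column names (NT25_A, NT25_B, NT25_C) and all the numbers are made-up placeholders, so substitute the actual NT25 codes from your extract's codebook.

```python
def percent_hispanic_1970(row, nt25_columns):
    """Percent Hispanic for one county, with both terms from the 5% sample."""
    # Numerator: the "origin or descent" self-identification count only.
    numerator = row["C11001"]
    # Denominator: the sum of every NT25 cell, which gives the 5%-sample
    # estimate of total population (not the 100% count in NT126).
    denominator = sum(row[col] for col in nt25_columns)
    if denominator == 0:
        return None  # no base population, so the percentage is undefined
    return 100.0 * numerator / denominator


# Hypothetical column names and made-up counts, for illustration only.
NT25_COLUMNS = ["NT25_A", "NT25_B", "NT25_C"]
county = {"C11001": 1200, "NT25_A": 9000, "NT25_B": 700, "NT25_C": 300}

pct = percent_hispanic_1970(county, NT25_COLUMNS)
print(round(pct, 1))  # 12.0
```

Returning None for an empty county is just one way to handle a zero denominator; you may prefer to drop those counties or flag them as missing instead.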
(Coincidentally, I was just looking into this issue yesterday, so your question is well timed!)
Thank you very much for the helpful response! I would not have figured that out on my own.
I have the same question about the correct denominator for percent Hispanic in 1980 and 1990. It is not possible to derive the denominator from the Hispanic indicator tables themselves (Tables NT9A and NP8). What tables would be the appropriate source for the denominator in calculating percent Hispanic at the county level for 1980 and 1990? This is not a problem from 2000 onward, where we can clearly calculate a denominator from the Hispanic indicator tables themselves.
Thanks!
Those 1980 and 1990 tables are from STF1 datasets, which are based on the 100% count of the population (not a sample, like the 1970 Spanish Origin counts). Therefore, you can use any 100% count of the total population as a denominator for them, from STF1 or elsewhere. (What’s important is that your numerator and denominator are both based on the same surveyed population.) You could also get compatible Hispanic and Total Population counts for all years from 1980 to 2010 from the corresponding time series tables.
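The rule that numerator and denominator must share the same surveyed population can be sketched as a small guard in Python. This is only an illustration: the Count class, the basis labels, and the numbers below are all hypothetical and are not part of any NHGIS file or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Count:
    value: int
    basis: str  # hypothetical label, e.g. "100% count" or "5% sample"


def percent(numerator, denominator):
    """Return a percentage, refusing to mix counts from different bases."""
    if numerator.basis != denominator.basis:
        raise ValueError(
            f"basis mismatch: {numerator.basis} vs {denominator.basis}"
        )
    if denominator.value == 0:
        return None  # undefined when the base population is zero
    return 100.0 * numerator.value / denominator.value


# Made-up 1980 county values, both treated as 100% counts.
hispanic_1980 = Count(2500, "100% count")
total_1980 = Count(50000, "100% count")

pct_1980 = percent(hispanic_1980, total_1980)
print(pct_1980)  # 5.0
```

Passing a 5%-sample numerator with a 100%-count denominator raises a ValueError here, which is the 1970 mismatch described earlier in the thread.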