Hispanic persons by race in 1970 tracts

I have pulled tract-level 1970 data on sex by age (NT17) with a race/ethnicity breakdown (all races, white, black, and Hispanic). From this set of variables, I am interested in generating mutually exclusive race/ethnicity categories (non-hispanic white, non-hispanic black, non-hispanic other, and hispanic of all races) for each age group. To do this, I am attempting to impute the racial breakdown of Hispanic people in a given age group for each tract in my study area. The best way I can think to do this is to impute the racial categories of Hispanics within a given age group using tract-level data on the portions of Hispanics of all ages that are identified with each racial group.

The only table that breaks down the racial identifications of Hispanics in the 1970 census at the tract level looks like NT24 (Spanish Indicator). However, as I learned from Jonathan’s answer to a past forum post (1970 Hispanic indicator?), the categories in NT24 are not mutually exclusive. Jonathan said that NHGIS time series tables generally use the C11001 category to identify Hispanics in a way that is somewhat more consistent with other censuses.

However, I’m noticing that there are tracts for which the C11001 count (people of all races self-identifying as Hispanic on the origin or descent question) in table NT24 is 0, but there are nonzero counts of Hispanics in specific age groups in the NT17 table (the C1UAC variables). I assume this means that the Hispanic variables in the NT17 table include a wider set of people than just those who are identified as Hispanic using the origin or descent question, and I confirmed that the tracts in question do have nonzero counts for other Spanish Origin indicators like C11002 (Puerto Rican birth or parentage) or C11003 (Spanish language). I am wondering what set of Spanish Origin indicators results in people being considered Hispanic in table NT17? Perhaps this is related to the way that the HISPAN variable is imputed for 1970 microdata? (1960 Census: Hispanic Status)

If the Hispanic breakdown for the NT17 table consists of all people for whom one or more Spanish Origin indicators apply, then I am wondering how I might go about getting tract-level data on the racial composition of those people. Table NT24 does not seem to have a variable that breaks down the racial identifications of all people for whom any Spanish Origin indicator applies - only the breakdowns for individual Spanish Origin indicators.

There are unfortunately multiple complicating issues here. First, there are the issues you raised pertaining to the different Spanish Indicators in the 1970 data. Second, another important issue is that the 1970 summary files suppress data for many records to protect confidentiality.

Regarding the first issue, in the post you linked to, I quoted some text from our time series table notes. In the full source note, there’s another sentence that’s relevant here. The note’s full first paragraph is:

5% Basis for 1970 Counts: Most 1970 census tables that distinguish “Spanish-American” persons do so based on information from the census’s 15% sample regarding language spoken, surnames, or Puerto Rican nativity, with the exact criteria varying by state, as specified in Section 64 of the 1970 Census Users’ Dictionary. For the separate 5% sample, there was also a question allowing respondents to self-identify an “origin or descent” of Mexican, Puerto Rican, Cuban, Central or South American, or “Other Spanish”, which closely resembles the standard question used in later censuses.

Here’s a snapshot of the referenced “Section 64” with some key text highlighted:

The tables you’re working with are from the “4th Count - Population” 1970 summary tape files. The source documentation for those tables indicates which tables use a 5% basis. The listing is on pages 346-348 in this PDF document. (I included a link to this document in the previous post, but I just discovered that that link was stale and broken. I’ve updated the link in the previous post now, too.)

These sources provide an answer to your first question, “what set of Spanish Origin indicators results in people being considered Hispanic in table NT17?” That table is not listed as using a 5% basis, so its “Spanish American” breakdown is based on Puerto Rican birth or parentage, Spanish language and/or Spanish surname, not the preferred “origin or descent” question from the 5% sample. And the text from Section 64 explains a key feature of how NT17’s general Spanish American classification corresponds to counts from table NT24: the computation varies by state, as specified.

Note though that NT17 uses a 20% basis, which I think explains why I get slightly different values when I compute Spanish American totals from the Spanish American (AC) breakdown for NT17 than when I sum the corresponding totals from NT24 (C11002-004), which use a 15% basis.
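If you want to run the same comparison for your own tracts, a minimal sketch might look like the following. The function name and the input layout are assumptions for illustration, the counts are made up, and you would need to pass in only the NT24 indicator counts that Section 64 specifies for the tract’s state.

```python
def spanish_american_gap(nt17_age_counts, nt24_indicator_counts):
    """Return (NT17 total, NT24 total, difference) for one tract.

    nt17_age_counts: Spanish American counts for every sex/age cell in NT17.
    nt24_indicator_counts: counts for the NT24 indicators that apply in the
        tract's state (an assumption the caller must get right).
    """
    nt17_total = sum(nt17_age_counts)
    nt24_total = sum(nt24_indicator_counts)
    return nt17_total, nt24_total, nt17_total - nt24_total


# Made-up counts for a single tract, for illustration only.
nt17_total, nt24_total, gap = spanish_american_gap([12, 30, 25, 8], [70, 3])
print(nt17_total, nt24_total, gap)
```

Small nonzero gaps would be consistent with the difference in sample basis; large gaps may instead mean the wrong set of indicators was summed for that state.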

Note also that there are a few tables that provide counts by Age and/or Sex using a 5% basis, but they’re all limited to a smaller set of age categories or a smaller universe (e.g., tables 25, 43, 50, 52). Still, conceivably, you might be able to use these tables to determine a little more about the association between race / Spanish Origin and sex and age for an imputation using the 5% basis.
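Whichever basis you settle on, the proportional allocation you describe could be sketched roughly as below. This is only an illustration under stated assumptions: the function names, the dictionary layout, and the counts are all hypothetical, and it assumes you already have a tract-wide count of Hispanics by race from some table. Tracts with no Hispanics in that table return no shares, so you would need a fallback for them (borrowing shares from a larger area is one option, not something the data prescribes).

```python
def race_shares(hispanic_by_race):
    """Convert tract-wide Hispanic counts by race into shares summing to 1.

    Returns None when the tract has no Hispanics in the source table,
    so the caller can decide on a fallback.
    """
    total = sum(hispanic_by_race.values())
    if total <= 0:
        return None
    return {race: n / total for race, n in hispanic_by_race.items()}


def allocate_by_age(hispanic_by_age, shares):
    """Split each age group's Hispanic count across races using the shares."""
    return {
        age: {race: n * share for race, share in shares.items()}
        for age, n in hispanic_by_age.items()
    }


def non_hispanic(race_by_age, imputed_hispanic, race):
    """Subtract imputed Hispanics of one race from that race's age counts.

    Results are floored at zero because sampled counts can be inconsistent.
    """
    return {
        age: max(race_by_age[age] - imputed_hispanic[age][race], 0.0)
        for age in race_by_age
    }


# Made-up counts for a single tract, for illustration only.
shares = race_shares({"white": 80, "black": 15, "other": 5})
imputed = allocate_by_age({"0-4": 10, "5-9": 20}, shares)
nh_white = non_hispanic({"0-4": 100, "5-9": 150}, imputed, "white")
print(nh_white)
```

The imputed values are fractional; whether and how to round them so that categories still sum to the tract totals is a separate choice.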

Lastly, regarding suppression, be aware that some tracts will have no data available for certain tables for certain breakdowns. This is generally indicated by blank entries and/or a -1 value for the first variable in a table / breakdown for a given record. You might consider using additional imputation techniques to fill in the missing data.
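As a starting point for flagging those records, here is a minimal sketch. It assumes each tract record is a dictionary keyed by variable code (for example, a row read with Python’s csv.DictReader), and the variable codes used in the example are placeholders rather than real NHGIS codes.

```python
def is_suppressed(record, table_vars):
    """Return True if the table/breakdown looks suppressed for this record.

    Checks only the first variable in table_vars: a missing key, a blank
    entry, or a value of -1 is treated as suppression.
    """
    value = record.get(table_vars[0])
    if value is None or str(value).strip() == "":
        return True
    try:
        return float(value) == -1
    except ValueError:
        # Non-numeric text is not treated as a suppression marker.
        return False


# Placeholder variable codes and made-up rows, for illustration only.
breakdown_vars = ["VAR001", "VAR002", "VAR003"]
rows = [
    {"VAR001": "14", "VAR002": "9", "VAR003": "0"},
    {"VAR001": "-1", "VAR002": "", "VAR003": ""},
    {"VAR001": "", "VAR002": "", "VAR003": ""},
]
print([is_suppressed(row, breakdown_vars) for row in rows])
```

Flagging suppressed records before computing shares keeps a -1 marker from being treated as a real count.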