Race and ethnicity in the 2010 decennial Census data


I have been trying to create a dataset of the Asian Indian-only population (people who reported Asian Indian and no other race/ethnicity) in the U.S. using the 2010 decennial Census. I found this information in the 2010_SF2a (2010 Summary File 2) dataset and, using the race breakdown, generated and downloaded an extract from IPUMS.

However, I wanted to run some sanity checks, so I cross-referenced my total N with a Census report that overviews the US Asian Population using the 2010 Census (page 14, Table 5): https://www.census.gov/history/pdf/2010asian-122016.pdf

The N’s I calculated were off from the report by ~1,000,000. My total N for Asian Indians across all census tracts was ~1.8 million, whereas the report linked above gives ~2.8 million. From what I can tell, I am using the same criteria for Asian Indian classification. Can anyone explain why this might be the case? My instinct is that my IPUMS NHGIS request is somehow too restrictive, but aside from the total US population, I haven’t been able to replicate any of the N’s. I also considered whether Hispanic ethnicity was being included; my feeling is that even if the Census report includes those who reported Asian Indian race plus any Hispanic ethnicity, that alone shouldn’t account for a difference of ~1 million. Any suggestions would be very much appreciated. Thanks!


Dear Joey,

I created an extract from 2010 SF2a (from IPUMS NHGIS) containing a count of Asian Indians (2,843,391) that matches the count reported in Table 5 of the publication you linked to. I selected the total population table for 2010 SF2a from the DATA FINDER. On the options page, I first selected the Nation geographic level. Second, I opened the Race/Ethnicity Breakdowns popup and selected the following checkboxes:

Asian Indian alone (400-401)
Asian Indian alone or in any combination (400-401) & (100-299) or (300, A01-Z99) or (400-999)

The Asian Indian alone (400-401) count in the output CSV was 2,843,391, and the Asian Indian alone or in any combination count was 3,183,063. Both of these counts are in Table 5 of the publication you linked to.

If you can provide more information about your data (the name of the table(s), the geographic level you selected, and the options you chose in the Race/Ethnicity Breakdown popup), I can try to look into why your estimate differs from the publication.

Dave Van Riper

Hello Dave,

Thanks so much for your response! It’s very reassuring that you were able to replicate the N’s from the publication. I’d be happy to provide the additional information you requested:

Geographic level: Census tract (by state-county)
Race/Ethnicity Breakdown: Asian Indian Alone (400-401)

For additional context, we chose census tracts because we have tract-level exposures that we were hoping to merge onto the dataset for additional analyses. I calculated my total Asian Indian Alone population by adding together all of the tract-specific N’s after applying the restrictions above. Please let me know if there’s any other information I can provide that might be helpful!

Kind regards,


Dear Joey,

For the 2010 Summary File 2, the Census Bureau uses suppression when there are fewer than 100 people identifying as a specific detailed race/ethnicity group in a specific geographic unit. Thus, while the nationwide count of Asian Indians is published, not every census tract contained at least 100 Asian Indians. When there were fewer than 100 Asian Indians in a census tract, the Bureau suppressed the Asian Indian count for that tract. The specific text from the Bureau’s technical documentation is:

The presentation of SF 2 tables for any of the 331 population groups is subject to a population threshold of 100 or more people. That is, if there are fewer than 100 people in a specific population group in a specific geographic area, their population and housing characteristics data are not available for that geographic area in SF 2.

When you sum the Asian Indian counts across census tracts, the result falls short of the nationwide total because the suppressed tracts contribute nothing to the sum.

Essentially, the census tracts you have data for are those with 100 or more Asian Indians, so your extract does not cover all Asian Indians in the US.
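To make the effect concrete, here is a minimal Python sketch of how a 100-person threshold shrinks a sum of tract counts. The tract IDs and counts are invented for illustration and are not real Census figures; only the threshold of 100 comes from the documentation quoted above.

```python
# SF 2 publishes a population group's data only where it has 100 or more people.
SF2_THRESHOLD = 100


def published_count(true_count, threshold=SF2_THRESHOLD):
    """Return the count as it would appear in SF 2, or None if suppressed."""
    return true_count if true_count >= threshold else None


# Hypothetical tracts and Asian Indian alone counts (illustrative only).
true_counts = {
    "tract_A": 850,
    "tract_B": 99,   # just below the threshold, so suppressed
    "tract_C": 100,  # exactly at the threshold, so published
    "tract_D": 12,
    "tract_E": 430,
}

published = {tract: published_count(n) for tract, n in true_counts.items()}

true_total = sum(true_counts.values())
published_total = sum(n for n in published.values() if n is not None)
shortfall = true_total - published_total

print(f"true total:      {true_total}")
print(f"published total: {published_total}")
print(f"shortfall:       {shortfall}")
```

In this toy example the two suppressed tracts hold 111 people, so the tract sum understates the total by that amount. With tens of thousands of tracts nationwide, many small shortfalls like this can add up to a large gap.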


Hello Dave,

Thank you for your response; this makes a lot of sense! For our analyses, we are especially interested in these smaller pockets of Asian Indian folks, so we would like to include them if at all possible. However, it seems like this characteristic is baked into the data. If I have to use census tracts as my geographic unit, do you have any recommendations for how I might get around the population suppression? If not, no worries - this has all been extremely helpful in understanding the data available! Thanks!



Dear Joey,

The Census Bureau’s suppression rules were implemented to protect the privacy and confidentiality of respondents, and there isn’t any way to recover the tract-level count of Asian Indians where that count is below 100.

You could develop a model to estimate Asian Indians at the tract level using other data (e.g., city-level counts of Asian Indians and tract-level counts of Asians, or other combinations of data). That’s far beyond my expertise, and there will definitely be uncertainty in your estimation.
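As one very simple illustration of that idea (not a recommended or validated method), the Python sketch below takes a city-level Asian Indian count, subtracts the tract counts that were published, and spreads the remainder across the suppressed tracts in proportion to each tract's total Asian count. The function name, tract IDs, and all numbers are hypothetical, and the sketch assumes the city-level count is itself available, which requires the city to meet the 100-person threshold.

```python
def allocate_suppressed(city_total, published, asian_counts):
    """Estimate suppressed tract counts by proportional allocation.

    city_total   : Asian Indian count for the whole city (hypothetical input)
    published    : {tract: count or None}; None marks a suppressed tract
    asian_counts : {tract: total Asian count}, used as allocation weights
    """
    known = sum(n for n in published.values() if n is not None)
    remainder = city_total - known
    suppressed = [t for t, n in published.items() if n is None]
    weight_sum = sum(asian_counts[t] for t in suppressed)

    estimates = {}
    for tract, n in published.items():
        if n is not None:
            estimates[tract] = float(n)  # keep published counts as-is
        elif weight_sum > 0:
            estimates[tract] = remainder * asian_counts[tract] / weight_sum
        else:
            estimates[tract] = 0.0  # no weights to allocate by
    return estimates


# Hypothetical city with four tracts, two of them suppressed.
published = {"tract_A": 850, "tract_B": None, "tract_C": 100, "tract_D": None}
asian_counts = {"tract_A": 1200, "tract_B": 300, "tract_C": 250, "tract_D": 100}

estimates = allocate_suppressed(1070, published, asian_counts)
for tract, value in estimates.items():
    print(tract, round(value, 1))
```

This sketch does not enforce that estimates for suppressed tracts stay below 100, and it attaches no measure of uncertainty; a real model would need to handle both.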

The nice thing about the tract-level data you do have is a sense of certainty - every tract that appears in the 2010 SF2 tract data has at least 100 Asian Indians. Thus, if you say that census tract X has residents identifying as Asian Indian, you can be certain that’s the case.

Dave Van Riper