I don't know why my estimate of Native Hawaiians from RACE does not match the Census tables.


I am trying to estimate the number of Native Hawaiians (NH), including part-NH, in Hawaii based on the 2010-2014 ACS 5-year sample. I coded NH from the detailed codes of the RACE variable so that it includes both NH alone and NH in combination with other race groups, and I restricted the sample to Hawaii. My weighted number of NH including part-NH is 400,662, which is much higher than the estimate of 295,409 for “Native Hawaiian alone or in any combination” in the Census table (http://factfinder.census.gov/bkmk/table/1.0/en/ACS/14_1YR/S0201/0400000US15/popgroup~062).

Here is my code to create the NH estimate:

* Flag persons whose detailed race code includes Native Hawaiian
gen NH = 0
replace NH = 1 if inlist(raced, 630, 821, 861, 862, 863, 864, ///
    911, 912, 913, 914, 964)
* Weighted count of the flag
tab NH [fw=perwt]

I notice that some categories of “raced”, such as “862 Chinese, Filipino, and Native Hawaiian (2000 1%)”, carry the note “2000 1%” in parentheses. The Comparability section of the RACE variable mentions the 1% and 5% samples (https://usa.ipums.org/usa-action/vari…), but I am not sure what this means for the estimate I want. To check whether these categories explain the difference between my estimate and the Census table figure, I also excluded the race categories with a parenthetical note. That is, I coded as follows instead:

* Same flag, dropping the codes whose labels carry a parenthetical note
gen NH1 = 0
replace NH1 = 1 if inlist(raced, 630, 821, 861, 864, 911, 914)
* Weighted count of the reduced flag
tab NH1 [fw=perwt]

The returned estimate is 342,838, which is still higher than that in the Census tables.
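
To see which detailed codes and which states drive a weighted count like this, it can help to break it down before comparing totals. The sketch below is a toy illustration on made-up records (not ACS data). It assumes the standard IPUMS USA variable names `raced`, `statefip`, and `perwt`, with statefip 15 being Hawaii. On a real extract you would skip the `input` block and run the tabulations directly.

```stata
* Toy illustration on made-up records (NOT ACS data): how the Hawaii
* restriction and each detailed race code contribute to a weighted count.
clear
input int raced byte statefip int perwt
630 15 100
821 15  60
861 15  40
630  6  30
821  6  20
100 15 200
end

* NH flag built from the first list of RACED codes in the question
gen byte NH = inlist(raced, 630, 821, 861, 862, 863, 864, 911, 912, 913, 914, 964)

* Weighted count with no geographic restriction
tab NH [fw=perwt]

* Weighted count restricted to Hawaii (statefip 15)
tab NH if statefip == 15 [fw=perwt]

* Which detailed codes make up the Hawaii count
tab raced if NH == 1 & statefip == 15 [fw=perwt]
```

The last tabulation lists the weighted count for each detailed code in the NH group, including any code annotated “(2000 1%)”. This shows directly how much each code adds.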

I am not sure what is wrong and would appreciate any help on this!



This discrepancy can likely be explained by a few details. First, the table you are comparing against is based on the 1-year 2014 sample, whereas your analysis uses the 5-year 2010-2014 sample. Second, your code does not show the count being restricted to people who live in Hawaii. If the tabulation includes residents of other states, that would push your weighted count upward. Third, your code leaves out several race codes from the “four or more race groups” category (i.e., 975, 976, 986, 991, 893, 994). Whether to include this category in your analysis is ultimately up to you, but the FactFinder table does appear to include these codes. Accounting for these details should let you calculate an estimate whose confidence interval overlaps with the published table figure.
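
The adjustments above can be sketched in Stata roughly as follows. This is a minimal illustration on made-up records, assuming the IPUMS variable names `raced`, `statefip`, and `perwt`. The flag name `NH_any` is hypothetical. The sketch adds the four-or-more-race codes listed in this reply, restricts to Hawaii (statefip 15), and tabulates the weighted count. It leaves out 893 because that code sits outside the 900-range of the other listed codes, so it is worth checking against the RACED codebook before adding it. To compare like with like, you would run this on the 2014 1-year sample to match the linked table, or use a published 5-year estimate as the benchmark.

```stata
* Sketch on made-up records (NOT ACS data) of the suggested adjustments.
clear
input int raced byte statefip int perwt
630 15 100
914 15  50
975 15  10
994 15   5
630  6  30
100 15 200
end

* NH alone or in any combination: the question's codes plus the
* four-or-more-race codes from the reply (893 omitted pending a
* codebook check).
gen byte NH_any = inlist(raced, 630, 821, 861, 862, 863, 864, 911, 912, ///
    913, 914, 964, 975, 976, 986, 991, 994)

* Keep Hawaii residents only, then tabulate the weighted count
keep if statefip == 15
tab NH_any [fw=perwt]

* On a real extract that includes replicate weights (repwtp1-repwtp80),
* a standard error for the total could be obtained along these lines:
*   svyset [pweight=perwt], sdrweight(repwtp1-repwtp80) vce(sdr) mse
*   svy: total NH_any
```

The replicate-weight lines are left as comments because the toy data has no replicate weights. With a real extract, the resulting confidence interval is what you would check for overlap with the published figure.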