My population estimates differ from the Census Bureau's

I have been finding discrepancies between the population estimates I calculate from IPUMS data and the comparable data published by the US Census Bureau.

I am working on Arab American population estimates. I downloaded 2018 5-year ACS ancestry data from IPUMS for the state of Illinois, and I weight the data in Stata with fweight = perwt to get population estimates. When I compare my numbers to the Census Bureau's, there are often discrepancies, even though I am looking at comparable datasets (the 2018 ACS 5-year estimates).
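With frequency weights, each sampled person stands in for PERWT people, so a weighted population estimate is simply the sum of PERWT over the records in a group. A minimal Python sketch of that arithmetic, assuming each person record is a dict with a PERWT key (the toy records and the AGE field are hypothetical, not real IPUMS data):

```python
def weighted_count(records, predicate=lambda rec: True):
    """Estimate a population count by summing person weights (PERWT).

    Each sampled person represents PERWT people, so the estimate for a
    group is the sum of PERWT over the records that satisfy `predicate`.
    """
    return sum(rec["PERWT"] for rec in records if predicate(rec))


# Hypothetical toy records; a real extract has many more rows and variables.
people = [
    {"PERWT": 25, "AGE": 34},
    {"PERWT": 40, "AGE": 71},
    {"PERWT": 35, "AGE": 8},
]

total = weighted_count(people)                             # all persons
adults = weighted_count(people, lambda r: r["AGE"] >= 18)  # persons 18+
print(total, adults)  # 100 65
```

Unweighted, these records count as 3 people; the weights are what turn a sample count into a population estimate.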

For example, my IPUMS-based estimate of the Arab American population was 13,156 lower than the Census Bureau's published ancestry data. Part of the difference is methodological (I use a broader definition than the Census Bureau does), but that should make my estimate larger, not smaller.

That said, the overall Illinois population derived from the IPUMS data matches the data on the US Census website.

Is the discrepancy with the Arab American count due to small sample sizes?

Any advice would be much appreciated!

Without knowing how you are defining Arab American, I cannot be certain about the source of the discrepancy. I could not quickly find a Census table based on the 2018 5-year ACS, but I did find one based on the 2018 1-year ACS, which estimates the number of people reporting Arab ancestry in Illinois at 96,653 (+/- 10,175) (see the total population row of the table pasted below).

I made an extract from IPUMS USA using the 2018 5-year ACS for Illinois and included the variables ANCESTR1 and ANCESTR2. I then defined a variable as expansive as the Census Bureau's definition (also noted in the screenshot above); note that the IPUMS codes and Census codes for some of these ancestry values differ slightly. Anyone who reported one of these codes in either ANCESTR1 or ANCESTR2 was coded as having Arab ancestry. The unweighted count was 3,437. When I weight the data (in Stata, using either the svyset commands or the fweight option), I get 98,515 persons reporting Arab ancestry, which is within the margin of error of the 1-year estimate.
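The flagging step described above can be sketched in Python as follows. The code set is a hypothetical placeholder: the real list should be the Census Bureau's Arab ancestry codes translated to their IPUMS ANCESTR1/ANCESTR2 equivalents, which are not reproduced here. Records are assumed to be dicts holding ANCESTR1, ANCESTR2, and PERWT.

```python
def has_ancestry(rec, codes):
    """True if either the first or the second reported ancestry is in `codes`."""
    return rec["ANCESTR1"] in codes or rec["ANCESTR2"] in codes


def ancestry_estimate(records, codes):
    """Return (unweighted count, weighted estimate) for flagged persons."""
    flagged = [rec for rec in records if has_ancestry(rec, codes)]
    return len(flagged), sum(rec["PERWT"] for rec in flagged)


# Placeholder codes and toy records for illustration only; these are NOT
# real IPUMS ancestry codes.
ARAB_CODES = {901, 902, 903}
people = [
    {"ANCESTR1": 901, "ANCESTR2": 555, "PERWT": 30},  # flagged via ANCESTR1
    {"ANCESTR1": 100, "ANCESTR2": 903, "PERWT": 20},  # flagged via ANCESTR2 only
    {"ANCESTR1": 100, "ANCESTR2": 555, "PERWT": 50},  # not flagged
]

print(ancestry_estimate(people, ARAB_CODES))  # (2, 50)
```

The first number corresponds to the unweighted case count and the second to the weighted population estimate.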

I suspect that you are either not including all of the codes that the Census Bureau includes or not defining Arab ancestry using both ANCESTR1 and ANCESTR2.
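One way to check the second possibility is to compute the estimate twice, once from ANCESTR1 alone and once from either variable, and see how much the narrower definition drops. A minimal sketch, again with hypothetical placeholder codes and toy records rather than real IPUMS data:

```python
def compare_definitions(records, codes):
    """Weighted estimates from ANCESTR1 alone vs. ANCESTR1 or ANCESTR2.

    Returns (first_only, either, gap); a large gap means the narrower
    definition misses people whose qualifying ancestry was reported second.
    """
    first_only = sum(r["PERWT"] for r in records if r["ANCESTR1"] in codes)
    either = sum(
        r["PERWT"]
        for r in records
        if r["ANCESTR1"] in codes or r["ANCESTR2"] in codes
    )
    return first_only, either, either - first_only


# Placeholder codes and toy records; NOT real IPUMS ancestry codes.
CODES = {901, 903}
people = [
    {"ANCESTR1": 901, "ANCESTR2": 555, "PERWT": 30},
    {"ANCESTR1": 100, "ANCESTR2": 903, "PERWT": 20},
    {"ANCESTR1": 100, "ANCESTR2": 555, "PERWT": 50},
]

print(compare_definitions(people, CODES))  # (30, 50, 20)
```

Checking the other possibility (missing codes) comes down to comparing the code list against the Census Bureau's definition, since some IPUMS and Census codes differ.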

Kari, I never thanked you for taking the time to answer this question. My only follow-up question (this one is more about navigating population estimates from IPUMS in general): when I compare my numbers to the Census Bureau's, is it OK if the estimates don't match exactly, as long as mine fall within the margin of error?

Happy to help! You are correct that we do not generally expect to exactly replicate official US Census statistics with public use data, though we expect estimates to be within the margin of error around the official estimates. Official statistics rely on more detailed versions of the data than the public use versions available through IPUMS USA. This Census Bureau FAQ and this blog post from a former colleague provide a bit more information.
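The rule of thumb in this reply can be written as a simple interval check. This sketch only asks whether an estimate falls inside the published interval; it is not a formal significance test. ACS margins of error are published at the 90 percent confidence level. The figures below are the ones quoted earlier in this thread, which mix the 2018 1-year published estimate with a 2018 5-year IPUMS estimate.

```python
def within_moe(estimate, official, moe):
    """True if `estimate` lies inside official +/- moe (inclusive)."""
    return official - moe <= estimate <= official + moe


# Published 2018 1-year estimate: 96,653 (+/- 10,175).
# IPUMS 2018 5-year weighted count: 98,515.
print(within_moe(98_515, 96_653, 10_175))  # True
print(96_653 - 10_175, 96_653 + 10_175)    # 86478 106828
```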

Wonderful, thanks!
