Large Difference in Households, 2000 Census 5% and ACS 2013-2017

Emery_Reifsnyder · September 23, 2019, 11:53pm

Hello,

I’m trying to do a time series comparison of Moderate Income Households in San Jose between 2000 and 2017, using Census 2000 5% sample and ACS 2013-2017 sample.

I’ve used ArcGIS to select PUMAs for 2000 and 2010 that generally cover the San Jose area and comprise almost exactly the same area. I am restricting to PERNUM = 1 and weighting by HHWT for my cross-tabulation. However, the result I get for the Census 2000 sample is almost 100,000 households higher than it should be according to other census sources for San Jose.
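For readers unfamiliar with this setup, here is a minimal sketch of a household count built from person-level records: keep one record per household (PERNUM = 1) and sum the household weight (HHWT). The records and weights below are made up for illustration and are not taken from any IPUMS extract.

```python
# Toy person-level records; all values are invented for illustration.
records = [
    {"SERIAL": 1, "PERNUM": 1, "HHWT": 20},
    {"SERIAL": 1, "PERNUM": 2, "HHWT": 20},  # second person in household 1
    {"SERIAL": 2, "PERNUM": 1, "HHWT": 30},
]

# One record per household (PERNUM == 1), weighted by HHWT.
household_count = sum(r["HHWT"] for r in records if r["PERNUM"] == 1)

# Summing HHWT over every person would count household 1 twice.
overcounted = sum(r["HHWT"] for r in records)

print(household_count, overcounted)
```

Without the PERNUM filter, each household's weight is added once per household member, which inflates the total.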

Anyone ever experienced an issue like this before or have an idea as to why this may be happening?

JonathanSchroeder · September 24, 2019, 3:02pm

The result you’re getting is probably due to mismatches between the city boundaries of San Jose and the PUMAs you’ve selected. Even if the PUMAs you’ve selected generally cover the area of San Jose, the area of mismatch between the PUMAs and the city could still easily include 100,000 or more residents, as it apparently does in this case.

The IPUMS USA CITY variable identifies a city whenever the mismatch between the city’s population and the population of associated PUMAs is less than 10%. The CITY variable has not identified San Jose since 1980, so it hasn’t been possible to identify San Jose through PUMAs with better than a 10% mismatch since then. Given that the population of San Jose in 2000 was about 895,000, the mismatch with 2000 PUMAs would have to involve something close to 100,000 people or more in order to surpass 10%.
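As a rough check on that figure, the sketch below computes 10% of the approximate 2000 population quoted above. It only illustrates the arithmetic; how the mismatch itself is measured is described in the CITYERR documentation, not here.

```python
# Approximate 2000 population of San Jose, as quoted in the post above.
city_population_2000 = 895_000

# CITY identifies a city only when the mismatch is below 10%.
threshold_percent = 10

# Integer arithmetic avoids floating-point rounding in the result.
min_mismatch = city_population_2000 * threshold_percent // 100

print(min_mismatch)
```

So a mismatch above the threshold means at least about 89,500 people, which is the "something close to 100,000" mentioned above.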

IPUMS USA supplies crosswalks between large cities and PUMAs that can help you identify exactly how much population mismatch there is between each city and intersecting PUMAs. You can find them at the bottom of the CITY Comparability page or at the bottom of the CITYERR Description.

Emery_Reifsnyder · September 24, 2019, 6:34pm

Hi Jonathan,

Thanks for your speedy reply!

I’m still not sure I quite understand how PUMAs work in relation to the CITY variable.

Using figures from the CITY crosswalk for 2000, the PUMAs I’ve selected should add up to a population of about 900,000. However, when I run a cross-tabulation using person weights, I’m still getting a result of about 1,100,000.

Should I chalk up this difference to sampling/weighting error?

JonathanSchroeder · September 24, 2019, 7:31pm

OK, got it. When identifying PUMAs in microdata, it’s important to include both the PUMA code and a state code. Many PUMA codes are not unique across all states, and two of the 2000 PUMAs that correspond to San Jose have codes that are also used in other states. If I limit my microdata selection with STATEFIP = 06 (for California), then I get a population of ~900,000 for the San Jose PUMAs. If I ignore STATEFIP, then I get ~1,100,000.
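A minimal sketch of the difference, using toy records: the PUMA codes and weights below are hypothetical placeholders, not the actual San Jose PUMAs. The only point is that a PUMA code has to be paired with STATEFIP to pick out a single area.

```python
# Toy person records; PUMA codes and weights are invented for illustration.
records = [
    {"STATEFIP": 6, "PUMA": 2701, "PERWT": 20},
    {"STATEFIP": 6, "PUMA": 2702, "PERWT": 25},
    {"STATEFIP": 48, "PUMA": 2701, "PERWT": 30},  # same PUMA code, another state
    {"STATEFIP": 6, "PUMA": 100, "PERWT": 15},    # California, not a selected PUMA
]

SELECTED_PUMAS = {2701, 2702}  # hypothetical codes, not the real San Jose PUMAs
CALIFORNIA = 6                 # STATEFIP code for California


def weighted_population(rows, pumas, statefip=None):
    """Sum PERWT over rows in the given PUMAs, optionally limited to one state."""
    return sum(
        r["PERWT"]
        for r in rows
        if r["PUMA"] in pumas and (statefip is None or r["STATEFIP"] == statefip)
    )


puma_only = weighted_population(records, SELECTED_PUMAS)
puma_and_state = weighted_population(records, SELECTED_PUMAS, CALIFORNIA)

print(puma_only, puma_and_state)
```

Filtering on PUMA alone pulls in the record from the other state that happens to share a code, which is the same kind of inflation as the ~1,100,000 versus ~900,000 gap described above.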

Emery_Reifsnyder · September 24, 2019, 8:31pm

WOW Thank you! Everything makes sense now, can’t thank you enough!
