Hello,
I’m trying to do a time series comparison of Moderate Income Households in San Jose between 2000 and 2017, using the Census 2000 5% sample and the ACS 2013-2017 sample.
I’ve used ArcGIS to select PUMAs for 2000 and 2010 that generally cover the San Jose area and that cover almost exactly the same area as each other. For my cross-tabulation I am restricting to PERNUM = 1 and weighting by HHWT. However, the result I get for the Census 2000 sample is almost 100,000 households higher than it should be according to other census sources for San Jose.
Has anyone experienced an issue like this before, or have an idea as to why this may be happening?
The result you’re getting is probably due to mismatches between the city boundaries of San Jose and the PUMAs you’ve selected. Even if the PUMAs you’ve selected do generally cover the area of San Jose, the area of mismatch between the PUMAs and San Jose could easily include 100,000 or more residents, and in this case it apparently does.
The IPUMS USA CITY variable identifies a city whenever the mismatch between the city’s population and the population of associated PUMAs is less than 10%. The CITY variable has not identified San Jose since 1980, so it hasn’t been possible to identify San Jose through PUMAs with better than a 10% mismatch since then. Given that the population of San Jose in 2000 was about 895,000, the mismatch with 2000 PUMAs would have to involve roughly 90,000 people or more (10% of 895,000 is 89,500) in order to surpass 10%.
IPUMS USA supplies crosswalks between large cities and PUMAs that can help you identify exactly how much population mismatch there is between each city and intersecting PUMAs. You can find them at the bottom of the CITY Comparability page or at the bottom of the CITYERR Description.
Hi Jonathan,
Thanks for your speedy reply!
I’m still not sure I quite understand how PUMAs work in relation to the CITY variable.
Using figures from the CITY crosswalk for 2000, the PUMAs I’ve selected should add up to about 900,000. However, when I run a cross-tabulation using person weight variables, I’m still getting a result of about 1,100,000.
Should I chalk up this difference to sampling/weighting error?
OK, got it. When identifying PUMAs in microdata, it’s important to include both the PUMA code and a state code. Many PUMA codes are not unique across all states, and two of the 2000 PUMAs that correspond to San Jose have codes that are also used in other states. If I limit my microdata selection with STATEFIP = 06 (for California), then I get a population of ~900,000 for the San Jose PUMAs. If I ignore STATEFIP, then I get ~1,100,000.
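To illustrate the difference, here is a minimal Python sketch of the two selections. It is not run against a real extract: the records, weights, and PUMA codes below are invented placeholders (they are not the actual San Jose PUMA codes), and only the variable names STATEFIP, PUMA, PERNUM, HHWT, and PERWT come from IPUMS USA. The helper `weighted_totals` is a hypothetical name for this example.

```python
# Toy person records using IPUMS USA variable names.
# All values are invented for illustration; the PUMA codes are placeholders.
records = [
    {"STATEFIP": 6,  "PUMA": 2701, "PERNUM": 1, "HHWT": 20, "PERWT": 20},
    {"STATEFIP": 6,  "PUMA": 2701, "PERNUM": 2, "HHWT": 20, "PERWT": 22},
    {"STATEFIP": 6,  "PUMA": 2702, "PERNUM": 1, "HHWT": 25, "PERWT": 25},
    # Same PUMA code, different state: picked up if STATEFIP is ignored.
    {"STATEFIP": 48, "PUMA": 2701, "PERNUM": 1, "HHWT": 30, "PERWT": 30},
    {"STATEFIP": 48, "PUMA": 2701, "PERNUM": 2, "HHWT": 30, "PERWT": 31},
    # California record outside the selected PUMAs.
    {"STATEFIP": 6,  "PUMA": 1000, "PERNUM": 1, "HHWT": 40, "PERWT": 40},
]

CALIFORNIA = 6               # STATEFIP code for California
TARGET_PUMAS = {2701, 2702}  # hypothetical stand-ins for the selected PUMAs


def weighted_totals(rows, pumas, statefip=None):
    """Return (weighted persons, weighted households) for the selection.

    Persons are summed with PERWT. Households are counted once each by
    summing HHWT over the PERNUM == 1 record only.
    """
    persons = 0
    households = 0
    for row in rows:
        if row["PUMA"] not in pumas:
            continue
        if statefip is not None and row["STATEFIP"] != statefip:
            continue
        persons += row["PERWT"]
        if row["PERNUM"] == 1:
            households += row["HHWT"]
    return persons, households


# Selecting on PUMA alone also pulls in the out-of-state records.
puma_only = weighted_totals(records, TARGET_PUMAS)
# Selecting on STATEFIP and PUMA together keeps only California.
state_and_puma = weighted_totals(records, TARGET_PUMAS, statefip=CALIFORNIA)

print("PUMA only:       ", puma_only)
print("STATEFIP + PUMA: ", state_and_puma)
```

In the toy data the PUMA-only totals are inflated by the records from the other state that share a PUMA code, which is the same pattern as the ~1,100,000 versus ~900,000 difference described above.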
Wow, thank you! Everything makes sense now. I can’t thank you enough!