Discrepancies between online tabulations and 1y dataset aggregations

Hello!
I’ve been working with a PUMA-level educational attainment dataset from the ACS 1-year files (version 7), and checking it against the tabulations found here:
https://data.census.gov/cedsci/table?q=South%20dakota%20education%20age&t=Educational%20Attainment&tid=ACSST1Y2016.S1501&moe=false&tp=false&hidePreview=false

I was hoping someone might know why these would be different – such as a specific versioning change that would lead to different results.

Thank you!

(Example of discrepancies)
(Aggregations using “perwt” weighting variable)
South Dakota 2010, ages 25+ by educational attainment:
less than HS:
tab: 55,367.416
aggregated 1y: 51,503

HS grad:
tab: 168,231.764
aggregated 1y: 164,355

some college/AA:
tab: 168,764.143
aggregated 1y: 163,565

college grad:
tab: 140,015.677
aggregated 1y: 150,447
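
For reference, the aggregation described above amounts to summing perwt within each education category for people aged 25 and over. A minimal sketch, assuming made-up records and an already recoded category label (the records, the tuple layout, and the `weighted_counts` helper are hypothetical, not taken from the actual extract):

```python
from collections import defaultdict

# Hypothetical person records: (age, education category, perwt).
# A real extract would first need a recode from the detailed education variable.
records = [
    (34, "HS grad", 112),
    (19, "HS grad", 95),        # under 25, so excluded below
    (52, "college grad", 87),
    (67, "less than HS", 140),
    (41, "college grad", 63),
]

def weighted_counts(rows, min_age=25):
    """Sum person weights by education category for people aged min_age or older."""
    totals = defaultdict(int)
    for age, educ, perwt in rows:
        if age >= min_age:
            totals[educ] += perwt
    return dict(totals)

counts = weighted_counts(records)
print(counts)
```

Each weighted sum is an estimate of the number of people in that category, which is what gets compared against the published table.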

Publicly available microdata are a subsample of the full ACS, while the Census website reports results based on the full ACS sample. For geographies with small populations, such as SD, the discrepancies are more pronounced. The hope is that the two numbers are not significantly different from one another (although this is hard to test formally, because the samples are not independent).
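
To put the size of the gaps in perspective, here is a rough sketch that computes the percent difference of each microdata estimate relative to the published figure, using only the numbers quoted in the question. The `pct_diff` helper is just an illustrative name, and this is a descriptive comparison, not a significance test:

```python
# Figures quoted in the question: (published tabulation, aggregated microdata).
figures = {
    "less than HS": (55367.416, 51503),
    "HS grad": (168231.764, 164355),
    "some college/AA": (168764.143, 163565),
    "college grad": (140015.677, 150447),
}

def pct_diff(tab, agg):
    """Percent difference of the microdata estimate relative to the tabulation."""
    return 100.0 * (agg - tab) / tab

for label, (tab, agg) in figures.items():
    print(f"{label:16s} {pct_diff(tab, agg):+6.2f}%")

# Compare the overall 25+ totals as well.
tab_total = sum(tab for tab, _ in figures.values())
agg_total = sum(agg for _, agg in figures.values())
print(f"{'total':16s} {pct_diff(tab_total, agg_total):+6.2f}%")
```

On these numbers the overall 25+ total differs by less than half a percent, while the individual categories differ by between about 2 and 7.5 percent, so most of the discrepancy is in how people are distributed across categories.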

To follow up on @skolenik's answer, you can read more about the differences between the ACS PUMS and the full sample here.

Ahh ok, I didn’t realize I was working with a subsample. Is this just a measure taken to ensure confidentiality?

I believe that’s the reason.