Census: 100% vs sample

Greetings to the community,

I am curious about why aggregates sometimes differ between census tables based on 100% counts and tables based on samples. Since other documentation says the samples are weighted to match the 100% counts, I would have expected the aggregates to match.

For example:
The total number of households in the 1990 100% STF1 data (source table NP24, NHGIS code E22): 91,993,582.
The total number of households in the 1990 sample STF3 data (source table NP27, NHGIS code EUL): 91,947,410.

Obviously this difference of ~46,000 is small relative to the size of the nation, but I intend to use a statistical method where it might matter that the two totals do not match.

Thanks,
Ben

The differences between these two sources are due to the Bureau’s choice to maintain whole numbers for all sample-based estimates (rather than fractional estimates) and to maintain additivity between subgroup counts and totals. There’d be no way to adjust all of the sample-based numbers to match the corresponding 100%-count numbers perfectly without either allowing some of the subgroup counts to be fractions or reducing their accuracy by rounding them more severely.
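To make the trade-off between whole numbers and additivity concrete, here is a minimal sketch in Python. The subgroup names and values are made up for illustration and are not Census Bureau figures. It shows that rounding fractional subgroup estimates independently can leave them out of step with the rounded total.

```python
# Hypothetical fractional estimates for three subgroups (illustrative only).
fractional = {"group_a": 10.4, "group_b": 20.4, "group_c": 30.4}

# Round each subgroup on its own, then add the rounded pieces.
rounded_parts = {name: round(value) for name, value in fractional.items()}
sum_of_rounded = sum(rounded_parts.values())

# Round the overall total directly.
rounded_total = round(sum(fractional.values()))

print(sum_of_rounded)  # 60
print(rounded_total)   # 61
```

The rounded subgroups add up to 60 while the rounded total is 61, so keeping both whole numbers and additivity means the published figures cannot all be the closest possible values.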

This outcome is also due to the weighting design. To tally the sample-based counts, the Bureau assigns a whole-number weight to every sample response and then sums these whole-number weights to produce the estimates. This also makes it impossible for all the weighted totals to match exactly with all 100%-count totals.
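As a toy illustration of why whole-number weights cannot hit every 100%-count total at once, consider the Python sketch below. The numbers and the function name `matching_integer_weights` are hypothetical, and this is not the Bureau's actual weighting procedure. It imagines a small area with 10 households and 23 persons in the 100% count, and a sample of two households of sizes 2 and 4, then searches for whole-number weights that reproduce both totals.

```python
from itertools import product

# Hypothetical small area (illustrative numbers only).
sample_household_sizes = [2, 4]  # persons in each sampled household
full_count_households = 10       # 100%-count total of households
full_count_persons = 23          # 100%-count total of persons


def matching_integer_weights(sizes, households, persons, max_weight):
    """Return every tuple of whole-number weights (1..max_weight) whose
    weighted household and person totals both equal the 100% counts."""
    matches = []
    for weights in product(range(1, max_weight + 1), repeat=len(sizes)):
        weighted_households = sum(weights)
        weighted_persons = sum(w * s for w, s in zip(weights, sizes))
        if weighted_households == households and weighted_persons == persons:
            matches.append(weights)
    return matches


solutions = matching_integer_weights(
    sample_household_sizes,
    full_count_households,
    full_count_persons,
    max_weight=full_count_households,
)
print(solutions)  # [] : no whole-number weights match both totals
```

Only the fractional weights 8.5 and 1.5 satisfy both totals in this example (8.5 + 1.5 = 10 households, 2 × 8.5 + 4 × 1.5 = 23 persons), so any whole-number weighting has to miss at least one of them.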

For more information about the Bureau’s approach to sampling and weighting for long-form census data, see the “Technical Documentation” for the corresponding datasets, particularly the sections on the “Accuracy of the Data”.

Makes perfect sense. Thank you, Jonathan.