Discrepancy Between 1% Sample and Full Count Data

sbeda · November 6, 2018, 4:56pm

Greetings,

I had been working with the 1% samples. I recently downloaded the full count census data and I’m getting significantly different numbers. I’m trying to figure out where I’m going wrong.

For instance, the 1920 full count data for Oregon shows 9,995 individuals categorized as working in the logging industry (code 306 in the IND1950 variable).

On the other hand, the 1920 1% sample for Oregon, with the person weight (PERWT) applied, gives an estimated 16,772 individuals working in the logging industry (again using IND1950).

This is a person-level analysis so PERWT is the correct weight to apply to the 1% sample in this case, right?
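One way to compare the two numbers is to compute the weighted total from the 1% sample alongside the unweighted record count. In a full count file each record is one person, so the plain record count is the relevant figure. Below is a minimal sketch of that comparison, together with a rough sampling error for the weighted total. The record layout (a list of dicts keyed by IPUMS variable names), the function name `logging_estimate`, and the toy data are hypothetical stand-ins for a real extract. The standard error uses a simple Poisson-sampling approximation, not the official IPUMS variance method.

```python
from math import sqrt

# Codes taken from the question. The record layout (a list of dicts keyed by
# IPUMS variable names) is a hypothetical stand-in for a real extract.
LOGGING_IND1950 = 306  # IND1950 code for logging
OREGON_STATEFIP = 41   # FIPS code for Oregon


def logging_estimate(records, ind_code=LOGGING_IND1950, statefip=OREGON_STATEFIP):
    """Return (unweighted n, weighted total, rough standard error).

    The standard error is sqrt(sum of squared weights), a Poisson-sampling
    approximation that ignores the finite population correction and any
    clustering in the sample design.
    """
    weights = [
        r["PERWT"]
        for r in records
        if r["STATEFIP"] == statefip and r["IND1950"] == ind_code
    ]
    n = len(weights)
    total = sum(weights)
    se = sqrt(sum(w * w for w in weights))
    return n, total, se


# Toy data, not real census records: three Oregon loggers and one
# non-logger, each with a hypothetical weight of 100.
sample = [
    {"STATEFIP": 41, "IND1950": 306, "PERWT": 100.0},
    {"STATEFIP": 41, "IND1950": 306, "PERWT": 100.0},
    {"STATEFIP": 41, "IND1950": 306, "PERWT": 100.0},
    {"STATEFIP": 41, "IND1950": 105, "PERWT": 100.0},
]
n, total, se = logging_estimate(sample)
print(n, total, round(se, 1))  # 3 300.0 173.2
```

This next step is an assumption and is not stated in the thread: suppose the weights in the 1920 Oregon sample are all close to 100. Then the estimate of 16,772 corresponds to roughly 168 sampled loggers and a rough standard error of about 1,300. The gap of 6,777 between the two figures would then span several standard errors, so sampling noise alone would be an unlikely explanation.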

I understand that the full count data and the 1% sample will not align perfectly. Still, I didn't expect the numbers to be that different. Like I said, I'm just trying to figure out why they are so far off. Am I making a mistake working with the 1% sample or with the full count data?

Thanks for your help!

JeffBloem · November 7, 2018, 4:21pm

The main reason is that the processes for transcribing and coding these two files were very different. For the 1% sample files, our historical team is able to give a fair amount of attention to each record, especially records that seem incorrect. With full count processing there is simply too much data to comb through in such detail. Instead, the historical team adapts many of our processing methods, the main adaptation being our partnership with Ancestry.com to produce the transcriptions of the original enumeration forms. This can cause discrepancies like the one you are reporting.
That being said, our historical data team is currently working on a new version of the 1940 full count file. One of the updates for this file is improved coding of the occupation and industry variables. So, while I can’t know for sure if this discrepancy will be addressed by this forthcoming update, it may help.
