MTONGUE full count data for 1910

Greetings!

For research I am pursuing on the size of the ethnic Latvian population in the United States, I want to make sure my understanding of full-count IPUMS data is correct.

Prior to the 1930 census, Latvian (Lettish) speakers were grouped with Lithuanian speakers, meaning that from published census data we did not have a clear idea of how many persons reported Latvian as their mother tongue (MTONGUE).

However, the IPUMS full count data show a case count of 8,800 persons reporting Latvian as MTONGUE in 1910. As I understand it, IPUMS and Ancestry.com worked together on the full count. If even ball-park accurate, the number would suggest a much smaller Latvian-speaking population than the figures found in contemporary reports. With what degree of confidence can we say that 8,800 persons reported Latvian as their mother tongue in 1910?

Andris Straumanis

There are two things to note about the MTONGUE variable in the full-count 1910 data through IPUMS USA. First, the 1910 full-count file currently available is preliminary; we make preliminary full-count files available to users with the understanding that we are in the process of improving them. Improvements may include refining the transcription of string variables (which could include mother tongue). Second, the universe of persons included in the variable MTONGUE in 1910 is foreign-born persons. Depending on the reports you are using as a point of comparison, the exclusion of U.S.-born persons who speak Latvian/Lettish from MTONGUE may account for some of the difference you are seeing. You may also be interested in the LANGUAGE variable, though in 1910 the corresponding question was aimed at identifying English speakers, and only persons who could not speak English were supposed to respond to this variable with a non-English language.

Thank you!

I am also finding some strange numbers for MTONGUE in the 1910 full count data, and it might be worthwhile to mention them while the IPUMS team is assessing the data in its preliminary release stage. I work on Irish-language mother-tongue communities (MTONGUED == 1560). I note that the 1910 1% sample gives an unweighted case-count total of 3,195 persons countrywide for “Irish” mother tongue. Near as I can tell, the 1910 100% data gives a countrywide total of 1,761 cases in the same category, which would mean that the sample turned up more cases in raw count terms than the full data.
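To make the size of the discrepancy concrete, here is a rough back-of-the-envelope check in Python. It is only a sketch: it assumes each case in the 1% sample stands for roughly 100 persons (a flat 1-in-100 rate, which is my simplification), and the variable and function names are my own, not anything from IPUMS.

```python
# Rough consistency check between a 1% sample and the full count.
# Assumption (not from IPUMS documentation): a flat 1-in-100 sampling rate.
PERSONS_PER_SAMPLE_CASE = 100

def implied_full_count(sample_cases, persons_per_case=PERSONS_PER_SAMPLE_CASE):
    """Scale an unweighted sample case count up to an implied population count."""
    return sample_cases * persons_per_case

sample_cases = 3195      # "Irish" mother tongue, 1910 1% sample (unweighted)
full_count_cases = 1761  # same category, 1910 full count

implied = implied_full_count(sample_cases)
share_of_expected = full_count_cases / implied

print(f"Implied by sample: about {implied:,} persons")
print(f"Full count reports: {full_count_cases:,} persons")
print(f"Full count is {share_of_expected:.2%} of what the sample implies")
```

Under that assumption the sample implies several hundred thousand persons, so a full count of 1,761 is far too low to be explained by sampling variation alone.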

In the original census returns, the mother tongue was sometimes written above the word “English,” presumably because there were times when an enumerator mistakenly took the question to be asking about current language rather than mother tongue for the foreign born. But even that seems unlikely to be the cause of the number of cases in the full count being off by so much.

Thanks for your email and your patience while we look into the issue you described. This is an error on our part and we appreciate your help in uncovering it.

The core issue is that the MTONGUE information from the original Census forms was not transcribed into our digital version of the complete count 1910 data file; the current version of the complete count data erroneously reports information derived from the LANGUAGE variable in the MTONGUE variable. We will be removing MTONGUE from the 1910 complete count data. I will provide more details below; please follow up with any questions. Additionally, we would like to send you a mug as a token of our appreciation for helping us uncover this error. Please email ipums@umn.edu with a mailing address.

A field to collect mother tongue/native language information was not included in the original 1910 census enumeration form and was added by an amendment that required enumerators to ask about mother tongue. I believe that the enumerator instructions on IPUMS USA reflect the updated instructions for recording this based on the amendment, though it is worth noting that the original enumerator instructions (see pages 30-31) did not include guidance on collecting mother tongue/native language. Based on our transcription of the 1910 sample, we expect the mother tongue responses were entered immediately following birthplace in column 12 of the enumeration form (as well as father’s mother tongue next to father’s birthplace in column 13, and mother’s mother tongue next to mother’s birthplace in column 14). Column 17 of the enumeration form is where we derive the information for LANGUAGE and SPEAKENG. Note that this question is asked of all persons aged 10 and older and, in 1910, prioritizes English as the response, even if the person did not speak English at home or had a different native language. MTONGUE is not derived from this field, given the priority the question places on the English language over other information. In the sample data, we are not deriving MTONGUE from this; we are using the language response entered along with birthplace (with a few exceptions where we impute missing/illegible values). The LANGUAGE and MTONGUE variables use the same coding scheme; we wrote the LANGUAGE values to both LANGUAGE and MTONGUE in the 1910 complete count data. I suspect that this error, paired with the prioritization of English in the LANGUAGE variable (and, I would guess, higher rates of bilingual English and Irish speakers), led to the incredibly low frequencies you observed in the complete count data.
We are looking into the nuances of the change to MTONGUE enumerator instructions in 1910 to ensure our documentation of this variable in the sample is correct; however, I want to reiterate that the 1910 sample data were entered separately and are not affected by the omission of the MTONGUE values in the complete count data.
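The mechanism described above can be illustrated with a toy example. This is only a sketch: the records are made up for illustration and are not drawn from any IPUMS file, and the function `language_response` is a hypothetical stand-in for the 1910 column 17 rule of recording English whenever a person could speak it.

```python
from collections import Counter

# Hypothetical records: (true mother tongue, can speak English?)
# These values are invented for illustration only.
people = [
    ("Irish", True), ("Irish", True), ("Irish", True), ("Irish", False),
    ("German", True), ("German", False),
]

def language_response(mother_tongue, speaks_english):
    """Mimic a question that records English whenever the person speaks it."""
    return "English" if speaks_english else mother_tongue

# What MTONGUE should hold: the mother tongue itself.
true_counts = Counter(mt for mt, _ in people)

# What results when LANGUAGE values are written into MTONGUE.
miscoded_counts = Counter(language_response(mt, eng) for mt, eng in people)

print("Correct MTONGUE counts: ", dict(true_counts))
print("LANGUAGE copied to MTONGUE:", dict(miscoded_counts))
```

In this toy data only the one Irish speaker who cannot speak English is still counted as Irish after the miscoding, which is the same direction of undercount seen in the complete count data.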

Thanks again for bringing this to our attention.

Thanks so much, glad to help! I’m already a proud owner of an IPUMS mug, so I’ll just say that being relieved that the 1910 sample is accurate for this variable is reward enough. Cheers for all the good work IPUMS does!
