1910/1940 parents' mother tongue

Hi everyone, I am merging the 1910-1940 population data for the whole US, but I found that the two variables for parents’ mother tongue are mostly missing.
Is there a linking protocol for finding a person’s parents’ mother tongue?


Why is the quality of this part of the data so poor, given that everyone’s own mother tongue is known? And is it feasible to recover parents’ mother tongue by linking people to their parents through family-level data?

Direct questions about the mother tongue of parents (IPUMS variables FMTONGUE & MMTONGUE) were only included in the 1910 and 1920 census forms; note that these were only asked of people with foreign-born parents. These direct questions are not included in 1930 and 1940.

A person’s mother tongue (MTONGUE) is asked about in 1910-1940. You could leverage the family interrelationship variables (MOMLOC & POPLOC in tandem with the attach characteristics feature) to indirectly capture the mother tongue of parents (create your own FMTONGUE and MMTONGUE) if the parent lived in the same household. However, this approach will miss parents’ mother tongue for persons who do not live with their parents. Because there may be systematic differences between those who do and do not have coresident parents, I would be cautious about using this indirect measure.
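For readers who want to see the logic of the indirect approach, here is a minimal Python sketch. It assumes each person record carries the IPUMS variables YEAR, SERIAL, PERNUM, MOMLOC, POPLOC, and MTONGUE, and that MOMLOC/POPLOC hold the PERNUM of the coresident parent (0 when none is identified). The function name, the toy records, and the MTONGUE codes in them are hypothetical and only for illustration; in practice the attach characteristics feature in the extract system does this for you.

```python
def attach_parent_mtongue(persons):
    """Return copies of the person records with MTONGUE_MOM and MTONGUE_POP added.

    A parent's value is None when no coresident parent is identified
    (MOMLOC or POPLOC equal to 0) or the parent record is not in the data.
    """
    # Index every person by census year, household serial, and person number.
    by_key = {(p["YEAR"], p["SERIAL"], p["PERNUM"]): p for p in persons}
    linked = []
    for p in persons:
        rec = dict(p)
        for loc_var, new_var in (("MOMLOC", "MTONGUE_MOM"), ("POPLOC", "MTONGUE_POP")):
            loc = p[loc_var]
            parent = by_key.get((p["YEAR"], p["SERIAL"], loc)) if loc else None
            rec[new_var] = parent["MTONGUE"] if parent else None
        linked.append(rec)
    return linked


# Toy data (made up): one household with two parents and a child,
# and one person living alone.
people = [
    {"YEAR": 1930, "SERIAL": 1, "PERNUM": 1, "MOMLOC": 0, "POPLOC": 0, "MTONGUE": 2},
    {"YEAR": 1930, "SERIAL": 1, "PERNUM": 2, "MOMLOC": 0, "POPLOC": 0, "MTONGUE": 2},
    {"YEAR": 1930, "SERIAL": 1, "PERNUM": 3, "MOMLOC": 1, "POPLOC": 2, "MTONGUE": 1},
    {"YEAR": 1930, "SERIAL": 2, "PERNUM": 1, "MOMLOC": 0, "POPLOC": 0, "MTONGUE": 1},
]
linked = attach_parent_mtongue(people)
```

The person living alone ends up with no parental values, which is exactly the coverage gap described above: the indirect measure exists only for people with coresident parents.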

Thank you for your answers.

I have two questions.

  1. According to your system, the mother tongue question was asked of foreign-born people in 1910, but when I downloaded the data, all the mother tongue values were null. Could you check whether this part of the data exists?

  2. Regarding the 1940 data, is mother tongue_1940 available for foreign-born persons, sample-line persons, or everyone? The variable descriptions were confusing.

Thanks for the additional questions and my apologies for the slow reply.

The 1910 mother tongue information is only included in the sample files, not the full count files. While this is clear in the availability tab, the variable page gives little information about why. I will request that the IPUMS USA team augment the documentation to note this difference in availability between the 1910 files. I am sharing the relevant information from the revision history:

MTONGUE and QMTONGUE were removed from the 1910 full count sample due to a data transcription error. MTONGUE information from the original Census forms was not transcribed into our digital version of the complete count 1910 data file. The derivation of this variable led to incorrectly high rates of English as a MTONGUE value. Only the 1910 full count file was affected by this error.

Regarding 1940, the universe is sample-line persons; you can identify these with the variable SLREC. Sample-line persons were selected to answer additional questions, including MTONGUE. There are no universe restrictions about whether the person is foreign- or native-born, only whether or not they were a sample-line person. For analyses of MTONGUE in 1940, you should use SLWT to get appropriate estimates. Although only sample-line persons were asked, this variable should be representative of the entire population.
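As a rough sketch of what weighting by SLWT looks like in practice, the Python snippet below computes the weighted share of sample-line persons reporting a given mother tongue. It assumes that SLREC equals 2 for sample-line records and that SLWT is already on its final scale; please confirm both against the codebook for your extract. The function name, records, weights, and MTONGUE codes are all made up for illustration.

```python
SAMPLE_LINE = 2  # assumed SLREC code for sample-line persons; check the codebook


def weighted_share(persons, mtongue_code):
    """Weighted share of sample-line persons with the given MTONGUE code."""
    sample_line = [p for p in persons if p["SLREC"] == SAMPLE_LINE]
    total = sum(p["SLWT"] for p in sample_line)
    if total == 0:
        return None  # no sample-line persons in the data
    hits = sum(p["SLWT"] for p in sample_line if p["MTONGUE"] == mtongue_code)
    return hits / total


# Toy records (made up): three sample-line persons and one person who
# was not on a sample line and therefore has no MTONGUE response.
records = [
    {"SLREC": 2, "SLWT": 20, "MTONGUE": 1},
    {"SLREC": 2, "SLWT": 20, "MTONGUE": 2},
    {"SLREC": 2, "SLWT": 10, "MTONGUE": 1},
    {"SLREC": 1, "SLWT": 0, "MTONGUE": 0},
]
share = weighted_share(records, 1)
```

Restricting to sample-line records first keeps the blank MTONGUE values of everyone else out of the denominator.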

Sorry for my late reply. Is it possible to recover mother tongue for 1910? This is crucial for my research.

I just want to confirm: were the questions behind the IPUMS variables FMTONGUE & MMTONGUE included in the 1910 and 1920 census forms themselves, and not only in the 1910 1pct and 1920 1pct samples?

The same enumeration forms are used regardless of the data file (e.g., 1% or full count). However, the samples and full count files are often derived from different transcriptions of the data. Unfortunately, we only have transcriptions for MTONGUE, FMTONGUE, and MMTONGUE in the samples, and not in the full count data. I am not aware of any plans or available funding to transcribe these variables for the full count files where they are missing due to transcription error, though I have made a note that this would be helpful to include in future transcription work if possible.

(Historical Language Questions)
This link shows that questions about language were asked in 1940, so why is mother tongue_1940 empty in the IPUMS data?


(For persons of all ages; asked under the category of “Mother Tongue [or Native Language]”)

Language spoken at home in earliest childhood.

I am not able to replicate the issue you are describing. I see MTONGUE values included in both the 1940 1% and full count files. Please note that MTONGUE was only asked of sample-line persons in 1940; you can identify sample-line records using SLREC and should weight analyses of sample-line persons using SLWT.