I have a question about the education data for Mexico, where I observe surprising changes over time. For example:

  • Years of schooling: The values increase sharply from 2000 to 2005, drop sharply from 2005 to 2010, and then increase again in 2015, especially for kids under 16 years old. Could the 2005 data be inflated and the 2010 data deflated?
  • Enrollment: Similar to years of schooling, though less pronounced, especially for kids under 9 years old.
  • Literacy: I observe a wave-like pattern: down from 1990 to 2000, up in 2005, down in 2010, and up in 2015, especially for the age group 7-10. For 6-year-olds, in turn, there is a sudden sharp decline in 2005.

Do you have guidance on what may drive these patterns and, if applicable, how to mitigate or correct for them? For background, I use the IPUMS waves 1990, 2000, 2005, 2010, and 2015, and apply the provided person weights to obtain the averages. I focus on a subset of municipalities (see below).

Subset of municipalities:
Abasolo, Acala, Acuitzio, Albino Zertuche, Altzayanca, Angangueo, Apatzingan, Apizaco, Aquiles Serdan, Asuncion Cuyotepeji, Atempan, Atlahuilco, Bachiniva, Calvillo, Carmen Tequexquitla, El, Chapulco, Chihuahua, Chinantla, Choix, Ciudad Madero, Colipa, Contla de Juan cuamatzi, Coronado, Coronango, Cosolapa, Cuajinicuilapa, Cuapiaxtla, Cuautlancingo, Cuyoaco, Delicias, Ensenada, Escuinapa, Guasave, Guemez, Guerrero, Heroica Ciudad de Huajuapan de Leon, Hidalgo del Parral, Huamantla, Huandacareo, Huehuetla, Huejutla de Reyes, Hueyapan, Inde, Ixmiquilpan, Jacona, Jala, Jalpa, Julimes, Landero y Coss, Leonardo Bravo, Llera, Magdalena Tequisistlan, Manlio Fabio Altamirano, Mariano Escobedo, Matamoros, Miahuatlan, Molcaxac, Moris, Namiquipa, Nanacamilpa de Mariano Arista, Nopalucan, Nuevo Morelos, Ocozocoautla de Espinosa, Ojocaliente, Olintla, Oro, El, Otatitlan, Pacula, Pajacuaran, Pajapan, Panuco, Pichucalco, Pisaflores, Platon Sanchez, Querendaro, Rafael Delgado, Rafael Lara Grajales, Rodeo, San Agustin Metzquititlan, San Blas, San Felipe Usila, San Gregorio Atzompa, San Jose Miahuatlan, San Jose de Gracia, San Juan Atenco, San Juan de Guadalupe, San Luis del Cordero, San Miguel Amatitlan, San Pablo Villa de Mitla, San Sebastian Tlacotepec, San dimas, Santa Catarina Tlaltempan, Santa Isabel, Santa Maria Xadani, Santo Tomas Hueyotlipan, Satevo, Sayula de Aleman, Soledad de Doblado, Soto la Marina, Suchiate, Tancitaro, Tantima, Tapachula, Tapilula, Tecali de Herrera, Tecate, Temosachi, Tenampulco, Teolocholco, Tepetzintla, Tepezala, Tinguindin, Tlacotepec de Mejia, Tlacuilotepec, Tlalchapa, Tlaltetela, Tlaola, Tlaxco, Tlaxcoapan, Tocumbo, Tonayan, Tumbala, Tuxpam, Tuxpan, Tuxtla Chico, Tuzamapan de Galeana, Tuzantan, Tzitzio, Uruachi, Venustiano Carranza, Veracruz, Vigas de Ramirez, Las, Xicohtzinco, Xiutetelco, Xoxocotla, Zacapu, Zapotitlan de Mendez, Zozocolco de Hidalgo

I’ve taken a look at the Mexican samples. I am not quite able to replicate your observation. However, there are at least a couple of reasons that might explain the fluctuations you are noting. First, take note of the changing universe statements of the SCHOOL and YRSCHOOL variables. For both of these variables, the universe includes all persons five years old and older in 2000 and 2005, but all persons three years old and older in 2010 and 2015. For consistency over time, you should probably restrict your sample to those who are five years old and older in all sample years. Second, when limiting your sample to specific municipalities, be sure to use the spatially harmonized variable GEO2_MX. This variable enables more consistent calculations over time by fixing municipalities to the same geographic areas across census years.
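To illustrate the first point, here is a minimal sketch of applying the same age restriction in every year before computing a person-weighted mean. The records are made-up toy values, and the cutoff for valid YRSCHOOL values is an assumption (codes of 90 and above are treated here as non-numeric categories such as "unknown"); please check the codebook for your extract before relying on it.

```python
from collections import defaultdict

# Toy records with made-up values, using IPUMS-style variable names.
records = [
    {"YEAR": 2005, "AGE": 6, "PERWT": 10.0, "YRSCHOOL": 1},
    {"YEAR": 2005, "AGE": 8, "PERWT": 30.0, "YRSCHOOL": 3},
    {"YEAR": 2010, "AGE": 3, "PERWT": 20.0, "YRSCHOOL": 0},   # in universe only from 2010 on
    {"YEAR": 2010, "AGE": 6, "PERWT": 10.0, "YRSCHOOL": 1},
    {"YEAR": 2010, "AGE": 8, "PERWT": 30.0, "YRSCHOOL": 2},
    {"YEAR": 2010, "AGE": 9, "PERWT": 15.0, "YRSCHOOL": 98},  # assumed special code
]

MIN_AGE = 5              # same lower age bound in every sample year
MAX_VALID_YRSCHOOL = 89  # assumption: higher codes are not actual years of schooling


def weighted_mean_by_year(rows, value="YRSCHOOL", weight="PERWT"):
    """Person-weighted mean of `value` per YEAR, on a consistent universe."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for row in rows:
        # Skip persons outside the common universe and special codes.
        if row["AGE"] < MIN_AGE or row[value] > MAX_VALID_YRSCHOOL:
            continue
        numerator[row["YEAR"]] += row[weight] * row[value]
        denominator[row["YEAR"]] += row[weight]
    return {year: numerator[year] / denominator[year] for year in sorted(denominator)}


means = weighted_mean_by_year(records)
print(means)
```

Without the two filters, the three-year-old and the special code would both enter the 2010 average but have no counterpart in 2005, which by itself can produce an apparent jump between years.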

Indeed, I limit the data to kids 5-18 years old, and I observe the patterns within certain age groups. Concerning the geographic identifiers, I have not looked at the patterns within individual municipalities yet; I aggregate across the indicated municipalities. I have used the GEOLEVEL2 indicator so far, but I can double-check with the other variable.

That being said, could there be another reason for this behavior of the series (see attached pictures)?

Thanks for providing these figures. I can certainly see the trend you discuss above. Although I am not aware of any documentation that would explain this sort of trend, I do have several thoughts. First, it is good to know you are using the GEOLEV2 variable. This variable also provides consistent geographic boundaries over time. Therefore, I do not think this trend is due to differential selection of geographic areas over time. Second, looking at these figures, I wonder how large these differences are in reality. For some age groups, the differences seem quite small, but without an estimate of the standard error or variance of these estimates it is difficult to tell for sure. Are you certain that these differences are in fact large enough to overcome the inherent sampling error in these data? I suggest investigating whether these differences are “statistically significant” in your sample of these data.
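As a rough sketch of such a check, the snippet below compares a weighted mean between two years using an approximate z-statistic. The data are made up, the function names are purely illustrative, and the standard error is a simple with-replacement approximation that ignores clustering and stratification in the sample design, so it will likely understate the true sampling error.

```python
import math


def weighted_mean_and_se(values, weights):
    """Weighted mean and an approximate standard error (ignores survey design)."""
    total_weight = sum(weights)
    mean = sum(w * y for y, w in zip(values, weights)) / total_weight
    variance = sum((w * (y - mean)) ** 2 for y, w in zip(values, weights)) / total_weight ** 2
    return mean, math.sqrt(variance)


def z_for_difference(values_a, weights_a, values_b, weights_b):
    """z-statistic for (mean_b - mean_a), treating the two samples as independent."""
    mean_a, se_a = weighted_mean_and_se(values_a, weights_a)
    mean_b, se_b = weighted_mean_and_se(values_b, weights_b)
    return (mean_b - mean_a) / math.sqrt(se_a ** 2 + se_b ** 2)


# Made-up literacy indicators (1 = literate) for one age group in two years.
lit_2005 = [1, 1, 0, 1, 1, 0, 1, 1]
w_2005 = [10.0] * 8
lit_2010 = [1, 0, 0, 1, 1, 0, 1, 0]
w_2010 = [10.0] * 8

z = z_for_difference(lit_2005, w_2005, lit_2010, w_2010)
# |z| below about 1.96 means the gap is within sampling noise at the 5% level.
print(round(z, 2), abs(z) > 1.96)
```

In this toy example the literacy rate falls from 75% to 50%, yet with so few observations the difference is not distinguishable from sampling noise. Small age-by-municipality cells in the real data could behave similarly.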

I hope this is helpful. I wish I had a more definitive answer.