We are working with the Mexican IPUMS data from 1990 to 2015 and observe surprising patterns over time, e.g. in the labor market outcomes of kids.
See, for example, the attached Fig. 1. It shows, for different years, the municipality-level share of kids aged 15-18 who are unemployed (weighted means; y-axis) plotted against the number of observations of 15-18 year olds in each municipality (only up to 100; x-axis). What strikes us is the apparent negative relationship between the number of observations and the unemployment share, and how regular that pattern is. In your view, what could drive this pattern? Should we be concerned about data quality? Fig. 2 repeats Fig. 1 without the cap on the number of observations and uses the log of the number of observations on the x-axis.
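For reference, here is a minimal sketch of the calculation behind the figures. The field names (`year`, `muni`, `age`, `empstat`, `perwt`) are placeholders for illustration, and the sketch assumes that an employment status code of 2 means unemployed and that the denominator is all 15-18 year olds in the municipality, not only those in the labor force.

```python
from collections import defaultdict

UNEMPLOYED = 2  # assumed code for "unemployed" in the employment status variable


def municipality_shares(records):
    """Return {(year, muni): (weighted unemployed share, unweighted n)}.

    `records` is an iterable of dicts with the hypothetical keys
    year, muni, age, empstat and perwt (person weight).
    """
    w_total = defaultdict(float)  # sum of weights, all 15-18 year olds
    w_unemp = defaultdict(float)  # sum of weights, unemployed 15-18 year olds
    n_obs = defaultdict(int)      # unweighted number of observations

    for r in records:
        if not 15 <= r["age"] <= 18:
            continue
        key = (r["year"], r["muni"])
        w_total[key] += r["perwt"]
        n_obs[key] += 1
        if r["empstat"] == UNEMPLOYED:
            w_unemp[key] += r["perwt"]

    return {
        key: (w_unemp[key] / w_total[key], n_obs[key])
        for key in w_total
        if w_total[key] > 0
    }
```

Each municipality-year contributes one point to the scatter plot: the weighted share on the y-axis and the unweighted count on the x-axis.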
Relatedly, we observe that the share of municipalities with more than 50 or 100 observations of individuals aged 15-18 varies a lot across waves. Do you have a suggestion for how to correct for these differences when running analyses that use municipality averages, such as average enrollment or employment rates for 15-18 year olds?
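To make the comparison across waves concrete, this is roughly how the share of municipalities above a threshold can be tabulated. It is only a sketch: the input is assumed to be a mapping from a hypothetical `(year, muni)` key to the unweighted number of 15-18 year olds, and municipalities with no such observations in a wave are simply absent from it.

```python
from collections import defaultdict


def share_above_threshold(n_obs, threshold):
    """Return {year: share of municipalities with more than `threshold` obs}.

    `n_obs` maps (year, muni) -> unweighted count of 15-18 year olds.
    """
    total = defaultdict(int)  # municipalities observed in each year
    above = defaultdict(int)  # municipalities exceeding the threshold

    for (year, _muni), n in n_obs.items():
        total[year] += 1
        if n > threshold:
            above[year] += 1

    return {year: above[year] / total[year] for year in total}
```

Calling it once with a threshold of 50 and once with 100 gives the two series we are comparing over time.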
Thanks for your help!