Odd "peaks" showing up for LITBRIG variable mean plots

Hi! When I create country plots to show how the mean of literacy for the variable LITBRIG has changed over time for each country, I notice that India, Bangladesh, and Zimbabwe have odd peaks, suggesting that literacy skyrocketed during one of the survey years and then went down drastically the following survey year. Is there an explanation for this? Thank you in advance!

LITBRIG is a categorical variable whose values correspond to distinct responses rather than to a specific literacy level (see the Codes tab for LITBRIG). Calculating the mean of LITBRIG will therefore not produce a meaningful literacy rate. For example, LITBRIG = 20 identifies respondents who cannot read, and LITBRIG = 99 identifies respondents who are not in universe (NIU) for the variable. These are persons who were not asked the questions required to determine LITBRIG. I suspect that these NIU respondents are causing your measure to spike: 99 is much larger than the codes for actual responses, so a sample with a large share of NIU cases will have a much higher mean, and the three samples you mention all have a large number of NIU cases.

While there are a number of reasons that a respondent might not be asked a particular question, information on who was asked the question(s) that correspond to a variable is summarized in the variable’s universe statement (see the Universe tab for LITBRIG). We recommend that new users review the detailed FAQ page on IPUMS-DHS; question 8 specifically deals with NIU data. Based on my review of this information and the original DHS questionnaires, it appears that in India in 1998 and in Bangladesh in 2004, women who had completed grade 6 were not asked about their literacy. Similarly, in Zimbabwe in 1999, women who had ever attended secondary school were not asked about their literacy.
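To see why NIU cases distort a simple mean, here is a small Python sketch. The counts are made up for illustration and are not real DHS data, and the helper name `mean_code` is hypothetical; only the code values 10, 20, and 99 come from the discussion here.

```python
from statistics import mean

def mean_code(n_literate, n_illiterate, n_niu):
    """Mean of raw LITBRIG codes for a made-up sample.

    Uses 10 for a literate response, 20 for "cannot read",
    and 99 for not in universe (NIU).
    """
    codes = [10] * n_literate + [20] * n_illiterate + [99] * n_niu
    return mean(codes)

# Hypothetical sample where everyone was asked the literacy question.
few_niu = mean_code(50, 50, 0)

# Hypothetical sample with the same literate/illiterate split among those
# asked, but where 40 of 100 respondents were NIU.
many_niu = mean_code(30, 30, 40)

print(few_niu, many_niu)  # 15 and 48.6
```

The literate share among respondents who were actually asked is 50% in both samples, yet the mean of the raw codes more than triples in the second one.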

To calculate a literacy rate from LITBRIG, you will need to recode respondents into a new dichotomous variable that reports whether or not the respondent was literate. This requires dividing your sample into three groups. Respondents who are literate (LITBRIG = 10, 11, and/or 12, depending on the threshold for literacy that you impose) should be assigned a value of 1. Respondents who are illiterate (LITBRIG = 20, plus 12 if your threshold does not count that category as literate) should be assigned a value of 0. Respondents whose level was not ascertained, who have missing data, or who are NIU (LITBRIG = 31, 32, 98, and 99) should be set to missing and excluded from the measure. While it is unclear whether a respondent who was NIU for LITBRIG was literate or not (which is why NIU data is typically set to missing), you might consider coding NIU responses as literate if you have reason to believe that to be the case.
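The recoding described above can be sketched in Python roughly as follows. The function name `recode_literate` and the `partial_is_literate` switch are hypothetical names chosen for illustration; the code values are the ones listed in this post, and code 12 is treated as the borderline category whose placement depends on your literacy threshold.

```python
# Code groups as listed in this post; check the Codes tab for LITBRIG
# before relying on them for a particular sample.
LITERATE_CODES = {10, 11}
ILLITERATE_CODES = {20}
BORDERLINE_CODE = 12            # literate or not, depending on your threshold
MISSING_CODES = {31, 32, 98, 99}  # not ascertained, missing, NIU

def recode_literate(litbrig, partial_is_literate=False):
    """Return 1 (literate), 0 (illiterate), or None (set to missing)."""
    if litbrig == BORDERLINE_CODE:
        return 1 if partial_is_literate else 0
    if litbrig in LITERATE_CODES:
        return 1
    if litbrig in ILLITERATE_CODES:
        return 0
    if litbrig in MISSING_CODES:
        return None
    # Fail loudly on codes this sketch does not know about.
    raise ValueError(f"Unexpected LITBRIG code: {litbrig}")

print([recode_literate(code) for code in (10, 12, 20, 99)])
```

Returning `None` for the missing group keeps those respondents out of both the numerator and the denominator when you later compute a rate.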

Using the recoding I have suggested above, and weighting by PERWEIGHT to produce estimates that are representative of the country and survey universe, I estimated literacy rates for Bangladesh, India, and Zimbabwe, as well as Rwanda (see screenshot below). You can find sample R code in the IPUMS data training exercises to help you run a similar analysis.
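For readers who want to see the arithmetic, a weighted literacy rate for each country and survey year can be sketched in Python as below. This is only an illustration and not the code behind the screenshot: the row layout, the `country`, `year`, and `literate` field names, and the function name are assumptions, and the example rows are invented. `literate` is expected to hold the recoded value (1, 0, or `None` for missing).

```python
from collections import defaultdict

def weighted_literacy_rates(rows):
    """Weighted share literate per (country, year), skipping missing cases."""
    literate_weight = defaultdict(float)
    total_weight = defaultdict(float)
    for row in rows:
        if row["literate"] is None:
            continue  # not ascertained, missing, or NIU: excluded entirely
        key = (row["country"], row["year"])
        total_weight[key] += row["PERWEIGHT"]
        literate_weight[key] += row["PERWEIGHT"] * row["literate"]
    return {key: literate_weight[key] / total_weight[key] for key in total_weight}

# Invented rows for illustration only.
example_rows = [
    {"country": "A", "year": 2000, "literate": 1, "PERWEIGHT": 2.0},
    {"country": "A", "year": 2000, "literate": 0, "PERWEIGHT": 1.0},
    {"country": "A", "year": 2000, "literate": None, "PERWEIGHT": 5.0},
    {"country": "B", "year": 2000, "literate": 1, "PERWEIGHT": 1.0},
]
rates = weighted_literacy_rates(example_rows)
print(rates)
```

In the invented data, the heavily weighted missing case for country A has no effect on the result, which is the behavior you want for NIU respondents.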

You should also note that LITBRIG is generally only available for ever-married women aged 15-49, so your measure will describe the literacy rate for this group only. For analyses of literacy rates for the entire population of a particular country, you can use the variable LIT on IPUMS International.