# Odd "peaks" showing up for LITBRIG variable mean plots

Hi! When I plot how the mean of literacy (variable LITBRIG) has changed over time for each country, India, Bangladesh, and Zimbabwe show odd peaks: literacy seems to skyrocket in one survey year and then drop drastically in the next. Is there an explanation for this? Thank you in advance!

To calculate the literacy rate from LITBRIG, you will need to recode respondents into a new dichotomous variable that indicates whether each respondent is literate. This requires dividing your sample into three groups:

- Literate respondents (LITBRIG = 10, 11, and/or 12, depending on the literacy threshold you impose) are assigned a value of 1.
- Illiterate respondents (LITBRIG = 20 and/or 12) are assigned a value of 0.
- Respondents whose level was not ascertained, respondents with missing data, and not-in-universe (NIU) observations (LITBRIG = 31, 32, 98, and 99) are set to missing and excluded from the measure.

It is unclear whether a respondent who was NIU for LITBRIG was literate, which is why NIU cases are typically set to missing. However, you might consider coding NIU responses as literate if you have reason to believe that to be the case.
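The recode above can be sketched as a small function. This is a minimal sketch, assuming the LITBRIG codes listed in this answer. The code-to-label comments are inferred from the category names used in this thread, and the function and constant names are hypothetical, so check them against the IPUMS DHS codebook. Any code not listed is rejected rather than guessed, because categories mentioned later in the thread (such as "blind or visually impaired") may use a code that is not covered here.

```python
from typing import Optional

# LITBRIG codes as listed in the answer; labels inferred, verify in the codebook.
LITERATE_CODES = {10, 11}          # yes, reads / reads easily (whole sentence)
ILLITERATE_CODES = {20}            # no, cannot read
PARTIAL_CODE = 12                  # reads with difficulty (part of sentence)
EXCLUDED_CODES = {31, 32, 98, 99}  # not ascertained, missing, NIU


def recode_literate(litbrig: int, partial_counts_as_literate: bool = False) -> Optional[int]:
    """Return 1 (literate), 0 (illiterate), or None (exclude from the rate)."""
    if litbrig in LITERATE_CODES:
        return 1
    if litbrig in ILLITERATE_CODES:
        return 0
    if litbrig == PARTIAL_CODE:
        # The threshold choice: does reading part of a sentence count as literate?
        return 1 if partial_counts_as_literate else 0
    if litbrig in EXCLUDED_CODES:
        return None
    # Fail loudly on any code not listed above so it gets an explicit decision.
    raise ValueError(f"unhandled LITBRIG code: {litbrig}")


def recode_illiterate(litbrig: int, partial_counts_as_literate: bool = False) -> Optional[int]:
    """Inverse indicator (1 = illiterate), keeping excluded cases as None."""
    lit = recode_literate(litbrig, partial_counts_as_literate)
    return None if lit is None else 1 - lit
```

With the default (stricter) threshold, `recode_illiterate` treats code 12 as illiterate, which matches the illiteracy coding described in the follow-up question.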

Using the recoding suggested above, and weighting by PERWEIGHT so that the estimates are representative of each country's survey universe, I estimated literacy rates for Bangladesh, India, Zimbabwe, and Rwanda (see screenshot below). The IPUMS data training exercises include sample R code that you can adapt for a similar analysis.
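The weighted estimate is the PERWEIGHT-weighted share of 1s within each sample, Σw·y / Σw, with excluded (missing or NIU) cases dropped from both sums. The sketch below shows this calculation. It assumes each respondent is a hypothetical `(sample, indicator, perweight)` tuple, and the sample IDs and weights in the example are made up. For reference, this is the point estimate Stata reports for `mean illiterate [pw=perweight], over(sample)`. The weighted mean itself does not depend on whether the weight is declared with `svyset` and `svy:` or given inline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# One record per respondent: (sample ID, 0/1 indicator or None if excluded, PERWEIGHT).
Row = Tuple[str, Optional[int], float]


def weighted_rate_by_sample(rows: Iterable[Row]) -> Dict[str, float]:
    """Return sum(w * y) / sum(w) for each sample, skipping excluded (None) cases."""
    num: Dict[str, float] = defaultdict(float)
    den: Dict[str, float] = defaultdict(float)
    for sample, y, w in rows:
        if y is None:
            continue  # excluded cases leave both the numerator and the denominator
        num[sample] += w * y
        den[sample] += w
    return {s: num[s] / den[s] for s in den if den[s] > 0}


if __name__ == "__main__":
    # Made-up sample IDs and weights, for illustration only.
    rows = [
        ("BD2004", 1, 2.0),
        ("BD2004", 0, 1.0),
        ("BD2004", None, 5.0),  # e.g. an NIU case: dropped
        ("IN2005", 1, 1.0),
        ("IN2005", 0, 3.0),
    ]
    for sample, rate in weighted_rate_by_sample(rows).items():
        print(sample, round(rate, 3))  # BD2004 0.667, then IN2005 0.25
```

In this toy data the unweighted rate for `BD2004` would be 0.5. Weighting moves it to about 0.667 because the literate respondent carries twice the weight.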

You should also note that LITBRIG is generally available only for ever-married women aged 15-49, so your measure captures the literacy rate for this group only. To analyze literacy rates for the entire population of a country, you can use the variable LIT on IPUMS International.

Hello Ivan,

Thank you so much for your help! I was surprised by how thorough your response was, so thank you for taking the time to write it. I really appreciate it.

I have followed your recoding instructions. Since I am interested in illiteracy rates rather than literacy rates, I recoded LITBRIG as follows:

A value of ‘0’ was assigned to “yes, reads” and “reads easily/whole sentence.” A value of ‘1’ was assigned to “read with difficulty/part of sentence” and “no, cannot read.” A missing value (.) was assigned to “not ascertained (blind or diff. language),” “no card with required language,” “blind or visually impaired,” “missing,” and “NIU (not in universe).”

I am now trying to use PERWEIGHT in my calculations, hoping to see the peaks for India, Bangladesh, and Zimbabwe disappear. I have scoured the internet and the IPUMS exercises you shared for how to use PERWEIGHT correctly in Stata, but I'm still struggling. This is what I've coded so far in Stata:

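```stata
use "/Users/andeegempelerdevore/Desktop/Dissertation Proposal/DATA/datasets/variablesrecode.dta", clear

* Declare PERWEIGHT as the sampling (probability) weight
svyset [pw = perweight]
svydescribe

* Pooled weighted mean of the 0/1 illiteracy indicator
svy: mean illiterate

* Weighted mean separately for each sample (country-year); over() is an option of mean
svy: mean illiterate, over(sample)
```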

When I do this, my estimates for the aforementioned countries still show peaks. Do you have any suggestions for where I’m going wrong with my coding? I’ve tried a variety of combinations and I just can’t seem to get them to go away.

I apologize for asking such a dumb question. I feel quite stupid not knowing the answer to this. Usually I can find answers to this type of question elsewhere, but for some reason I’m really struggling on this one.