Hello! I have created a dataset of 45 countries from DHS surveys to explore the association between illiteracy and several other variables. As a first step, I want to look at mean illiteracy over time for each country individually. I recoded the variable LITBRIG to do this as follows:
A value of ‘0’ was assigned to “yes, reads” and “reads easily/whole sentence.” A value of ‘1’ was assigned to “read with difficulty/part of sentence” and “no, cannot read.” A missing value (.) was assigned to “not ascertained (blind or diff. language),” “no card with required language,” “blind or visually impaired,” “missing,” and “NIU (not in universe).”
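In Stata, a recode like this could be sketched roughly as below. The numeric codes in the two local macros are placeholders, not the real LITBRIG codes; they would need to be replaced with the values shown by label list or the IPUMS DHS codebook. The lowercase variable name litbrig is also an assumption.

```stata
* Sketch only: the numeric codes below are PLACEHOLDERS, not verified LITBRIG codes.
* Check the real codes first, e.g. with: codebook litbrig

local can_read     20 21    // placeholder codes for the "reads" categories
local cannot_read  10 22    // placeholder codes for the "cannot read / with difficulty" categories

* 0 = literate, 1 = illiterate, everything else (not ascertained, missing, NIU) = .
recode litbrig (`can_read' = 0) (`cannot_read' = 1) (else = .), generate(illiterate)

* Cross-check the new variable against the original categories
tabulate litbrig illiterate, missing
```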
Previously, I had a problem where large peaks showed up for a few countries when I calculated each country’s mean illiteracy over time. I think using the PERWEIGHT variable in my Stata code will help solve this problem, but I am having trouble working out the correct syntax. I have searched the internet and the IPUMS exercises for how to use PERWEIGHT correctly in Stata, but I’m still struggling. This is what I’ve coded so far in Stata:
use "/Users/andeegempelerdevore/Desktop/Dissertation Proposal/DATA/datasets/variablesrecode.dta", clear
svyset [pw = perweight]
svydescribe
svy: mean illiterate
svy: mean illiterate, over(sample)
When I do this, my estimates for the aforementioned countries still show peaks. I have read in a few places online that the svy prefix may not be necessary for this and that [pw = perweight] can be added directly to the command instead, but I’m unsure what that looks like in code. Can anyone help me? I’d super appreciate it!
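For reference, my best guess at the non-svy form is sketched below. It assumes the variables illiterate, perweight, and sample exist under those names, as in the code above. It is a sketch, not a confirmed IPUMS recommendation.

```stata
* Sketch only: assumes illiterate, perweight, and sample exist as named above.

* Weighted mean of illiterate for each sample, without svyset or the svy prefix
mean illiterate [pw = perweight], over(sample)

* The same weighted means as a small dataset (one row per sample), e.g. for plotting
preserve
collapse (mean) illiterate [pw = perweight], by(sample)
list sample illiterate, clean
restore
```

As far as I understand, the point estimates from this form should match those from svy: mean when svyset declares only the weight, so switching between the two would not by itself change the peaks. The standard errors would differ only if svyset also declared clusters and strata (in an IPUMS DHS extract these might be IDHSPSU and IDHSSTRATA, if they were included).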