Stata Code for Using PERWEIGHT Variable for DHS Data

Hello! I have created a dataset of 45 countries from DHS surveys with the purpose of exploring the association between illiteracy and multiple other variables. As a first step, I wanted to look only at mean illiteracy over time for each country individually. I recoded the variable LITBRIG to do this as follows:

A value of ‘0’ was assigned to “yes, reads” and “reads easily/whole sentence.” A value of ‘1’ was assigned to “read with difficulty/part of sentence” and “no, cannot read.” A missing value (.) was assigned to “not ascertained (blind or diff. language),” “no card with required language,” “blind or visually impaired,” “missing,” and “NIU (not in universe).”

Previously, I had a problem where some significant peaks showed up for a few countries when I tried to calculate each country’s mean illiteracy over time. I think utilizing the PERWEIGHT variable in my Stata code will help solve this problem, but I am having trouble understanding what the correct script is. I have scoured the internet and the IPUMS exercises for how to correctly use PERWEIGHT in Stata coding, but I’m still struggling. This is what I’ve coded so far in Stata:

use "/Users/andeegempelerdevore/Desktop/Dissertation Proposal/DATA/datasets/variablesrecode.dta", clear

svyset [pw = perweight]


svy: mean illiterate

svy: mean illiterate, over(sample)

When I do this, my estimates for the aforementioned countries still show peaks. I have seen a few places online suggesting that the svy prefix may not be necessary for this and that the weight expression [pw = perweight] can be included in the command instead, but I’m unsure of how this is supposed to look from a coding perspective. Can anyone help me? I’d super appreciate it!

This help page from UCLA is a good introduction to the different types of weights in Stata. Typing “help weight” in the Stata Command window will also open a help file on how to apply different weights. IPUMS provides some sample code for using weights. This page from the IPUMS NHIS user guide includes sample code for using weights, subsetting your analysis (such as by age), and accounting for sample design when using weights. Also, this page from The DHS Program is a good resource that walks you through how to use weights in multiple steps.
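As a rough sketch of the two approaches you mention, the weight can either be attached directly to the estimation command or declared once with svyset. This assumes your dataset is already in memory and uses your variable names (perweight, illiterate, sample). The second option also assumes the IPUMS DHS design variables idhspsu and idhsstrata are in your extract; if they are not, leave them out of the svyset line.

```stata
* Assumes the dataset with perweight, illiterate, and sample is in memory.

* Option 1: attach the probability weight directly to the command.
mean illiterate [pw = perweight], over(sample)

* Option 2: declare the survey design once, then use the svy prefix.
* idhspsu and idhsstrata are assumed to be in the extract (see note above).
svyset idhspsu [pw = perweight], strata(idhsstrata)
svy: mean illiterate, over(sample)
```

Both options should give the same point estimates for each sample. The difference is in the standard errors, since only the second accounts for clustering and stratification.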

Even when using weights correctly, you may see large changes over time in individual countries when calculating certain statistics. This can be due to a variable having a high level of measurement error, the sample size in a particular country being small, or the universe of a variable changing over time. Literacy rates may also change substantially over time, especially if the DHS surveys are several years apart. I would also double-check your code to ensure you have recoded the literacy variable correctly. You can do this by tabulating the variable and confirming that its only non-missing values are the ones you expect and intend (0 and 1).
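A few quick checks along these lines might look like the following. This is only a sketch: it assumes the original litbrig variable is still in your dataset alongside illiterate and sample, and lit_missing is a hypothetical helper variable created just for this check.

```stata
* Confirm the recode: each litbrig category should map to 0, 1, or missing (.).
tabulate litbrig illiterate, missing

* Unweighted number of non-missing cases in each sample (small-sample check).
tabulate sample if !missing(illiterate)

* Share of cases coded missing in each sample. A jump between survey years
* may point to a change in the variable's universe.
generate byte lit_missing = missing(illiterate)
tabulate sample lit_missing, row
```

If a sample with a peak also has few non-missing cases or an unusually high missing share, that is a likely place to look first.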