PSU and strata variables

#1

I am analyzing data merged from multiple survey waves across 25 countries. My regression specification includes variables that have been calculated at the PSU level (employment, etc.). My dependent variable is the negative of HAZ (height-for-age z-score). My units of analysis are children under age 5.

I have some questions regarding the sampling and weighting.

  1. Can I directly use the idhspsu variable to create the requisite variables for the surveys where cluster and PSU are not the same?

  2. Do I need to use the idhsstrata variable with the svyset command? If yes, how do I deal with missing strata information?

  3. Can I directly use the weight perweight for this analysis?

  4. In one post on this forum I read that in multi-country analysis the data must be clustered at the country level. Do I need to do that for this analysis? If yes, how do I cluster data at two different levels, i.e., the country level and then the individual PSU level?

#2

Thanks to Dr. Elizabeth Heger Boyle for these answers.

  1. Can I directly use the idhspsu variable to create the requisite variables for the surveys where cluster and PSU are not the same?

Yes.
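
As a rough sketch of what that could look like, the lines below build a PSU-level employment rate from individual records. The variable names employed, psu_emp_rate, and psu_n are hypothetical; only idhspsu comes from the data.

```stata
* Hypothetical: employed is a 0/1 indicator on each individual record
* PSU-level share employed, attached to every record in the same PSU
bysort idhspsu: egen psu_emp_rate = mean(employed)

* Records per PSU with nonmissing employed, to flag PSUs with few observations
bysort idhspsu: egen psu_n = count(employed)
```

This is an unweighted mean; whether to weight within the PSU is a separate modeling choice.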

  2. Do I need to use the idhsstrata variable with the svyset command?

Yes. If you do not specify the strata in svyset, Stata will still produce the weighted estimates, but the standard errors will be wrong.
To declare the weights and sample design for IPUMS-DHS data in Stata, the command is:

svyset [pw=perweight], psu(idhspsu) strata(idhsstrata)

This declares the weights and sample design in Stata; they are then applied to estimation commands by putting “svy:” at the beginning, such as:

svy: regress y x
svy: mean y, over(x)

  2. (continued) If yes, how do I deal with missing strata information?

The DHS User Forum has information on how to construct strata variables when they are missing. Fundamentally, it depends on the sampling design (which you can find in the appendices to the final reports). If the sample was stratified across urban/rural areas (typical), you can replace the strata variable (idhsstrata) with the urban/rural variable (urban).
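
One possible way to apply that substitution in pooled data is sketched below. It assumes a survey identifier named sample is in the extract, that urban is nonmissing, and that idhsstrata is missing for whole surveys rather than for scattered records. The names strata_src and strata_all are hypothetical. Grouping by survey keeps the fallback strata distinct across countries and waves.

```stata
* Assumption: sample identifies each survey, and idhsstrata is missing
* for whole surveys rather than for scattered records
gen double strata_src = idhsstrata
replace strata_src = urban if missing(idhsstrata)

* Combine with the survey identifier so codes never overlap across surveys
egen long strata_all = group(sample strata_src)

* Declare the design using the filled-in strata variable
svyset idhspsu [pweight = perweight], strata(strata_all)
```

The svyset line here is written in the "svyset psu [weight], strata()" form given in the Stata manual. Check each survey's final report before relying on urban/rural as the stratification.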

  3. Can I directly use the weight perweight for this analysis?

Yes.

  4. In one post on this forum I read that in multi-country analysis the data must be clustered at the country level. Do I need to do that for this analysis? If yes, how do I cluster data at two different levels (i.e., the country level and then the individual PSU level)?

Whether it’s necessary to cluster at the country level, the cluster level, or both depends on how much of the variance in your dependent variable lies between these units rather than within them. You can calculate this by running a null model (no predictors, only a random intercept), e.g.:

melogit depvar [pweight = perweight] || idhspsu:
estat icc

If rho (the intraclass correlation reported by estat icc) is large (greater than 0.15 or so), then a mixed or multilevel model is appropriate. I’ve seen people cluster at the country, region, and PSU levels. These days, the PSU level seems to be more common.
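
For a continuous outcome such as the negative HAZ in the original question, the same null-model check might be run with mixed instead of a logit-type model. This is only a sketch; neg_haz is a hypothetical variable name.

```stata
* Null (intercept-only) model with a random intercept for each PSU
* neg_haz is a hypothetical name for the negative HAZ outcome
mixed neg_haz [pweight = perweight] || idhspsu:

* Share of total variance that lies between PSUs
estat icc
```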

If the analysis combines only a few countries, then a dummy variable for each country except one is probably the best approach, and there would be no need to cluster at the country level. To cluster at multiple levels, use a mixed model and list the levels from highest to lowest (country first, then PSU):

mixed depvar [pweight = perweight] || country: || idhspsu:
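
For the few-countries case, the country-dummy approach could be sketched as follows. It assumes the design has already been declared with svyset as earlier in this thread, and indepvar is a placeholder for your own covariates.

```stata
* Assumes the survey design was already declared with svyset
* i.country adds a dummy for every country except the base category
svy: regress depvar indepvar i.country
```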