Psu and Strata variable

Mayank_Ag · February 3, 2019, 12:53pm

I am doing analysis after merging data from multiple waves of 25 countries. My regression specification includes variables that have been calculated at the PSU level (employment, etc.). My dependent variable is the negative of HAZ (height-for-age z-score). My unit of analysis is children under age 5.

I have some questions regarding the sampling and weighting.

Can I directly use the idhspsu variable to create the required variables for the surveys where cluster and PSU are not the same?
Do I need to use the idhsstrata variable when using the svyset command? If yes, how do I deal with missing strata information?
Can I directly use the weight perweight for this analysis?
In one post on this forum I read that in multi-country analysis the data must be clustered at the country level. Do I need to do that for this analysis? If yes, how do I cluster data at two different levels, i.e., the country level and then the individual PSU level?

king2clio · February 6, 2019, 7:32pm

Thanks to Dr. Elizabeth Heger Boyle for these answers.

Can I directly use the idhspsu variable to create the required variables for the surveys where cluster and PSU are not the same?

Yes.
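A minimal sketch of building a PSU-level regressor from child-level records with idhspsu. The variable employed and the values in the input block are hypothetical stand-ins; with a real extract you would drop the input block and use your own indicator. It assumes idhspsu is unique across samples (as I understand it, IPUMS-DHS constructs it that way, but it is worth checking in your extract), so no separate grouping by sample is needed.

```stata
* Toy data: hypothetical stand-in for a child-level IPUMS-DHS extract
clear
input long idhspsu byte employed
101 1
101 0
101 1
102 0
102 0
end

* PSU-level mean of a (hypothetical) binary employment indicator,
* copied onto every record in the same PSU
bysort idhspsu: egen psu_emp = mean(employed)
list idhspsu employed psu_emp, sepby(idhspsu)
```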

Do I need to use the idhsstrata variable when using the svyset command?

Yes. If you do not specify the strata, svy: commands will still produce the weighted point estimates, but the standard errors will be wrong.
To weight IPUMS-DHS data in Stata, the command is:

svyset idhspsu [pweight=perweight], strata(idhsstrata)

This declares the survey design (weights, PSUs, and strata) to Stata; it is then applied to estimation commands by putting “svy:” at the beginning, such as:

svy: regress y x
svy: mean y, over(x)

If yes, how do I deal with missing strata information?

The DHS User Forum has information on how to construct strata variables when they are missing. Fundamentally, it depends on the sampling design (which you can find in the appendices to the final reports). If the sample was stratified across urban/rural areas (typical), you can replace the strata variable (idhsstrata) with the urban/rural variable (urban).
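A sketch of the urban/rural fallback for pooled data, on hypothetical toy records. It assumes missing strata appear as Stata system missing (.) rather than a numeric missing code, and that sample and urban identify the survey and urban/rural residence; check your extract's codebook. Grouping urban together with sample keeps each survey's fallback strata distinct, so urban areas in different countries are not lumped into a single stratum, and offsetting the new codes past the largest existing idhsstrata value avoids collisions with real strata.

```stata
* Toy data (hypothetical): sample 2 has no idhsstrata values
clear
input long sample long idhsstrata byte urban
1 10 1
1 10 1
1 11 2
2 .  1
2 .  2
2 .  2
end

* Start from the IPUMS strata and fill only the gaps
gen long strata_fill = idhsstrata

* One code per sample x urban/rural cell among records lacking strata
egen long ur_cell = group(sample urban) if missing(idhsstrata)

* Shift new codes past the largest existing stratum code (0 if none exist)
summarize idhsstrata, meanonly
local base = cond(missing(r(max)), 0, r(max))
replace strata_fill = `base' + ur_cell if missing(idhsstrata)
list, sepby(sample)
```

The design would then be declared with strata(strata_fill) in place of strata(idhsstrata), e.g. svyset idhspsu [pweight=perweight], strata(strata_fill).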

Can I directly use the weight perweight for this analysis?
Yes.
In one post on this forum I read that in multi-country analysis the data must be clustered at the country level. Do I need to do that for this analysis? If yes, how do I cluster data at two different levels (i.e., the country level and then the individual PSU level)?

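Whether it’s necessary to cluster at the country level, the cluster level, or both depends on how much of the variation in your dependent variable is explained by these spatial units. You can assess this by fitting a null (intercept-only) multilevel model and computing the intraclass correlation, e.g., for a continuous outcome such as HAZ (use melogit instead of mixed for a binary outcome):

mixed depvar [pweight = perweight] || idhspsu:
estat icc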

If the rho is large (greater than 0.15 or so), then a mixed or multilevel model is appropriate. I’ve seen people cluster at the country, region, and psu level. These days, the psu level seems to be more common.
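For reference, in a two-level random-intercept model the ICC reported by estat icc is the share of total variance lying between PSUs. Here σ_u² is the PSU-level variance and σ_e² the child-level residual variance; for a logit model the latent child-level variance is fixed at π²/3. For example, σ_u² = 0.3 and σ_e² = 1.2 give ρ = 0.3/1.5 = 0.2, above the rough 0.15 threshold.

```latex
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
\qquad \text{(linear model, e.g.\ mixed)}

\rho = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}
\qquad \text{(logit model, e.g.\ melogit)}
```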

If the analysis combines only a few countries, then a dummy variable for each country except one is probably the best approach, and there would be no need to cluster at the country level. To model multiple levels (PSUs nested within countries), list the random-effects levels from highest to lowest:

mixed depvar [pweight = perweight] || country: || idhspsu:
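A sketch of the country-dummy alternative described above, on synthetic data. All names here (country, stratum, psu, wt, x, y) are hypothetical; with a real extract you would use country, idhsstrata, idhspsu, and perweight. Factor-variable notation i.country adds one dummy per country and omits the lowest code as the base category, while svyset keeps the PSU-level clustering and stratification.

```stata
* Synthetic data (hypothetical): 3 countries, 2 strata per country,
* 2 PSUs per stratum, 10 children per PSU
clear
set seed 12345
set obs 120
gen int country = ceil(_n / 40)   // 3 countries
gen int stratum = ceil(_n / 20)   // 6 strata, unique across countries
gen int psu     = ceil(_n / 10)   // 12 PSUs, 2 per stratum
gen double wt   = 1 + runiform()
gen double x    = rnormal()
gen double y    = 0.5*x + country + rnormal()

* Declare the design: PSUs, sampling weights, strata
svyset psu [pweight = wt], strata(stratum)

* i.country adds a dummy for each country except the base (country 1)
svy: regress y x i.country
```

With many countries, the same i.country term still works, but the number of dummies grows with the number of countries, which is where the multilevel specification becomes more attractive.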
