I want to run a household-level regression on the dataset named in the subject. However, I am unsure how to declare the survey design correctly in Stata with svyset for this dataset. Is the following correct:
To use the household weights (HHWT) as pweights.
To use the state variable (GEO1_NG) to identify strata.
What should I use as the PSU? The sample description states that the PSU is the enumerated unit, but there is no variable that identifies it.
I have noticed that the unharmonized variables include one that identifies the enumeration area. Given that, would this be correct:
svyset ea_code [pw=hhwt]
In addition, for some years (e.g. 2006 and 2010) there is no such variable. I assume that in such cases I should use only hhwt when performing the regression analysis?
Thank you in advance.
I first recommend that you read the IPUMS-I User Note on “Sampling Error and Variance Estimation” and refer to the bottom of the Sample Design Summary page. The Nigeria samples are drawn by complex stratification with geographic clustering and household clustering. As a result, you should use the household weight (HHWT) as your pweight, use Enumeration Area as your strata, and cluster by household (SERIAL).
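As a rough sketch of how that recommendation might be declared in Stata: this assumes the extract uses lowercase variable names, that ea_code is the name of the unharmonized enumeration area variable (taken from the question above), and that outcome and covariate are placeholder names for your own regression variables.

```stata
* Sketch only, under the assumptions stated above.
* serial  = household identifier, used as the cluster (PSU)
* hhwt    = household weight, used as the probability weight
* ea_code = enumeration area, used as the stratum variable (name assumed)
svyset serial [pweight=hhwt], strata(ea_code)

* Prefix the estimation command with svy: so the declared design is used.
* outcome and covariate are hypothetical variable names.
svy: regress outcome covariate
```

If some enumeration areas contain only one sampled household, Stata may report missing standard errors; the singleunit() option of svyset controls how such strata are handled, and which setting is appropriate is a judgment call for your analysis.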
As you note, Enumeration Area is not available in all of the Nigerian samples. Not accounting for geographic clustering in these samples will lead to underestimating standard errors, which means you should use caution in interpreting statistical significance at the margins. You might also consider investigating how your standard errors change when you use State (GEO1_NG) as your strata in the samples where Enumeration Area is not provided.
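One way to look at that effect, sketched here under the same assumptions (lowercase variable names; outcome and covariate are placeholders; the data in memory are from a sample without an enumeration area variable), is to fit the same model under both declarations and put the standard errors side by side.

```stata
* Sketch only: compare standard errors with and without State as strata.

* Design 1: State as strata, households as clusters.
svyset serial [pweight=hhwt], strata(geo1_ng)
svy: regress outcome covariate
estimates store state_strata

* Design 2: weights and household clustering only, no strata.
svyset serial [pweight=hhwt]
svy: regress outcome covariate
estimates store no_strata

* Coefficients should match; the standard errors are what may differ.
estimates table state_strata no_strata, se
```

Neither declaration recovers the geographic clustering that is missing from these samples, so the caution about marginal significance still applies.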
Hope this helps.