Which variable do I use as the cluster variable for the India surveys?

jlieber · May 12, 2017, 2:19pm

I am analysing the India surveys and want to svyset the data in Stata.

I was wondering which variable I should use as the PSU/cluster variable. I have seen ‘cluster’ mentioned for this purpose, but it is not available in the international data.

Many thanks

JeffBloem · May 12, 2017, 3:04pm

Nearly all IPUMS International samples, and all India samples, treat households as the primary sampling unit (PSU). Therefore, it is usually advised to use the household identifier variable SERIAL for clustering. Note, however, that depending on the type of analysis you are performing, it may be more appropriate to cluster at a different level. For example, if you are aggregating data across regions before performing your analysis, then accounting for within-region correlation of your outcome variable may be advisable. More details about variance estimation with IPUMS International data are available here. Specific sample characteristics for Indian samples are available here.
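
To illustrate what clustering on SERIAL does to the standard errors, here is a minimal Python sketch (not Stata, and not an official IPUMS method). It computes a weighted mean and a linearized standard error, treating each household as one PSU in a single stratum. The function name, the made-up data, and the use of a person weight such as PERWT are all assumptions for illustration only.

```python
from collections import defaultdict
from math import sqrt


def clustered_mean_se(rows):
    """Weighted mean and linearized SE, treating each cluster as one PSU.

    rows: iterable of (cluster_id, weight, value) tuples, e.g.
    (SERIAL, person weight, outcome). Assumes a single stratum and
    with-replacement sampling of clusters (hypothetical simplification).
    """
    rows = list(rows)
    total_w = sum(w for _, w, _ in rows)
    mean = sum(w * y for _, w, y in rows) / total_w

    # Sum the linearized residuals within each cluster (household).
    cluster_totals = defaultdict(float)
    for cluster_id, w, y in rows:
        cluster_totals[cluster_id] += w * (y - mean) / total_w

    n = len(cluster_totals)
    if n < 2:
        raise ValueError("need at least two clusters to estimate variance")

    # Between-cluster variance; the cluster totals sum to zero by construction.
    variance = n / (n - 1) * sum(z * z for z in cluster_totals.values())
    return mean, sqrt(variance)


# Made-up example: (SERIAL, weight, outcome) for five people in three households.
people = [
    (1, 1.0, 10.0),
    (1, 1.0, 12.0),
    (2, 1.0, 20.0),
    (2, 1.0, 22.0),
    (3, 1.0, 30.0),
]

mean_hh, se_hh = clustered_mean_se(people)

# Same data, but pretending every person is their own cluster.
unclustered = [(i, w, y) for i, (_, w, y) in enumerate(people)]
mean_ind, se_ind = clustered_mean_se(unclustered)

print(mean_hh, se_hh, se_ind)
```

In this toy data, people in the same household have similar values, so the household-clustered standard error comes out larger than the one that ignores clustering. That is the within-household correlation that clustering on SERIAL is meant to account for.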

jlieber · May 12, 2017, 3:23pm

Thank you for the quick response. Sorry for my confusion, but the sampling strategy states that it is a multistage design, with the first round selecting rural villages and urban wards. So should an identifier for these not be the PSU? Or is that not available in the data?

JeffBloem · May 12, 2017, 3:26pm

That is correct: villages and urban wards are not identifiable in the data. The lowest level of geography for India is the region level. Household IDs are available because they do not identify geographic location.

jlieber · May 12, 2017, 3:27pm

I see, thank you!
