Can I cluster at strata to obtain clustered standard errors? (IPUMS International)

Can I cluster at the strata level to obtain clustered standard errors in a study of the impact of county-level natural experiments on women’s fertility? According to Bruce Hansen (2022), the clustering variable should not yield a small number of clusters in the sample, and the number of observations should be similar across clusters. I think the STRATA variable in IPUMS International satisfies these conditions, and observations are correlated within strata. Does this make sense?
I look forward to your response!

The best practices for standard error estimation will differ depending on the specific sample(s) you are using from IPUMS International. Most IPUMS International samples were created using a complex sampling design that uses stratification and clustering. You can find information about each IPUMS International sample’s sample design characteristics on this page. Standard error estimation should take into account the sampling design. In most cases, you would set the variable STRATA as the strata and cluster the standard errors (i.e., set as the primary sampling unit) at the level that was the PSU for that particular survey. In some samples, the PSU is the household; in other samples, it is a geographic area. Not all geographic areas that served as PSUs are identified in all IPUMS International samples. You can read more about standard error estimation on this page on sampling error and variance estimation. I would also be happy to provide more targeted guidance on standard error estimation for a particular sample or samples.

Thank you for responding to my question.
I am using the 2000 and 2010 samples for Brazil, and the unit of observation is the individual woman. I am estimating the causal effect of a natural experiment, defined at the IPUMS county level, on women’s fertility. In these data, could STRATA serve as the PSU you mentioned?

Based on the entries for the Brazil 2000 and 2010 samples in this table of sample design characteristics for sampling error estimation, you should use STRATA as the strata variable to account for the complex stratification and cluster at the household level (using YEAR and SERIAL together to uniquely identify households across the two censuses). You will also need to analyze your data in a way that retains the full sample design information; that is, do not simply drop cases that fall outside your analytical subpopulation of interest. This IPUMS NHIS page (scroll to the “Subsetting IPUMS NHIS Data” section) includes sample code for retaining full sample design information while subsetting populations in Stata, R, SAS, and SAS-callable SUDAAN. Finally, because the Brazil samples use differential weighting, you will need to specify PERWT as your probability weight.
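
To illustrate why retaining out-of-subpopulation cases matters, here is a minimal Python sketch of a weighted subpopulation mean with a Taylor-linearized standard error, using STRATA as the strata, households (YEAR plus SERIAL) as clusters, and PERWT as the weight. It is only an illustration under stated assumptions, not IPUMS code and not a substitute for survey software: the function name, the toy rows, the use of SEX == 2 to flag women, and the CHBORN outcome are chosen for the example, and keying strata by YEAR together with STRATA is my assumption for keeping the two censuses separate.

```python
from collections import defaultdict
from math import sqrt


def subpop_mean_se(rows, outcome, in_subpop):
    """Weighted subpopulation mean and linearized SE (illustrative sketch).

    rows: list of dicts with YEAR, SERIAL, STRATA, PERWT and the outcome.
    in_subpop: function returning True for cases in the subpopulation.
    """
    # Weighted mean over the subpopulation only.
    wsum = sum(r["PERWT"] for r in rows if in_subpop(r))
    mean = sum(r["PERWT"] * r[outcome] for r in rows if in_subpop(r)) / wsum

    # Linearized scores, summed to household totals within each stratum.
    # Every household is kept; households without subpopulation members
    # contribute a total of zero but still count toward the cluster count.
    totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        z = r["PERWT"] * (r[outcome] - mean) / wsum if in_subpop(r) else 0.0
        stratum = (r["YEAR"], r["STRATA"])    # assumption: strata kept per census
        household = (r["YEAR"], r["SERIAL"])  # unique household across censuses
        totals[stratum][household] += z

    # Between-cluster variance within strata, summed over strata.
    var = 0.0
    for hh in totals.values():
        n_h = len(hh)
        if n_h < 2:
            continue  # single-cluster stratum: skipped in this sketch
        zbar = sum(hh.values()) / n_h
        var += n_h / (n_h - 1) * sum((t - zbar) ** 2 for t in hh.values())
    return mean, sqrt(var)


def is_woman(r):
    # Assumption for the example: SEX == 2 identifies women.
    return r["SEX"] == 2


# Toy data (made up): three households in one stratum, one with no women.
rows = [
    {"YEAR": 2010, "SERIAL": 1, "STRATA": 1, "PERWT": 1.0, "SEX": 2, "CHBORN": 2},
    {"YEAR": 2010, "SERIAL": 2, "STRATA": 1, "PERWT": 1.0, "SEX": 2, "CHBORN": 4},
    {"YEAR": 2010, "SERIAL": 3, "STRATA": 1, "PERWT": 1.0, "SEX": 1, "CHBORN": None},
]

mean, se = subpop_mean_se(rows, "CHBORN", is_woman)
```

If the third household were dropped before estimation, the stratum would appear to contain two clusters instead of three and the standard error would change, which is the reason for flagging the subpopulation rather than deleting cases. For real analyses, the survey routines in Stata, R, or SAS referenced above are the appropriate tools.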