I’m working on a piece of research for my MSc and I was wondering how I should cluster standard errors for my DiD analysis in Stata without using the svyset command.
I understand that PSU is the smallest level of clustering available; however, the User Notes say that this variable should be used in conjunction with STRATA.
I opted for the option “vce(cluster psu)” but wondered whether it is the right way to cluster.
Thanks in advance!
Thanks for your question. I am assuming you are using the difference-in-differences (DID) command, which is why you are not using the svyset command. Questions about specific analytical applications are beyond the scope of the IPUMS User Support team, but I can point you towards information that may be useful.
I quickly reviewed the difference-in-differences (DID) intro in the Stata Manual, and it seems that only one cluster variable can be specified in the “vce(cluster psu)” option. You are correct that we recommend using STRATA in conjunction with PSU to correctly account for stratification and clustering when computing variance estimates with IPUMS NHIS data. This user note provides more information about variance estimation for IPUMS NHIS.
With that said, errors in DID models are typically clustered at the treatment level rather than at the sampling level. Our recommendation to use STRATA and PSU is intended to account for sampling bias rather than treatment bias. This blog post provides a more detailed discussion of clustering for sampling bias versus treatment bias and how each requires a different approach. You will need to determine what level of clustering is appropriate for your model, as our recommendation to use STRATA and PSU may not be the best fit.
If you determine that it is best to use STRATA and PSU, this Statalist forum post outlines how to complete a DID analysis in Stata using the svyset command. It does not use the new DID command, but it allows you to use both STRATA and PSU in your analysis. If you choose to use this alternative method, the proper svyset command would be as follows:
svyset psu [pweight=perweight], strata(strata)
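As a rough illustration of the svyset route, a two-group, two-period DID can be fit as a survey-weighted regression with a treatment-by-period interaction. This is only a sketch of a generic regression-based DID, not necessarily the exact approach in the Statalist post, and the variable names outcome, treated, and post are hypothetical placeholders for your own variables.

```stata
* Declare the survey design: PSU as cluster, PERWEIGHT as weight, STRATA as strata.
svyset psu [pweight=perweight], strata(strata)

* Hypothetical variables (not from IPUMS): outcome = dependent variable,
* treated = 1 for the treatment group, post = 1 for post-treatment periods.
svy: regress outcome i.treated##i.post

* The coefficient on 1.treated#1.post is the DID estimate.
```

With this setup the standard errors reflect the sampling design (STRATA and PSU), which may or may not match the level at which your treatment was assigned.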
Hopefully this gives you some resources to get started. I encourage you to consult with Stata or other experts in your field to determine what is most appropriate for your analysis. You might also run the analysis both ways, once with the DID command (perhaps clustering on PSU alone, or on a concatenation of PSU and STRATA) and once with the svyset version, and compare the results.
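One way to set up that comparison is sketched below. It assumes the same hypothetical variables (outcome, treated, post) and uses a plain regress with an interaction term as a stand-in for the DID command, so treat it as a starting point rather than a recommended specification. The combined identifier is useful if PSU codes repeat across strata, in which case PSU alone would not identify unique clusters.

```stata
* Build one cluster ID per STRATA-PSU combination.
egen strata_psu = group(strata psu)

* (a) Weighted regression DID, clustered on the combined ID.
regress outcome i.treated##i.post [pweight=perweight], vce(cluster strata_psu)
estimates store did_cluster

* (b) The same model under the full survey design.
svyset psu [pweight=perweight], strata(strata)
svy: regress outcome i.treated##i.post
estimates store did_svy

* Show coefficients and standard errors side by side.
estimates table did_cluster did_svy, se
```

The point estimates should agree across the two models because both use the same weights; any difference will appear in the standard errors.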