How to weight data by cell from a complex survey design?


Do you have any suggestions on how to weight data that were collected through a complex survey design (Mexico 2010) and that I want to collapse into municipality*cohort cells? Usually I would weight those cells by population, but I am not sure how to account for the complex design weights on top of that. Any suggestions or references would be very much appreciated!



The Mexico 2010 data include person- and household-level weights that should be used to generate representative estimates. These weights were created by the Instituto Nacional de Estadística, Geografía e Informática (INEGI).

I hope this helps!



Thanks Joe. Yes, I’m aware that I need to include these weights. I was wondering, however, how to combine different types of weights if I wanted to collapse the data by municipio*year of birth cell to perform analysis. Any thoughts would be useful!



By “combine different types of weights” do you mean combining the household and person weights? If so, you can simply use the person-level weight, since you are ultimately interested in person-level results and people nest within households. Am I understanding your question correctly?



Hi Joe,

I guess I meant how to combine the sampling weights from the survey with analytical weights, in case I wanted to collapse the data by municipality to run regressions (rather than running them at the individual level).




Ah. I think I get it now. If your outcome/dependent variable is a person-level variable from the microdata and your collapsed data is simply being used as a predictor, you would still use the survey weights provided with the dataset.

If, however, your outcome variable is also a municipality*cohort collapsed estimate, I believe the procedure would be to generate analytic weights when collapsing the data. For example, if I collapsed the data to generate mean incomes by municipality*cohort group, I would compute those means using the provided individual-level survey weights. The collapsed data would then be a municipality*cohort-level dataset with a mean income and a weighted population (the sum of the person weights) for each municipality*cohort group. The weighted populations would then be used as analytic weights in municipality*cohort-level regressions.

This does not account for the standard errors of the collapsed estimates, however. In your regression you could bootstrap your errors as a way of addressing the fact that the collapsed variables are themselves estimates rather than observed values. Like so many things, how to handle standard errors seems to vary by discipline and by type of regression/analysis, so I would recommend searching through related publications in your field to see if there is a standard you can follow.
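To make the collapse-then-regress procedure concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the records are made-up toy values (not Mexico 2010 data), the field layout and the `collapse` and `wls_fit` helpers are hypothetical names, and the regression has a single predictor so the weighted least squares solution can be written in closed form. It only shows the point estimates; it does nothing about the standard-error issue.

```python
from collections import defaultdict

# Toy person-level records (made up for illustration):
# (municipality, birth_year, person_weight, income, schooling)
people = [
    ("A", 1980, 10.0, 100.0, 6.0),
    ("A", 1980, 30.0, 200.0, 8.0),
    ("A", 1990, 20.0, 150.0, 9.0),
    ("B", 1980, 5.0, 300.0, 12.0),
    ("B", 1980, 15.0, 100.0, 10.0),
    ("B", 1990, 40.0, 250.0, 11.0),
]

def collapse(records):
    """Collapse to municipality*cohort cells using the person weights.

    Each cell gets survey-weighted means plus the sum of person weights,
    which serves as the cell's weighted population.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])  # weight total, w*income, w*schooling
    for muni, cohort, w, income, school in records:
        cell = sums[(muni, cohort)]
        cell[0] += w
        cell[1] += w * income
        cell[2] += w * school
    return {
        key: {"pop": tot, "mean_income": inc / tot, "mean_school": sch / tot}
        for key, (tot, inc, sch) in sums.items()
    }

def wls_fit(x, y, w):
    """Weighted least squares of y on one regressor x, with analytic weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

# Step 1: collapse the microdata with the survey weights.
cells = collapse(people)

# Step 2: run the cell-level regression, weighting by weighted population.
x = [c["mean_school"] for c in cells.values()]
y = [c["mean_income"] for c in cells.values()]
w = [c["pop"] for c in cells.values()]
slope, intercept = wls_fit(x, y, w)
print(cells[("A", 1980)], slope, intercept)
```

In a statistical package the same two steps would typically be a weighted collapse followed by a regression with analytic weights set to the cell's weighted population. A bootstrap for the standard errors would wrap both steps, so that each replicate re-collapses the resampled microdata before re-running the regression.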

I hope this helps.