How to take into account the complex survey design (primary sampling unit) of the Brazilian Censuses?

Geoffrey_T · December 11, 2014, 3:28pm

Hello, I have read that the Brazilian censuses used a complex survey design, with municipios as the primary sampling units.

I would like to take this into account in order to obtain accurate standard errors.

My question is the following: can I use GEO2B_BR (municipios with inconsistent boundaries over time) to identify the primary sampling unit given that this variable groups municipios with fewer than 20,000 inhabitants into a single category for each state?

Thank you for your help!

Tim_Moreland · December 11, 2014, 5:59pm

Brazil in 1991, 2000, and 2010 used a complex stratification design (you can find details about each design here), while earlier years used systematic sampling. Since both of these are forms of stratification, rather than simple random samples, you technically should adjust your sample error estimates to account for the sample design. On the other hand, the sample size of the Brazil census is quite large, which means the risk of drawing invalid inferences from not adjusting the standard errors is minimal.

If you are looking at smaller subgroups or relationships that are marginally statistically significant, then it may be necessary to adjust your standard errors. While municipalities were the smallest geography in the Brazil census, households were actually the sampling unit. Thus, you should adjust for the clustering of persons within households by using SERIAL (Household ID) as the cluster variable. The additional use of household or person weights should account for any oversampling of geographies due to the complex stratification design.

This User Note on sampling error and variance estimation provides more information on accounting for sample design, including strategies at the end of the note.

Hope this helps.

Geoffrey_T · December 12, 2014, 1:46pm

Thank you Tim Moreland, this is really helpful.

In my case, I am interested, for each municipio, in the mean income of individuals according to their race.

Could you please tell me whether I have correctly understood your reply and the documentation you recommended?

a) If I am solely interested in computing these means (i.e. in point estimates) without making any statistical inference about them, then I just have to apply the weights and I don’t need to bother with clusters, stratification, or special subpopulation estimation (because the latter only affect the standard errors of the point estimates).

b) If I am interested in statistical inference (e.g. testing whether the mean income of black people differs significantly from that of white people in a given municipio A), then accounting for clustering and using special subpopulation estimation become very important, because otherwise I could mistakenly obtain significant results (type I error). There’s no stratification variable in IPUMS, but this is less of a concern because not adjusting for it yields conservative standard errors (so the worst that could happen is a type II error). To summarize, for case b), I should write in Stata:

svyset serial [pweight=wtper], vce(linearized)

svy, subpop(municipioA): mean inctot, over(race)
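
For comparison, case a) needs only the weights. The following is a minimal sketch, assuming a 0/1 indicator for the municipio of interest (here called municipioA, a hypothetical variable that would have to be created by hand; it is not an IPUMS variable):

```stata
* Weighted mean of total income by race, restricted to one municipio.
* municipioA is a hypothetical 0/1 indicator, not an IPUMS variable.
* inctot, race and wtper are the variable names used earlier in this thread.
mean inctot [pweight = wtper] if municipioA == 1, over(race)
```

The point estimates from this command should match those from the svy version, since they depend only on the weights. The standard errors it prints ignore the clustering of persons within households, so they should not be used for inference.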

Have a nice day, thanks a lot.

Tim_Moreland · December 12, 2014, 8:16pm

Regarding Part A of your question, that is correct. If you are only interested in the mean, then just using weights will be sufficient.

As for Part B, that is also correct. Clustering decreases the precision of the sample estimates, so ignoring it understates the standard errors and risks a Type I error; you certainly want to account for it. It is not currently possible to account for stratification. However, stratification acts to increase the precision of the sample estimates, so ignoring it overstates the standard errors and is less of a concern (at worst, a Type II error).

Finally, your Stata code also looks correct.
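
If a formal test of the difference in Part B is wanted, one possible approach (a sketch only, not the only way to do it) is to regress income on race within the subpopulation, so that each coefficient is a difference in mean income relative to the base race category. It assumes the same variables as above, with municipioA being a hypothetical 0/1 indicator for the municipio of interest:

```stata
* Declare the design: households (serial) as clusters, person weights.
svyset serial [pweight = wtper], vce(linearized)

* Regress income on race categories within the subpopulation.
* subpop() keeps all households in the variance calculation,
* which is why it is used here instead of an if condition.
svy, subpop(municipioA): regress inctot i.race

* Joint test that mean income is equal across all race categories.
testparm i.race
```

The standard error on each race coefficient then reflects the clustering of persons within households, and the individual t-tests in the regression output compare each category with the base category.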

Geoffrey_T · December 15, 2014, 5:40pm

Great! Thanks a lot for your time and help.
