Municipalities in 1991 Brazilian Census

Hello. I need to aggregate microdata from the 1991 Brazilian census to the municipality level. However, I cannot find a municipality variable. Is the unharmonized municipality source variable available for this dataset?

The municipality identification variable for the 1991 Brazilian Census is GEO2_BR1991. This variable can be found under the Geography A-E variables tab.

Good afternoon,

Thank you for your previous help finding the GEO2_BR1991 municipal codes! However, I am trying to merge these codes with other municipal-level datasets from Brazil, which use the conventional 6- or 7-digit municipal codes rather than the 5-digit GEO2_BR1991 codes that appear in the IPUMS data. Furthermore, the text labels for each municipality include multiple names and/or incorrect names per code, making it difficult to merge or match by label. Is there any way to access the 6- or 7-digit versions of the municipal codes for this census, or cleaner text labels? Alternatively, am I missing another way of cleaning or backing out unique municipality identifiers that would be compatible with other datasets?

IPUMS International no longer provides any of the geography source variables, for confidentiality reasons, but it can provide a translation table listing all the municipal codes (6 digits: 2 digits for the state + 4 digits for the municipio). You can use the attached translation table to work out exactly which municipios were merged to form each combined municipio. For example, IPUMS code 11003 (Santa Luzia D’Oeste, Pimenta Bueno) is a combination of municipio 110029 (Santa Luzia D’Oeste) and 110018 (Pimenta Bueno).
stripped_geo2_br1991_tt.xlsx (914.5 KB)
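
For anyone doing this merge, here is a minimal Python sketch of one way to use such a translation table. It assumes the spreadsheet has been exported to rows of (IPUMS code, IBGE 6-digit code); that layout, the helper names, and the values in `external` are hypothetical, and only the 11003 bundle comes from the reply above. It also assumes the 7-digit IBGE code is the 6-digit code followed by a check digit, so dropping the last digit yields the 6-digit form.

```python
from collections import defaultdict

# Hypothetical rows exported from the translation table:
# (IPUMS GEO2_BR1991 code, IBGE 6-digit municipio code).
# Only the 11003 bundle is taken from the thread.
TRANSLATION_ROWS = [
    (11003, "110029"),  # Santa Luzia D'Oeste
    (11003, "110018"),  # Pimenta Bueno
]


def to_six_digit(ibge_code):
    """Normalize a 6- or 7-digit IBGE code to 6 digits.

    Assumption: the 7th digit is a check digit and can be dropped.
    """
    code = str(ibge_code).strip()
    if len(code) == 7:
        return code[:6]
    if len(code) == 6:
        return code
    raise ValueError(f"unexpected IBGE code length: {code!r}")


def build_lookup(rows):
    """Map each IBGE 6-digit code to the IPUMS code of its bundle."""
    return {to_six_digit(ibge): ipums for ipums, ibge in rows}


def aggregate_to_ipums(external, lookup):
    """Sum an external municipal-level value up to IPUMS bundles."""
    totals = defaultdict(float)
    for ibge_code, value in external.items():
        totals[lookup[to_six_digit(ibge_code)]] += value
    return dict(totals)


lookup = build_lookup(TRANSLATION_ROWS)
# Placeholder values, not real statistics. The second key is written in
# 7-digit form on purpose to exercise the normalization step.
external = {"110029": 100.0, "1100189": 250.0}
bundled = aggregate_to_ipums(external, lookup)
print(bundled)
```

Because several IBGE municipios can share one IPUMS code, the external dataset has to be aggregated up to the IPUMS bundle before merging; the reverse direction (splitting IPUMS records back into individual municipios) is not possible from the table alone.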

Hi Jeff,

Thank you for your prompt attention and explanations!

Hi Jeff,

I am merging the census microdata with another dataset as well, at the municipality level, but I am using more years: 1980, 1991, 2000 and 2010. Can you please provide me with the list for the other years? I got 1991 already from your response to this thread.

Thank you,
Stephanie

If you go to IPUMS International and select the Brazilian samples you plan to use on the Select Samples page, you will find the municipality variables you are looking for under the Geography A-E variables group. Specifically, it sounds like you will want to use the GEO2_BR1980, GEO2_BR1991, GEO2_BR2000, and GEO2_BR2010 variables.

Hi Jeff, thank you for your response. What I am looking for is the original census municipality codes, so I can merge my dataset with another that uses the official IBGE municipal codes of 2 digits for state + 4 digits for municipio. These variables do not contain that. The table you provided above for Erik is exactly what I need, but it only contains the 1991 municipalities. I would like to get such a translation table for the 1980, 2000 and 2010 census years as well. Thank you.

Ah, sorry. I misunderstood your question. Here are the additional tables corresponding to the 1980, 2000, and 2010 geographic codes.

stripped_geo2_br1980_tt.xlsx (855.7 KB)
stripped_geo2_br2000_tt.xlsx (539.0 KB)
stripped_geo2_br2010_tt.xlsx (648.1 KB)

Thanks so much! In sheet 2 for these files, do you know what those numbers next to the municipality names mean? For example, in 1980 it says Cacoal {3114}, and in 1991, Cacoal {4632} and in 2000, Cacoal {1006}

How did IPUMS get the information to decide whether to bundle municipalities? Do you have a sheet that shows which ones have been bundled with the reasons for why? I imagine it’s because they used to be one municipality at one point and then split into two or more, but maybe there are other reasons as well. I would like to note the reasons in my dissertation.

Also, please correct me if I’m wrong:
New municipalities that popped up, but did not split from an existing municipality, were given a new ID.
Any municipalities that have been renamed over time would have retained the same municipal ID.

Ah, sorry for the confusion. You can safely ignore the information in sheet 2 of these files. That is outdated information leftover from an older version of these variables. IPUMS International “bundles” these municipalities as directed by our partners in the various national statistical agencies around the world. In most cases, the national statistical agency wants to preserve the confidentiality of respondents in their data by “bundling” various low-level geographic areas together.

Finally, regarding the last two questions: note that these year-specific municipality codes are not comparable over time. Both the name and the geographic boundaries of a given municipality can change between samples. If you want to do analysis at the municipality level over time, GEO2_BR is a geographically harmonized municipality variable (i.e., it has consistent geographic boundaries across samples).
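
As a rough illustration of aggregating microdata to the harmonized units over time, here is a minimal Python sketch. It assumes an extract containing YEAR, GEO2_BR, and the person weight PERWT; the records, the weights, and the `EMPLOYED` indicator are invented for the example and are not real census values.

```python
from collections import defaultdict

# Invented person records: YEAR, GEO2_BR, PERWT, plus a hypothetical
# 0/1 indicator named EMPLOYED.
records = [
    {"YEAR": 1991, "GEO2_BR": 76012002, "PERWT": 10.0, "EMPLOYED": 1},
    {"YEAR": 1991, "GEO2_BR": 76012002, "PERWT": 10.0, "EMPLOYED": 0},
    {"YEAR": 2010, "GEO2_BR": 76012002, "PERWT": 20.0, "EMPLOYED": 1},
    {"YEAR": 2010, "GEO2_BR": 76012008, "PERWT": 5.0, "EMPLOYED": 0},
]


def weighted_share(rows, indicator):
    """Weighted population and weighted share of `indicator` per (YEAR, GEO2_BR)."""
    population = defaultdict(float)
    hits = defaultdict(float)
    for row in rows:
        key = (row["YEAR"], row["GEO2_BR"])
        population[key] += row["PERWT"]
        hits[key] += row["PERWT"] * row[indicator]
    return {
        key: {"population": population[key], "share": hits[key] / population[key]}
        for key in population
    }


panel = weighted_share(records, "EMPLOYED")
for key in sorted(panel):
    print(key, panel[key])
```

Keying on (YEAR, GEO2_BR) rather than on a year-specific code keeps each row of the resulting panel tied to the same geographic footprint in every census year.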

Hi Jeff,

I now have all the IPUMS census data for Brazil for the years 1980, 1991, 2000 and 2010. I would like to do analysis at the municipality level over time and, as you suggested above, I was planning to use the GEO2_BR variable to match the municipality “bundles” over time. However, the codes in this variable seem to have been used for different municipalities in different years. For example, in 1980 the code 35299 corresponds to São Paulo city, but in the other years it corresponds to other cities, excluding São Paulo. These other cities are not geographically close enough to São Paulo city to have been part of the same municipality at some point. I checked various other codes in GEO2_BR and found the same issue. Am I doing something wrong here?

Thank you,
Stephanie

It sounds like you are referring to the un-harmonized (i.e., year-specific) geographic boundary variable GEO2_BR1980 and the corresponding GEO2_BR1991, etc., instead of the “consistent boundary” variable GEO2_BR. Since the year-specific geographic boundary variables are not harmonized over time to have consistent boundaries, you will almost certainly find changes in the boundaries associated with specific codes over time.

Hi Jeff,

I think I get it now, thank you. I am using the correct variable, GEO2_BR, but with the translation tables you shared earlier in this thread. These tables have different codes for each census year. What I should use instead to match the IPUMS IDs in GEO2_BR to the municipality names is in this link. So now I can match to the municipality names, but not to the census municipality codes directly. Do you have an Excel sheet for GEO2_BR that matches the IPUMS IDs to the census IDs, so that I can avoid errors from matching by name?

Thank you,
Stephanie

This thread was very useful. I was also struggling with an unexpectedly low number of municipalities, even in the un-harmonized municipal codes.

@JeffBloem: please add this request: it would be great if the documentation made clear that even un-harmonized variables have been bundled.

I also wonder why these are being bundled. IBGE provides (in the past and currently) the microdata with the original municipal code, so it seems strange that they would ask IPUMS to make this variable coarser. Could this be a procedure that IPUMS implements unilaterally?

In any case, can I use the un-harmonized household and person IDs (e.g., BR2010A_DWNUM and BR2010A_PERNUM for the 2010 Census) to match the IPUMS version to the original IBGE census microdata?
I love the IPUMS standardization, variable names and constructed variables, but preserving the municipal codes as much as possible is crucial in this case.

regards
Lucas

Thank you for these, Lucas. I am also awaiting IPUMS’ response on both of our messages.

Sorry for the delayed response. Regarding the relationship between the year-specific (i.e., unharmonized) geographic codes and the consistent-boundary codes, note that you can always include both in the same data extract. Tabulating these variables against each other will show you exactly how the two geographic identifiers relate. Additionally, the spreadsheets shared earlier in this thread provide a correspondence between the IPUMS year-specific codes and the original geographic codes provided by the Brazilian statistical agency.
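
The tabulation described above can be sketched in a few lines of Python. This is only an illustration: the rows are invented stand-ins for a 1980 extract that includes both GEO2_BR1980 and GEO2_BR, the code pairs are borrowed from an example raised elsewhere in this thread, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Invented rows standing in for a 1980 extract with both variables.
rows = [
    {"GEO2_BR1980": 12004, "GEO2_BR": 76012002},
    {"GEO2_BR1980": 12004, "GEO2_BR": 76012002},
    {"GEO2_BR1980": 12004, "GEO2_BR": 76012008},
]


def crosstab(rows, year_var, harmonized_var):
    """Count records for each (year-specific code, harmonized code) pair."""
    return Counter((row[year_var], row[harmonized_var]) for row in rows)


def not_nested(table):
    """Year-specific codes that fall into more than one harmonized unit."""
    targets = defaultdict(set)
    for year_code, harmonized_code in table:
        targets[year_code].add(harmonized_code)
    return {
        code: sorted(units) for code, units in targets.items() if len(units) > 1
    }


table = crosstab(rows, "GEO2_BR1980", "GEO2_BR")
split_codes = not_nested(table)
for pair in sorted(table):
    print(pair, table[pair])
print(split_codes)
```

Any code returned by `not_nested` is a year-specific unit whose records are spread across several GEO2_BR units, which is a quick way to see where the year-specific codes do not nest within the harmonized ones.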

After reading more about the IPUMS procedure, I also figured out that no geographies with fewer than 20k people are made public. Since lots of municipalities have fewer than that (on the order of 2 to 3 thousand municipalities, I believe), the number of municipalities is reduced a lot.

Tks @JeffBloem for the clarifications on this.

Hello @JeffBloem,

I have a few questions related to this thread.

  1. First, do you confirm that geo2_br1980, geo2_br1991, geo2_br2000, geo2_br2010 present a degree of comparability based on their labels but not based on their codes (for instance, Porto Velho is coded as 1103 in 1980, as 1105 in 1991 and 2010, and as 1106 in 2000)?

  2. If yes in 1), do you confirm that building a variable equal to geo2_br1980 in 1980, geo2_br1991 in 1991, etc, would be completely nonsensical?

  3. Still if yes in 1), can you explain why the variables geo2_br1980, geo2_br1991, geo2_br2000 and geo2_br2010 have not been constructed such that their codes would be somehow more comparable, as in the previous IPUMS versions of these variables? I just want to understand better how the variables have been constructed.

  4. geo2_br2010 is nested in geo2_br but each of geo2_br1980, geo2_br1991 and geo2_br2000 is not always nested in geo2_br. Can you explain why please?
    → Let me take the municipality of Cruzeiro do Sul as an example to explain what I do not understand. geo2_br2010=12004 (“Cruzeiro do Sul”) belongs to geo2_br=76012002 but geo2_br1980=12004 (“Cruzeiro do Sul, Mancio Lima”) belongs to both geo2_br=76012002 (which contains Cruzeiro do Sul in its label) and geo2_br=76012008 (which contains Mancio Lima in its label).
    → I would understand this if Cruzeiro do Sul and Mancio Lima originally formed a single administrative municipality in 1980 which had then split. However, this is not the case, as your 1980 Excel file posted above shows.
    → One possibility is that Cruzeiro do Sul and Mancio Lima are pooled together in geo2_br1980=12004 because the population size of this unit is very small (around 7,000 inhabitants). However, the IPUMS documentation says it regionalizes small municipalities based on the most recent census (i.e., the 2010 one, not the 1980 one).
    → My question, in the end, in this example is: why are Cruzeiro do Sul and Mancio Lima not coded/labelled separately in geo2_br1980?

  5. Is there any difference between geo2_br and geolev2?

Sorry for the long post and many thanks for your help.

Best,
Geoffrey

My apologies for the delay in replying. Jeff successfully defended his dissertation this spring, and has wrapped up his time with IPUMS. However, I am able to provide information that should address the questions you raise. Please let me know if you have further follow-up questions.

The core response to your question is that GEO2_BR has been harmonized to have consistent boundaries over time (these variables go beyond nominal integration of matching up municipalities based on name alone and identify areas with a geographically consistent footprint over time). In contrast, the year-specific variables (e.g., GEO2_BR1980) are not harmonized over time as they are only offered for a single year. As such, I do not advise making comparisons among the year-specific variables (this relates to your first three questions).

Regarding your question about nesting, the GEO2 year-specific variables nest within the first geographic level, but not within GEO2_BR. Again, we do not encourage users to make comparisons among the year-specific variables and instead encourage them to use GEO2_BR for any analyses over time.

GEOLEV2 should be the same as GEO2_BR for Brazil; however, GEOLEV2 also includes other countries’ second administrative units. This is useful for researchers interested in making cross-country comparisons, but we provide country-specific variables for researchers who don’t require the coverage of multiple countries.