Table with original and harmonized varnames for Brazilian census

I created and maintain an R package, microdadosBrasil (lucasmation/microdadosBrasil on GitHub, which reads the most common Brazilian public microdata such as CENSO and PNAD; not yet published on CRAN), to facilitate access to microdata in Brazil, especially for household surveys.

One of the desired (and partially implemented) features of the package is harmonizing variable names over time and replacing the original names with more intuitive ones. I was thinking of using the same names that IPUMS uses (wherever the concepts match), to facilitate future integration with IPUMS data.

How can I get a “variable dictionary”, with one row per variable and columns containing the harmonized names, the original names, other metadata, and the harmonization procedures for those variables? On the IPUMS website I can look these up variable by variable, but it would be helpful to have the metadata in tabular form.
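To make the request concrete, here is a minimal sketch of how I would use such a table inside the package. Everything in it is made up for illustration: the `crosswalk` table, the `harmonize_names` helper, and the `orig_*` column names are hypothetical placeholders, not actual census or package names. Only `SEX` and `AGE` are meant to resemble IPUMS-style harmonized names.

```r
# Hypothetical crosswalk: one row per (census year, original variable name).
crosswalk <- data.frame(
  year       = c(2000, 2000, 2010, 2010),
  original   = c("orig_sex_2000", "orig_age_2000",
                 "orig_sex_2010", "orig_age_2010"),
  harmonized = c("SEX", "AGE", "SEX", "AGE"),
  stringsAsFactors = FALSE
)

# Rename the columns of `data` that appear in the crosswalk for one year.
# Columns with no match in the crosswalk keep their original names.
harmonize_names <- function(data, crosswalk, year) {
  cw  <- crosswalk[crosswalk$year == year, ]
  idx <- match(names(data), cw$original)
  names(data)[!is.na(idx)] <- cw$harmonized[idx[!is.na(idx)]]
  data
}

# Toy data standing in for one census year (placeholder column names).
census_2010 <- data.frame(
  orig_sex_2010 = c(1, 2),
  orig_age_2010 = c(34, 51),
  other         = c("a", "b"),
  stringsAsFactors = FALSE
)

harmonized_2010 <- harmonize_names(census_2010, crosswalk, 2010)
names(harmonized_2010)
```

With a real dictionary, the `original` and `harmonized` columns would come straight from the metadata table rather than being typed by hand.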

A related question: as far as I understand, IPUMS does not cover household surveys outside the US, correct? Is this an area of IPUMS “expansion”?

Regards,
Lucas Mation

Hi Lucas,

Can you tell me more about this R package? I am using the Brazilian census microdata for my dissertation. I have only used Stata in the past but I am learning R now.

Thank you,
Stephanie

The easiest way to retrieve the available metadata is to create a data extract that includes all of the Brazilian variables. By default, this will be a huge data file, but you can use the “select cases” or “custom sample size” tools to cut the number of observations (which you don’t really need) and bring the file down to a manageable size. Then you can use the ipumsr package to read the metadata associated with these variables into R. This might not give you all of the information you need, but it will be a good start.

Hi Stephanie,

There is a similar package for Stata, with harmonized Brazilian microdata, called DataZoom. Please read the description of the package at the link I sent (you have to scroll down the page).

Dear @JeffBloem,

I assume you mean this:

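library(ipumsr)
# Placeholder file name: the DDI codebook that comes with the extract is an .xml file
ddi <- read_ipums_ddi("myddifile.xml")
metadata <- ipums_var_info(ddi)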

Good idea, I will do it and report back here if there is other metadata I would like to have that is not included.
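
For anyone else who wants the metadata as a flat table, here is a rough sketch of how the result could be saved to CSV. It assumes that the table returned by `ipums_var_info()` has a list-column called `val_labels`, holding one small table of value labels (columns `val` and `lbl`) per variable; please check this against your installed ipumsr version. The helper `flatten_var_info` and the file names are my own placeholders, not part of ipumsr.

```r
# Collapse the assumed `val_labels` list-column into a single string per
# variable, so that the whole table can be written to a CSV file.
flatten_var_info <- function(var_info) {
  var_info <- as.data.frame(var_info)
  if ("val_labels" %in% names(var_info)) {
    var_info$val_labels <- vapply(
      var_info$val_labels,
      function(x) {
        # Variables without value labels get an empty string.
        if (is.null(x) || NROW(x) == 0) return("")
        paste(paste0(x$val, " = ", x$lbl), collapse = "; ")
      },
      character(1)
    )
  }
  var_info
}

# Intended usage (commented out because it needs ipumsr and a real DDI file;
# both file names are placeholders):
# library(ipumsr)
# ddi <- read_ipums_ddi("myddifile.xml")
# dictionary <- flatten_var_info(ipums_var_info(ddi))
# write.csv(dictionary, "ipums_brazil_dictionary.csv", row.names = FALSE)
```

If the metadata turns out to contain other list-columns, they would need the same treatment before `write.csv()` will accept the table.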