I’ve attempted to download the 2010 survey data for Brazil. However, I noticed several issues with the data when I opened it in Stata.
Sao Paulo and Belo Horizonte, which I intend to use as the focus of my research, are not included in the data.
The metro area does not match up with the state. For example, it lists the metro area as Natal, which is in the state of Rio Grande do Norte, but it has been matched up with observations in the state of Minas Gerais, which is incorrect.
Please let me know if you have any further advice. Is it possible that the size of the file was too large? Or is something else occurring?
There must be a problem with the Stata file. Perhaps it is too large. The file can be made smaller by using the IPUMS “Select Cases” feature (if all you wish to study is a subset, such as Sao Paulo and Belo Horizonte) or by using “Customize Sample Size” (see “How to reduce sample size” on the extract submission page).
To check that the geography in the sample is correct, I used “Analyze Data Online,” selected the 2010 Brazil sample, then ran a table of statebr (row) by metrobr (col) using person weights. The result is that 5,421,987 people are in Belo Horizonte and all are in the state of Minas Gerais. For Sao Paulo, 19,693,625 are in the metro area and all are in the state of SP.
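If it helps, the same state-by-metro check can be repeated locally on the downloaded extract. Below is a minimal Python sketch, assuming the extract has been exported to CSV with columns named STATEBR, METROBR, and PERWT. The function names, the `ignore` parameter, and the sample codes and weights are made up for illustration; they are not real IPUMS codes or counts. It sums person weights for each state and metro pair, then flags any metro code that shows up under more than one state.

```python
import csv
import io
from collections import defaultdict


def weighted_crosstab(rows, row_var="STATEBR", col_var="METROBR", weight_var="PERWT"):
    """Sum person weights for each (state, metro) pair."""
    table = defaultdict(float)
    for row in rows:
        table[(row[row_var], row[col_var])] += float(row[weight_var])
    return dict(table)


def metros_spanning_states(table, ignore=()):
    """Return metro codes that appear under more than one state code.

    `ignore` is a hypothetical parameter for codes (such as a
    "not in a metro area" code) that legitimately occur in every state.
    """
    states_by_metro = defaultdict(set)
    for (state, metro), weight in table.items():
        if weight > 0 and metro not in ignore:
            states_by_metro[metro].add(state)
    return {m: sorted(s) for m, s in states_by_metro.items() if len(s) > 1}


# Made-up rows standing in for an extract; codes and weights are illustrative only.
SAMPLE = """STATEBR,METROBR,PERWT
31,A,10.5
31,A,9.5
35,B,20
31,B,5
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
table = weighted_crosstab(rows)
suspect = metros_spanning_states(table)

for metro, states in suspect.items():
    print(f"metro {metro} appears in states {states}")
```

To run it on a real file, replace the `io.StringIO(SAMPLE)` argument with an opened CSV file. A metro area that appears under a state it does not belong to would point to the kind of mismatch described above, although some metro definitions may genuinely cross state lines, so flagged codes should be checked against the codebook.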
I would also recommend re-downloading the data file and unzipping it using a program such as 7-zip in case there was an error in the decompression process. The video tutorials on opening extracts are also helpful for troubleshooting errors in the process of reading IPUMS-International data into common statistical software packages.
Thank you both for your answers. As you said, if I analyze it online it does appear correctly. I reduced the sample size substantially by number. Unfortunately it will not allow me to select cases for the variables that I selected. I downloaded 7z but the data still does not match up.
Do you have any further advice I could follow? The data appears correctly online, but it is wrong once I have downloaded it and opened it in Stata.
I have downloaded a few of your extracts and I see the same issues you are seeing. I also created my own extract with the br10a_metro and br10a_state variables and saw the same issues. However, I also extracted the variables METROBR and STATEBR, and those variables do not have any issues. I would recommend recreating your extract using the integrated variables (METROBR and STATEBR) and seeing whether the same issues arise.