I want to track changes in population of Indonesian regencies across census rounds. IPUMS does not get me there...


I have downloaded IPUMS data for Indonesia, and want to track the change in population of the different regencies (kabupaten) across census rounds. However, there are some issues:

  1. The number of cases varies dramatically across census rounds. For example, for the regency of Magelang, I have 2791 cases (people?) for the 2005 census round but 117467 for the 2010 census round. The population cannot possibly have changed that much. What is the problem?

  2. Interestingly, for the 2010 census round, it seems that the true population is always roughly ten times the number of cases in IPUMS. Actual population figures from the 2010 census are available here (for the Central Java province, which contains the Magelang regency): http://sp2010.bps.go.id/index.php/sit…. Can that make sense? And even if so, how should I treat the 2005 count of 2791 cases?

As a final question, is IPUMS even the right data to consider?

I would really appreciate some helpful comments! Thank you.



The IPUMS data is for researchers who first study the documentation, then select the microdata they need, and finally download and analyze the data using statistical software. If all you need are population counts, then you may wish to look elsewhere.

The first part of the IPUMS documentation to study is the sample descriptions. Clicking the sample description tab on the IPUMS home page brings up this page for Indonesia: https://international.ipums.org/international/sample_designs/sample_designs_id.shtml .

If you scroll down to the sample fraction, you will see that the 2010 sample fraction is 10% and the 2005 sample fraction is 0.51%. So, to get a rough estimate of population from the person record counts in the various samples, multiply the 2010 counts by 10 and the 2005 counts by about 200 (1/0.0051 ≈ 196). More information is provided by the person weight variable, wtper: https://international.ipums.org/international-action/variables/WTPER#description_section
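As a sketch of the two approaches above, the Python below compares summing person weights with dividing the raw record count by the sample fraction. It assumes an extract already parsed into hypothetical (year, regency, weight) tuples; the records and weights are invented, and the fractions are the approximate ones quoted above. Check the wtper description for how the weight is stored in your extract before summing it.

```python
from collections import defaultdict

# Approximate sample fractions from the IPUMS sample description page.
SAMPLE_FRACTION = {2005: 0.0051, 2010: 0.10}

# Hypothetical person records: (year, regency, person weight).
# A real extract has one row per person; these weights are invented.
records = [
    (2005, "Magelang", 200.0),
    (2005, "Magelang", 190.0),
    (2010, "Magelang", 10.0),
    (2010, "Magelang", 10.0),
    (2010, "Magelang", 10.0),
]


def weighted_totals(rows):
    """Estimate population per (year, regency) by summing person weights."""
    totals = defaultdict(float)
    for year, regency, weight in rows:
        totals[(year, regency)] += weight
    return dict(totals)


def expanded_counts(rows, fractions):
    """Rough estimate: number of person records divided by the sample fraction."""
    counts = defaultdict(int)
    for year, regency, _weight in rows:
        counts[(year, regency)] += 1
    return {key: n / fractions[key[0]] for key, n in counts.items()}


if __name__ == "__main__":
    print(weighted_totals(records))
    print(expanded_counts(records, SAMPLE_FRACTION))
```

With the counts from the question, 117467 / 0.10 gives about 1.17 million for 2010 and 2791 / 0.0051 gives about 547,000 for 2005. If a sample is not self-weighting, the sum of the weights is usually the better estimate, because a single overall fraction ignores differences in sampling rates between areas.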

Because the data are samples, population counts estimated from them will have sampling errors. For small regencies, the confidence intervals may be quite large.
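A rough way to see how large those intervals can be is the sketch below. It treats each person as sampled independently with probability f, which is an assumption: census samples are typically drawn by household, so the true error is larger. The design_effect parameter is a hypothetical adjustment for that, not a value supplied by IPUMS.

```python
import math


def approx_ci(n_records, fraction, z=1.96, design_effect=1.0):
    """Approximate confidence interval for a population estimated as n / f.

    Assumes each person entered the sample independently with probability f.
    Household-based samples have larger variance; `design_effect` (> 1) is a
    rough, assumed inflation factor for that.
    """
    estimate = n_records / fraction
    # Var(n) ~ N f (1 - f); substituting N ~ n / f gives Var(N_hat) ~ n (1 - f) / f**2.
    std_error = math.sqrt(design_effect * n_records * (1 - fraction)) / fraction
    return estimate, estimate - z * std_error, estimate + z * std_error


if __name__ == "__main__":
    for n in (2791, 100):
        est, low, high = approx_ci(n, 0.0051)
        print(f"{n} records -> {est:,.0f} (95% CI {low:,.0f} to {high:,.0f})")
```

The relative half-width works out to z·sqrt((1 - f)/n): at f = 0.0051, about ±4% for 2791 records but about ±20% for a regency with only 100 records, before any design effect.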

Second, study the variables. The title of the regency variable is:

geo2b_id: Regency, Indonesia [Level 2; inconsistent boundaries, harmonized by name]

If you need consistent areal boundaries for the regency variable (Level 2 for Indonesia), they are not currently available from IPUMS. Harmonizing geography by area is a challenge. For many countries, including Indonesia, Level 1 geography is harmonized by area (geo1a_id for Indonesia). We are working on harmonizing Level 2 geography by area, but it is a slow process, particularly when, as in the case of Indonesia, the census agency does not provide the detailed documentation required to do so.
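To illustrate what harmonizing by area involves, here is a minimal Python sketch that re-aggregates regency estimates through a crosswalk to units with fixed boundaries. The codes, the split and the populations are all hypothetical; building a real crosswalk for Indonesia requires exactly the kind of documentation described above.

```python
from collections import defaultdict

# Hypothetical crosswalk: (year, regency code) -> consistent areal unit.
# Codes are invented. Suppose 2005 regency "R1" was split before 2010 into
# a smaller "R1" and a new regency "R9"; both 2010 pieces map back to "U1".
CROSSWALK = {
    (2005, "R1"): "U1",
    (2010, "R1"): "U1",
    (2010, "R9"): "U1",
    (2005, "R2"): "U2",
    (2010, "R2"): "U2",
}


def harmonize(estimates, crosswalk):
    """Re-aggregate {(year, code): population} into {(year, unit): population}."""
    totals = defaultdict(float)
    for (year, code), population in estimates.items():
        # A KeyError here means the crosswalk has no entry for that code and year.
        totals[(year, crosswalk[(year, code)])] += population
    return dict(totals)


if __name__ == "__main__":
    estimates = {
        (2005, "R1"): 500_000.0,
        (2010, "R1"): 300_000.0,
        (2010, "R9"): 230_000.0,
        (2005, "R2"): 400_000.0,
        (2010, "R2"): 410_000.0,
    }
    print(harmonize(estimates, CROSSWALK))
```

Compared by name, R1 appears to fall from 500,000 to 300,000, although that drop is purely a boundary change; on the consistent unit U1 the totals are 500,000 and 530,000. This simple mapping only works when regencies split or merge whole; when a new boundary cuts through an old unit, finer building blocks such as sub-district codes are needed.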

If you have or find documentation for harmonizing Indonesian regencies by area over time (codes for sub-districts in each census or survey), we would very much appreciate a copy.