I am trying to help a student who is working with the 2009 Kenya Census data. Both of us have downloaded the data from IPUMS, as well as the report produced by Kenya’s Census bureau. The bureau’s PDF shows a total population of 38,610,097, of which 19,192,458 were males and 19,417,639 were females.
When I weight the data with PERWT, my counts are slightly lower: Total = 38,419,350, of which, male = 19,046,170 and female = 19,373,180.
I am weighting the data in R (code shown below). What am I doing wrong?
ids = 1,
weights = PERWT,
nest = TRUE
) -> ky09
Thanks for any insights anyone has into my undercounts.
Your estimates of the total population are accurate given the data provided. Since the IPUMS sample of the 2009 Kenya Census is a random 10% subsample of the official data (you can find the details on the sample characteristics page), some estimates are expected to be slightly off from official statistics. The discrepancy does, however, appear to be greater than the margin of error for this data would suggest. I am reaching out to the IPUMS International team to see whether there is any more information I can provide on how this 10% subsample was selected, and I will follow up here as soon as I hear back.
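For a rough sense of scale, here is a minimal base-R sketch that compares the shortfall with a back-of-the-envelope standard error. It assumes, purely for illustration, that each person is included independently with probability 0.10 and carries a weight of 10; the actual sample design and PERWT values may differ, so treat the standard error as an order-of-magnitude guide rather than the official margin of error.

```r
# Official counts from the census bureau report and the PERWT-weighted
# estimates quoted in the question.
official <- c(total = 38610097, male = 19192458, female = 19417639)
weighted <- c(total = 38419350, male = 19046170, female = 19373180)

# Absolute and relative shortfall of the weighted estimates.
shortfall <- official - weighted
pct_short <- 100 * shortfall / official

# ASSUMPTION (not taken from IPUMS documentation): every person is sampled
# independently with probability f = 0.10 and weighted by 1 / f. Under that
# simplification the estimated total has variance N * (1 - f) / f.
f <- 0.10
se_total <- sqrt(official * (1 - f) / f)

print(data.frame(
  official,
  weighted,
  shortfall,
  pct_short = round(pct_short, 2),
  shortfall_in_se = round(shortfall / se_total, 1)
))
```

Under this simplified assumption, the total is short by 190,747 people (about 0.49%), which is many times the rough standard error. A design-based variance estimate that accounts for the real sampling scheme could give a different figure.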
I want to provide an update to let you know that the IPUMS International team took a look at the data and did not identify any systematic omissions or other explanations that would account for the discrepancy you are seeing. One thing to keep in mind, though, is that the Kenya 2009 IPUMS sample is a systematic sample of every tenth household (rather than every tenth person) from the full census. Sampling at the household level may create additional minor discrepancies when looking at population estimates.
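To illustrate why household-level sampling makes person totals wobble, here is a minimal base-R simulation. The population is entirely hypothetical (100,000 households with made-up sizes, not Kenya data). It takes every tenth household from each of the ten possible starting points and weights every sampled person by 10.

```r
set.seed(2009)

# Hypothetical population: 100,000 households with 1 to 12 members each.
# (Illustrative numbers only; not taken from the Kenya census.)
n_households <- 100000
hh_size <- sample(1:12, n_households, replace = TRUE)
true_population <- sum(hh_size)

# A 1-in-10 systematic sample has only ten possible outcomes, one per
# starting household. Compute the weighted person total for each of them.
estimates <- sapply(1:10, function(start) {
  sampled <- seq(from = start, to = n_households, by = 10)
  10 * sum(hh_size[sampled])  # every sampled person gets a weight of 10
})

# Relative error of each possible sample, in percent.
rel_error_pct <- 100 * (estimates - true_population) / true_population
print(round(rel_error_pct, 3))

# Averaged over all ten starts, the estimator recovers the true total.
print(mean(estimates) == true_population)
```

Because households differ in size, the number of people captured changes with the starting point, so any single sample's weighted total lands a little above or below the true count even though the estimator is unbiased on average.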
Thank you so much @Ivan_Strahof … I appreciate both your and the IPUMS International team’s diligence in pursuing an answer to my question. I will pass this information on to the students.
Have a wonderful weekend.