India DHS district codes

Divya_Pandey · October 20, 2020, 12:57am

Hi,

I need to merge the district codes from India DHS (all rounds if possible) with another dataset that uses state and district codes from the Indian census. I see that the variable [DHS_IPUMSI_IA] provides the Census codes at the state level, which can be used to merge the two datasets at the state level. But I need to do this at the district level.

Is there any way I can obtain the codes for the district level that are in line with the Census codes? My guess is that in the original surveys they must have used the Census codes but in order to harmonize the districts the codes were changed.

Any insight would be great. Thanks!
Divya

Swapnil_Singh · November 17, 2020, 3:41pm

I suggest using district and state names instead of codes and doing a fuzzy match, because in some instances the DHS codes differ from the census codes.

I do this in the following code for the DHS 2015-16 round and Census 2011. It gives 95 percent correct matches; the remaining ones you can check manually.


rm(list=ls())
gc()
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
library(tidyverse)

#get the dhs file
dhs = foreign::read.dta(file = '../Output/dhs_india_p7_dist_names.dta')  %>% 
     mutate(state_name = tolower(state_name)) %>% 
     mutate(district_name = tolower(district_name)) %>% 
     mutate(state_name =  gsub('&', '', state_name)) %>% 
     mutate(state_name =  gsub('telangana', 'andhra pradesh', state_name)) %>% 
     mutate(district_name  = gsub('&', '', district_name)) %>% 
     mutate(name_dhs = paste(state_name, district_name))
     


#get the india census 2011 file which contains historical population density
dcen = foreign::read.dta(file='../Output/population_density/popdens_with_attributes_1951_2011.dta') %>% 
     mutate(district11 = tolower(district11)) %>% 
     mutate(state11 = tolower(state11))  %>% 
     mutate(district11 = gsub('&', 'and', district11)) %>% 
     mutate(state11 = gsub('&', 'and', state11)) %>%
     mutate(state11 = gsub('tamilnadu', 'tamil nadu', state11)) %>% 
     select(state11, district11, district_code11, popdens11 ) %>% 
     filter(state11 != "na") %>% 
     mutate(name_cen = paste(state11, district11)) 




dmatch = NULL

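# for each DHS district, compute the string distance to every census district
for  (nr in 1:nrow(dhs)){
     
     # method 'jw' with the default p = 0 is the Jaro distance;
     # stringdist() recycles the single DHS name over the census vector,
     # so no inner loop is needed
     dist = stringdist::stringdist(dhs$name_dhs[nr], dcen$name_cen, method = 'jw')
     
     # keep the three closest census names as candidates
     index = order(dist)[1:3]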
     
     dmatch = rbind(dmatch, 
                    c(dhs$v024[nr], dhs$sdistri[nr],dhs$name_dhs[nr],
                      dcen$district_code11[index[1]], dcen$name_cen[index[1]], dcen$popdens11[index[1]],
                      dcen$district_code11[index[2]], dcen$name_cen[index[2]], dcen$popdens11[index[2]],
                      dcen$district_code11[index[3]], dcen$name_cen[index[3]], dcen$popdens11[index[3]],
                      dist[index[1]], dist[index[2]], dist[index[3]])) 
     
} #end for nr


dmatch = as.data.frame(dmatch)
colnames(dmatch)= c('v024', 'sdistri', 'name_dhs',
                    'cen11_code1', 'cen11_name1', 'cen11_popdens1',
                    'cen11_code2', 'cen11_name2', 'cen11_popdens2',
                    'cen11_code3', 'cen11_name3', 'cen11_popdens3',
                    'dist1', 'dist2', 'dist3')
foreign::write.dta(dmatch, file ='../Output/matched_dhs_name_census_name.dta')
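
For the manual check, one option is to flag only the rows where the best match is weak, or barely better than the runner-up. This is just a sketch: the toy data frame, its made-up distances, the `flag_for_review` helper and the 0.10 / 0.02 cutoffs are all my own assumptions and are not tuned on the real data. Note that `dmatch` above is built with `rbind(c(...))`, which coerces every column to character, so the distances need `as.numeric()` before comparing.

```r
# Toy stand-in for dmatch with made-up distances; on the real data,
# use the data frame built above. Distances are kept as character here
# because rbind(c(...)) over mixed types stores them that way.
dmatch_toy = data.frame(
     name_dhs    = c('bihar patna', 'punjab sas nagar', 'kerala idukki'),
     cen11_name1 = c('bihar patna', 'punjab sahibzada ajit singh nagar', 'kerala idukki'),
     dist1       = c('0', '0.21', '0'),
     dist2       = c('0.18', '0.22', '0.25'),
     stringsAsFactors = FALSE
)

# Hypothetical helper: flag a row when the best distance is large,
# or when the second candidate is almost as close as the first.
flag_for_review = function(d, max_dist = 0.10, min_gap = 0.02) {
     d1 = as.numeric(d$dist1)
     d2 = as.numeric(d$dist2)
     d$review = (d1 > max_dist) | ((d2 - d1) < min_gap)
     d
}

checked = flag_for_review(dmatch_toy)

# Only the flagged rows need a manual look
print(checked[checked$review, c('name_dhs', 'cen11_name1', 'dist1')])
```

The second and third candidates are already in `dmatch`, so for a flagged row you can usually pick the right census district from those without going back to the full list.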

KariWilliams · November 20, 2020, 10:34pm

The variable DISTRICTIA uniquely identifies districts, though these may not correspond exactly to Census codes in a given year as DISTRICTIA is harmonized to create a consistent spatial footprint over time. You may be able to nominally integrate based on district name (as Swapnil already suggested), but that will require value labels (which are not currently available for this variable). Please follow up directly with ipums@umn.edu to get access to the value labels for DISTRICTIA. The IPUMS DHS team plans to review and apply value labels to DISTRICTIA in a future release.
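
As a rough sketch of the name-based route (not an official IPUMS workflow), an exact merge on cleaned names might look like the following once the labels are available. Everything here is an assumption for illustration: the `labels` and `census` tables, their column names, the placeholder codes and the `clean_name` helper are made up, and the real label file may be structured differently.

```r
# Hypothetical label table: DISTRICTIA code -> district name.
# The real labels have to be requested from IPUMS; these rows are placeholders.
labels = data.frame(
     districtia = c(1, 2, 3),
     dist_label = c('Patna', 'Idukki ', 'North & Middle Andaman'),
     stringsAsFactors = FALSE
)

# Hypothetical census table with its own district codes (placeholder values).
census = data.frame(
     district_code11 = c(901, 902, 903),
     district11      = c('patna', 'IDUKKI', 'north and middle andaman'),
     stringsAsFactors = FALSE
)

# Clean names the same way on both sides before an exact merge:
# lower case, trim, spell out '&', collapse repeated spaces.
clean_name = function(x) {
     x = tolower(trimws(x))
     x = gsub('&', 'and', x, fixed = TRUE)
     gsub('\\s+', ' ', x)
}

labels$key = clean_name(labels$dist_label)
census$key = clean_name(census$district11)

# all.x = TRUE keeps DHS districts without a census match so they can be inspected
crosswalk = merge(labels, census, by = 'key', all.x = TRUE)
unmatched = crosswalk[is.na(crosswalk$district_code11), ]
```

Because DISTRICTIA is harmonized over time, one DISTRICTIA unit may cover more than one census district in a given year, so any rows left in `unmatched` are worth checking by hand rather than forcing a match.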
