DHS merging with IPUMS data: pooled sample

specialkk · August 8, 2022, 5:35pm

Hi, I’m getting errors when merging a DHS individual dataset with an IPUMS extract. As a first step, I’m trying to merge one country’s DHS dataset with a non-pooled IPUMS extract. I created the IDHSPID variable in the DHS dataset as instructed in the notes. When I attempt to merge the DHS dataset with the extract, I get an error saying “variable sample is long in master but str5 in using data”. I tried both encoding a new sample variable and destringing the DHS dataset, but neither works. If I destring the DHS dataset and then attempt the merge, I get the error “variable idhspid does not uniquely identify observations in the using data”.

I’m also not sure how I would do this with a pooled extract and the appropriate weights.

Thanks very much for your help!

KariWilliams · August 9, 2022, 1:32pm

For linking the IPUMS and DHS program versions of the data, you may prefer to use SAMPLESTR, a string version of the sample variable. Notably, this version of the variable will preserve leading 0s.
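A minimal sketch of why the string version matters, written in Python purely for illustration (the thread itself concerns Stata). The sample code and case ID below are made-up values, and `make_id` is a hypothetical helper; the only point is that a numeric sample code loses its leading zero, so a linking ID built from it no longer matches one built from the string.

```python
# Hypothetical values for illustration; not taken from a real extract.
sample_str = "05001"          # string sample code, leading zero preserved
sample_num = int(sample_str)  # numeric version: the leading zero is lost
caseid = "000120301"          # made-up case identifier


def make_id(sample, caseid):
    """Concatenate a sample code and a case ID into one linking key."""
    return f"{sample}{caseid}"


id_from_string = make_id(sample_str, caseid)
id_from_number = make_id(sample_num, caseid)

# The two keys differ, so rows keyed one way cannot match rows keyed the other.
print(id_from_string == id_from_number)

# Padding the numeric code back to its original width restores the match.
restored = make_id(str(sample_num).zfill(5), caseid)
print(restored == id_from_string)
```

The same reasoning applies to the type mismatch in the error message: both files need the sample code stored the same way, as the same type and with the same width, before the IDs built from it can line up.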

specialkk · August 9, 2022, 2:21pm

Hi Kari,

Thanks for your post. Unfortunately, I don’t think this will work, because SAMPLESTR exists only in the IPUMS extract, not in the DHS data.

There is also an additional issue: when you examine the IPUMS extract, IDHSPID does not appear to be unique. So even when you convert the sample variable in the IPUMS extract to a string, the merge still fails with an error saying that IDHSPID does not uniquely identify observations.

Is this something one should force merge? If I do this, I’m not sure I know what I would be looking at.

Thanks for your help!

KariWilliams · August 10, 2022, 3:28pm

I think the IDHSPID issue arises because you are using children as the unit of analysis. IDHSPID uniquely identifies the respondent across samples. The documentation on this could be much clearer; the CASEID variable description (IDHSPID is a concatenation of SAMPLE and CASEID) makes the respondent piece of this more obvious. To uniquely identify children across samples, you must combine IDHSPID with BIDX.
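As a rough illustration of that point, here is a small Python sketch with made-up child-level records (the ID values and the `is_unique` helper are hypothetical). Each row is one child, so a respondent with several children repeats the same IDHSPID, and only the pair of IDHSPID and BIDX identifies a row.

```python
from collections import Counter

# Hypothetical child-level records: one row per child, so a respondent
# with more than one child appears with the same IDHSPID several times.
children = [
    {"idhspid": "05001000120301", "bidx": 1},
    {"idhspid": "05001000120301", "bidx": 2},
    {"idhspid": "05001000450102", "bidx": 1},
]


def is_unique(rows, key_fields):
    """Return True if the given fields jointly identify every row."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return all(n == 1 for n in counts.values())


print(is_unique(children, ["idhspid"]))          # False
print(is_unique(children, ["idhspid", "bidx"]))  # True
```

In Stata, the analogous check would presumably be `isid idhspid bidx` on the child-level file, followed by a merge that uses both variables as the key, assuming both files carry them under those names.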
