DHS merging with IPUMS data: pooled sample

Hi, I’m getting errors when merging a DHS individual dataset with an IPUMS extract. I’m first trying to merge one country’s DHS dataset with a non-pooled IPUMS extract. I created the IDHSPID variable in the DHS dataset first, as instructed in the notes. When I attempt to merge the DHS dataset and the extract, I get an error saying that “variable sample is long in master but str5 in using data”. I tried both encoding a new sample variable and destringing the DHS dataset, but neither seems to work. If I destring the DHS dataset and then attempt to merge, I get an error that “variable idhspid does not uniquely identify observations in the using data”.

I’m also not sure how I would do this with a pooled extract and the appropriate weights.

Thanks very much for your help!

For linking the IPUMS and DHS program versions of the data, you may prefer to use SAMPLESTR, a string version of the sample variable. Notably, this version of the variable will preserve leading 0s.
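To illustrate why the string version matters, here is a minimal Python sketch (not Stata, and not IPUMS code). The sample values are made up, and the fixed width of 5 is an assumption taken from the "str5" in the error message above.

```python
# Hypothetical values for illustration only; not real IPUMS sample codes.
numeric_sample = 4015      # numeric storage cannot hold a leading zero
string_sample = "04015"    # the same code stored as a 5-character string

# A plain conversion of the numeric value does not match the string key.
naive = str(numeric_sample)

# Zero-padding to the assumed fixed width of 5 restores a matching key.
padded = str(numeric_sample).zfill(5)

print(naive == string_sample)
print(padded == string_sample)
```

The point is only that a key built from a numeric sample code can silently differ from one built from the string version, so both sides of the merge should build the key the same way.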

Hi Kari,

Thanks for your post. Unfortunately, I don’t think this will work, because SAMPLESTR exists only in the IPUMS extract, not in the DHS data.

There is an additional issue: when you examine the IPUMS extract, IDHSPID does not seem to be unique. So even when you convert the sample variable to a string in the IPUMS extract, there is an error message saying that IDHSPID does not uniquely identify observations.

Is this something one should force merge? If I do this, I’m not sure I know what I would be looking at.

Thanks for your help!

I think the IDHSPID issue arises because you are using children as the unit of analysis. IDHSPID uniquely identifies the respondent across samples, so in a child-level file it repeats once for each of the respondent's children. The documentation on this could be much clearer; the CASEID variable description (IDHSPID is a concatenation of SAMPLE and CASEID) makes the respondent piece of this more obvious. To uniquely identify children across samples you must combine IDHSPID with BIDX.
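As a rough illustration of the uniqueness point, here is a small Python sketch with made-up child-level records. The ID strings, the `rows` data, and the `is_unique` helper are hypothetical and only show the logic; they are not the actual IPUMS or DHS layout.

```python
from collections import Counter

# Hypothetical child-level records: one row per child, so the mother's
# IDHSPID repeats for each of her births. All values are made up.
rows = [
    {"idhspid": "04015 0001 02", "bidx": 1},
    {"idhspid": "04015 0001 02", "bidx": 2},
    {"idhspid": "04015 0007 01", "bidx": 1},
]

def is_unique(records, keys):
    """Return True if the combination of `keys` occurs at most once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return all(n == 1 for n in counts.values())

# The respondent ID alone repeats across a mother's children.
mother_key_unique = is_unique(rows, ["idhspid"])

# Adding the birth index makes each child row distinct.
child_key_unique = is_unique(rows, ["idhspid", "bidx"])

print(mother_key_unique, child_key_unique)
```

In Stata, a comparable check before merging would be `isid idhspid bidx` on each file, assuming both files are at the child level and contain both variables. If that passes on both sides, merging on the two variables together should not need to be forced.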