Help with SAS code for merging records based on family interrelationships

#1

I am working on creating a study sample using the South African 2001 Census and 2007 Community Survey data. The main variable of interest we are looking at is the number of children born alive who have died for each mother (CHBORN minus CHSURV). The sample we’d like to create would include all individuals identified as mothers, with "paternal’ (‘paternal’ as in the men who are fathers to her children) and child variables added on to each maternal record.

I found the sample SAS code for merging records based on family interrelationship available in the first End Note of the Family Interrelationships page to be very helpful. However, when I run this code, the new data set has a greater number of observations then the original. Can you help me figure out what is going wrong here?

Thanks,

Stephanie

#2

I was not able to reproduce your problem using an extract I created. Please attach a file containing your syntax in a reply to this question or send a copy of you syntax file to ipums@umn.edu so that I can get a better understanding of what may be causing your problem.

#3

“I was not able to reproduce your problem using an extract I created. Please attach a file containing your syntax in a reply to this question or send a copy of you syntax file to ipums@umn.edu so that I can get a better understanding of what may be causing your problem.”

Thank you for your reply.

I have attached a text file with my syntax in my reply. When I run each set of procedures to attach each spouse characteristic, I get a data set with more records than the previous data set.

Original extract: 4,773,312 records

Merge 1: 4,773,755 records

Merge 2: 4,775,194 records

Merge 3: 4,783,462 records

Merge 4: 5,196,790 records

I have more characteristics I want to attach, but I’ve stopped until I figure out what I am doing wrong here. Thank you for your help.

Merging syntax-1.txt (2.43 KB)
Merging syntax.txt (2.43 KB)

#4

As I have been looking into this further, it appears that the problem is stemming from individuals in polygamous marriages. If you look at the variables age_sp and your merged variable sp_age you will see that about 870 individuals do not have matching values (even though everyone should), this is due to some men having multiple wives, but only one of them can be that man’s SPLOC value.

One way around this may be to create two separate datasets, one containing only women and another containing only men, then merge the men dataset onto the women dataset using the variable sploc from the women dataset and the variable pernum from the men dataset. Since you are interested primarily in women you could then discard the men dataset.
I hope this helps.

#5

Thank you so much for looking into this for me.

I’ve created two separate datasets, one containing only women and one containing only men, but I’m unclear on how exactly to merge the men dataset onto the women data set. Could you explain how to in a little more detail and perhaps attach a sample SAS code?

Again, that you for your assistance with this!

#6

Basically you would use the same merging proceedure, but with the WOMEN dataset AS a and the MEN dataset AS b.

One thing to to differnetly, however, would be to rename the MEN variable pernum to sploc because now you will be merging on the WOMEN variable sploc. You should also remember to rename the MEN age variable to SP_AGE. Finally make sure both datasets are sorted on year, serial, and sploc

The merging procedure should look something like this:

CREATE TABLE IPUMS.merged1

AS SELECT a.*, b.SP_AGE label=“Spouse’s age”

FROM women AS a LEFT JOIN men AS b

ON (a.year=b.year AND a.serial=b.serial AND a.sploc=b.sploc) ;

QUIT ;