Merge IPUMS USA 1870-1930 and NHGIS County Level

Hi everyone,
I am trying to merge county-level variables from the IPUMS USA 1% 1870-1930 samples with the corresponding shapefiles from IPUMS NHGIS (2000 delineation).

The idea would be to keep adding variables coming from the Census of Agriculture directly from NHGIS or Haines (2010).

I read in the documentation that the variable countynhg in IPUMS USA, after a small adjustment (adding a "G" prefix), corresponds to GISJOIN in IPUMS NHGIS.

I removed the counties for which countynhg is 9999999, as documented, and made the adjustment, but I noticed that I need an m:1 merge between IPUMS USA and IPUMS NHGIS for the merge on GISJOIN/countynhg and year to go through. This confuses me, since my understanding is that countynhg should be an identifier. Is this the way to go, or is there a better way?

Thank you for your time,

Best,

Daniele

Dear Daniele,

Based on your question, I think you are trying to merge summary data from Haines or NHGIS onto the microdata records from 1870-1930 (or at least that’s my interpretation of your question).

To support that, you need to create the GISJOIN variable (which you have done). Then you would need to use an m:1 merge on year and GISJOIN to join the Haines/NHGIS data to the microdata. There are many person records in a given year/GISJOIN combination, and only one county-level record in the summary data, so I believe m:1 is the right merge type for such a join.
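For readers working in Python rather than Stata, the many-to-one join described above can be sketched with pandas roughly as follows. The data frames, column names (`micro`, `county`, `COUNTYNHG`, `farm_value`) and values are invented for illustration, and the sketch assumes GISJOIN is simply "G" followed by the seven-digit countynhg code with leading zeros kept.

```python
import pandas as pd

# Toy person-level records (hypothetical values, not real IPUMS USA data).
micro = pd.DataFrame({
    "YEAR": [1900, 1900, 1900, 1910],
    "COUNTYNHG": [1300930, 1300930, 5100035, 5100550],
    "AGE": [34, 12, 51, 27],
})

# Toy county-level summary table: one row per YEAR/GISJOIN (hypothetical values).
county = pd.DataFrame({
    "YEAR": [1900, 1900, 1910],
    "GISJOIN": ["G1300930", "G5100035", "G5100550"],
    "farm_value": [1.0, 2.0, 3.0],
})

# Assumption: GISJOIN = "G" + countynhg, zero-padded to seven digits.
micro["GISJOIN"] = "G" + micro["COUNTYNHG"].astype(str).str.zfill(7)

# validate="m:1" makes pandas raise a MergeError if the county table
# has more than one row for any YEAR/GISJOIN key.
merged = micro.merge(county, on=["YEAR", "GISJOIN"], how="left", validate="m:1")
print(merged)
```

Every person record keeps its own row and picks up the single matching county row, which is the same behaviour as an m:1 merge.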

Please let me know if I’ve misunderstood your question!

Sincerely,
Dave Van Riper
IPUMS Research Scientist

Dear Dave,

Thank you very much for your reply.

I have stateicp-countyicp-year averages which I would like to merge with the NHGIS dataset through the column countynhg (so I got rid of the person records). What I find, though, is that in some instances the same countynhg is assigned to different stateicp-countyicp combinations in the same year, as shown in the table below. May I ask for your help in interpreting this?

Best,

Daniele

stateicp      countyicp  countynhg  year
South Dakota  1270       0950000    1860
South Dakota  9999       0950000    1860
South Dakota  0230       0950000    1860
North Dakota  0670       0950000    1860
North Dakota  0550       0951755    1870
South Dakota  1190       0951755    1870
South Dakota  9999       0951755    1870
Georgia       0810       1300930    1900
Georgia       0930       1300930    1900
Oklahoma      0010       1780000    1860
Oklahoma      0030       1780000    1860
Oklahoma      0070       1780000    1860
Oklahoma      0050       1780000    1860
Oklahoma      9170       1789175    1900
Oklahoma      9160       1789175    1900
Oklahoma      9140       1789175    1900
Oklahoma      9130       1789175    1900
Oklahoma      9180       1789175    1900
Colorado      1230       2050055    1860
Colorado      0130       2050055    1860
Colorado      0050       2050055    1860
Oregon        0650       4100650    1860
Oregon        0610       4100650    1860
South Dakota  9070       4609115    1900
South Dakota  0410       4609115    1900
Virginia      0130       5100035    1900
Virginia      5100       5100035    1900
Virginia      0550       5100550    1910
Virginia      6500       5100550    1910
Virginia      0550       5100550    1920
Virginia      6500       5100550    1920
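
A check along these lines, sketched here in Python with pandas, lists the countynhg-year pairs that are shared by more than one stateicp-countyicp combination. The first four rows are taken from the table above; the last row and the data frame name `avg` are made up for illustration so that the filter has something to exclude.

```python
import pandas as pd

avg = pd.DataFrame({
    "stateicp": ["Georgia", "Georgia", "Oregon", "Oregon", "Georgia"],
    "countyicp": ["0810", "0930", "0650", "0610", "0010"],
    # Codes are kept as strings so leading zeros are never lost.
    "countynhg": ["1300930", "1300930", "4100650", "4100650", "1300010"],
    "year": [1900, 1900, 1860, 1860, 1900],  # last row is illustrative only
})

# Count distinct stateicp-countyicp combinations per countynhg-year key.
n_codes = (
    avg.drop_duplicates(["stateicp", "countyicp", "countynhg", "year"])
       .groupby(["countynhg", "year"])
       .size()
)

# Keys that are NOT unique identifiers at the stateicp-countyicp level.
shared = n_codes[n_codes > 1]
print(shared)
```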

Dear Daniele,

Thanks for providing some more information. I now better understand what kind of links you are trying to make.

The stateicp-countyicp codes were created by the IPUMS USA team, building off the ICPSR county codes developed by Michael Haines. Those codes try to maintain a consistent stateicp-countyicp code over time.

NHGIS’ county boundaries (and GISJOIN codes) were developed independently from IPUMS USA, and we created boundaries and codes that matched the data printed in published volumes. Some (many) of the discrepancies you’re observing are the result of the independent development and the NHGIS focus on matching print records.

For example, in Georgia, you identified the following mismatches:

Georgia 0810 1300930 1900
Georgia 0930 1300930 1900

In 1900, NHGIS and the print data have a single record for Dooly County (1300930), but IPUMS USA identifies Dooly and Crisp counties separately. I think the USA team was able to isolate the Crisp records in the microdata, so they were able to add its geographic identifier there.

If you’re looking to merge county-level data, I would strongly recommend creating your averages by countynhg and not stateicp-countyicp.
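
As a rough illustration of that recommendation, here is a minimal pandas sketch that averages person records by year and countynhg and then links the result one-to-one to a county table. All names and values (`micro`, `nhgis`, `AGE`, `farm_value`) are hypothetical, the mean is unweighted for simplicity, and GISJOIN is again assumed to be "G" plus the seven-digit countynhg string.

```python
import pandas as pd

# Toy person records: two countyicp codes that share one countynhg in 1900.
micro = pd.DataFrame({
    "YEAR": [1900, 1900, 1900, 1900],
    "STATEICP": ["Georgia"] * 4,
    "COUNTYICP": ["0810", "0930", "0930", "0810"],
    "COUNTYNHG": ["1300930"] * 4,
    "AGE": [20, 30, 40, 50],  # hypothetical values
})

# Average by YEAR/COUNTYNHG, ignoring the stateicp-countyicp codes entirely.
county_avg = micro.groupby(["YEAR", "COUNTYNHG"], as_index=False)["AGE"].mean()
county_avg["GISJOIN"] = "G" + county_avg["COUNTYNHG"]

# Toy NHGIS-style county table (hypothetical variable and value).
nhgis = pd.DataFrame({
    "YEAR": [1900],
    "GISJOIN": ["G1300930"],
    "farm_value": [2.5],
})

# Both sides now have one row per YEAR/GISJOIN, so a 1:1 merge should pass.
linked = county_avg.merge(nhgis, on=["YEAR", "GISJOIN"], validate="1:1")
print(linked)
```

Because the averages are built on the same key that NHGIS uses, each county-year appears once on both sides and the merge no longer needs to be many-to-one.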

Yours,
Dave

Dear Dave,
That is perfect, thank you very much for your help!

Best,

Daniele