Hi everyone,
I am trying to merge county-level variables from the IPUMS USA 1% samples for 1870-1930 with the corresponding shapefiles from IPUMS NHGIS (2000 delineation).
The idea is to then keep adding variables from the Census of Agriculture, taken directly from NHGIS or from Haines (2010).
I read in the documentation that the IPUMS USA variable countynhg, after a small adjustment (adding a "G" prefix), corresponds to GISJOIN in IPUMS NHGIS.
As documented, I removed the counties whose countynhg is 9999999 and made the adjustment. However, I noticed that the merge on GISJOIN/countynhg and year only goes through as an m:1 merge between IPUMS USA and IPUMS NHGIS. This confuses me, since my understanding was that countynhg is a unique identifier. Is this the right approach, or is there a better way?
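A minimal Python sketch of the adjustment described above, assuming the records are plain dictionaries with a `countynhg` key. The zero-padding to seven digits is an assumption based on the seven-digit codes in the table later in this thread: if countynhg was read in as a number, a code such as 0950000 loses its leading zero and the "G" prefix alone would produce the wrong key. The function names are hypothetical.

```python
def countynhg_to_gisjoin(countynhg):
    """Return the NHGIS county GISJOIN for an IPUMS USA countynhg value.

    Returns None for 9999999, the value dropped before merging.
    """
    if isinstance(countynhg, (int, float)):
        # Assumption: numeric storage drops leading zeros (0950000 -> 950000),
        # so restore the seven-digit width before adding the prefix.
        code = f"{int(countynhg):07d}"
    else:
        code = str(countynhg).strip().zfill(7)
    if code == "9999999":
        return None
    return "G" + code


def attach_gisjoin(records):
    """Copy each record with a 'gisjoin' key added, dropping 9999999 counties."""
    out = []
    for rec in records:
        gisjoin = countynhg_to_gisjoin(rec["countynhg"])
        if gisjoin is not None:
            out.append({**rec, "gisjoin": gisjoin})
    return out
```

For example, `attach_gisjoin([{"year": 1900, "countynhg": 1300930}, {"year": 1900, "countynhg": 9999999}])` keeps only the first record, with `gisjoin` set to `"G1300930"`.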
Thank you for your time,
Best,
Daniele
Dear Daniele,
If I understand your question correctly, you are trying to merge summary data from Haines or NHGIS onto the 1870-1930 microdata records.
To do that, you need to create the GISJOIN variable (which you have done) and then use an m:1 merge on year and GISJOIN to join the Haines/NHGIS data to the microdata. There are many person records for a given year/GISJOIN combination but only one county-level record in the summary data, so I believe an m:1 merge is what such a join requires.
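The many-to-one logic can be sketched in Python as follows, assuming both tables are lists of dictionaries that share `year` and `gisjoin` keys. The function name and the `farms` field in the usage note are hypothetical. The key point is that uniqueness is checked only on the county ("1") side; repeated keys on the person ("m") side are expected.

```python
def merge_m1(persons, counties, keys=("year", "gisjoin")):
    """Attach county-level fields to person records (many persons, one county row).

    Raises ValueError if the county table has more than one row per key,
    which is the uniqueness an m:1 merge relies on. Persons with no matching
    county row are dropped (an inner join, chosen here for simplicity).
    """
    lookup = {}
    for row in counties:
        key = tuple(row[k] for k in keys)
        if key in lookup:
            raise ValueError(f"county table is not unique on {keys}: {key}")
        lookup[key] = row

    merged = []
    for person in persons:
        key = tuple(person[k] for k in keys)
        county = lookup.get(key)
        if county is not None:
            # County fields are copied onto the person record; the key fields
            # are skipped because the person record already has them.
            extra = {k: v for k, v in county.items() if k not in keys}
            merged.append({**person, **extra})
    return merged
```

With two persons in `("1900", "G1300930")` and one county row carrying a hypothetical `farms` value, both persons receive that value. Passing the same county row twice raises `ValueError`, which mirrors why a 1:1 merge fails on person-level data.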
Please let me know if I’ve misunderstood your question!
Sincerely,
Dave Van Riper
IPUMS Research Scientist
Dear Dave,
Thank you very much for your reply.
I have stateicp-countyicp-year averages (so I no longer have person records), which I would like to merge with the NHGIS dataset through the countynhg column. However, I find that in some instances the same countynhg is assigned to different stateicp-countyicp combinations in the same year, as shown in the table below. Could you help me interpret this?
Best,
Daniele
| stateicp | countyicp | countynhg | year |
|---|---|---|---|
| South Dakota | 1270 | 0950000 | 1860 |
| South Dakota | 9999 | 0950000 | 1860 |
| South Dakota | 0230 | 0950000 | 1860 |
| North Dakota | 0670 | 0950000 | 1860 |
| North Dakota | 0550 | 0951755 | 1870 |
| South Dakota | 1190 | 0951755 | 1870 |
| South Dakota | 9999 | 0951755 | 1870 |
| Georgia | 0810 | 1300930 | 1900 |
| Georgia | 0930 | 1300930 | 1900 |
| Oklahoma | 0010 | 1780000 | 1860 |
| Oklahoma | 0030 | 1780000 | 1860 |
| Oklahoma | 0070 | 1780000 | 1860 |
| Oklahoma | 0050 | 1780000 | 1860 |
| Oklahoma | 9170 | 1789175 | 1900 |
| Oklahoma | 9160 | 1789175 | 1900 |
| Oklahoma | 9140 | 1789175 | 1900 |
| Oklahoma | 9130 | 1789175 | 1900 |
| Oklahoma | 9180 | 1789175 | 1900 |
| Colorado | 1230 | 2050055 | 1860 |
| Colorado | 0130 | 2050055 | 1860 |
| Colorado | 0050 | 2050055 | 1860 |
| Oregon | 0650 | 4100650 | 1860 |
| Oregon | 0610 | 4100650 | 1860 |
| South Dakota | 9070 | 4609115 | 1900 |
| South Dakota | 0410 | 4609115 | 1900 |
| Virginia | 0130 | 5100035 | 1900 |
| Virginia | 5100 | 5100035 | 1900 |
| Virginia | 0550 | 5100550 | 1910 |
| Virginia | 6500 | 5100550 | 1910 |
| Virginia | 0550 | 5100550 | 1920 |
| Virginia | 6500 | 5100550 | 1920 |
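Cases like those in the table can be listed systematically. The sketch below assumes the averages are a list of dictionaries with the four columns shown above, and returns every (countynhg, year) pair that is shared by two or more stateicp-countyicp combinations. The function name is hypothetical.

```python
from collections import defaultdict


def shared_countynhg(rows):
    """Map (countynhg, year) -> sorted (stateicp, countyicp) pairs, for codes used by 2+ pairs."""
    groups = defaultdict(set)
    for r in rows:
        groups[(r["countynhg"], r["year"])].add((r["stateicp"], r["countyicp"]))
    # Keep only the keys that would break a one-to-one merge on countynhg and year.
    return {key: sorted(pairs) for key, pairs in groups.items() if len(pairs) > 1}
```

Run on the two Georgia rows above, this returns `{("1300930", 1900): [("Georgia", "0810"), ("Georgia", "0930")]}`, flagging that code as non-unique in 1900.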
Dear Daniele,
Thanks for providing more information. I now better understand what kind of links you are trying to make.
The stateicp-countyicp codes were created by the IPUMS USA team, building off the ICPSR county codes developed by Michael Haines. Those codes try to maintain a consistent stateicp-countyicp code over time.
NHGIS’ county boundaries (and GISJOIN codes) were developed independently from IPUMS USA, and we created boundaries and codes that match the data printed in the published volumes. Many of the discrepancies you’re observing are the result of this independent development and of the NHGIS focus on matching the printed records.
For example, in Georgia, you identified the following mismatches:
| stateicp | countyicp | countynhg | year |
|---|---|---|---|
| Georgia | 0810 | 1300930 | 1900 |
| Georgia | 0930 | 1300930 | 1900 |
In 1900, NHGIS and the printed data have one record for Dooly County (1300930), but IPUMS USA identifies both Dooly and Crisp counties. I think the USA team was able to isolate the Crisp records in the microdata, so they were able to add its geographic identifier to the microdata.
If you’re looking to merge county-level data, I would strongly recommend creating your averages by countynhg and not stateicp-countyicp.
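A minimal sketch of that recommendation: averages are computed directly from person records grouped by countynhg and year, so the result has exactly one row per (countynhg, year) and can be merged one-to-one with NHGIS data once the "G" prefix is added. It assumes person records as dictionaries and a person-weight field named `perwt` (IPUMS USA provides PERWT; whether to weight at all is your choice). The function name is hypothetical.

```python
from collections import defaultdict


def weighted_means_by_countynhg(persons, var, weight="perwt"):
    """Weighted mean of `var` for each (countynhg, year), skipping countynhg 9999999."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for p in persons:
        if str(p["countynhg"]).zfill(7) == "9999999":
            continue  # unidentified county; dropped as in the original post
        key = (p["countynhg"], p["year"])
        totals[key] += p[var] * p[weight]
        weights[key] += p[weight]
    # One output row per (countynhg, year), so the key is unique by construction.
    return [
        {"countynhg": k[0], "year": k[1], var: totals[k] / weights[k]}
        for k in totals
        if weights[k] > 0
    ]
```

Because each (countynhg, year) appears once in the output, a stateicp-countyicp split such as Dooly/Crisp in 1900 is pooled into the single NHGIS county instead of producing two rows with the same key.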
Yours,
Dave
Dear Dave,
That is perfect, thank you very much for your help!
Best,
Daniele