I am running a longitudinal study using IPUMS USA data for 1990 (1% unweighted), 2000 (1% unweighted), 2010 (ACS), and 2015 (ACS). I want to join the 1990 data to the 1990 boundary shapefile (provided under geographic tools) in ArcGIS so that I can select the PUMAs in my study area (the 20 largest US metros, rather than all US PUMAs).

To do this, I aggregated the IPUMS USA data by PUMA for each year and built a code to match the GISMATCH code in the attribute table of the IPUMS boundary files. This 6-7 digit code consists of the 1-2 digit state FIPS code (no leading zero) followed by the 5-digit PUMA code (with leading zeros). I then joined each of these files to the boundaries in ArcGIS.

For all years except 1990, every PUMA joined correctly. For 1990, however, the codes in the 1% boundary file do not match the codes in the 1% data, so only about half of the PUMAs joined. Stranger still, the same data join 100% to the 1990 5% boundary file. My study areas are metropolitan areas, so I would prefer to use the 1% data and boundaries.

Do you have any idea why this join fails and how I might solve the problem? I looked through previous answers and online sources but could not find anyone else with the same problem.
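For readers following along, the key construction described above can be sketched in Python. This is a minimal illustration, not the poster's actual workflow: the function name `make_gismatch` and the example values are hypothetical, and it assumes GISMATCH is compared as text (if the shapefile stores it as a number, convert the result with `int()` so both join fields have the same type).

```python
def make_gismatch(statefip, puma):
    """Build a GISMATCH-style key: state FIPS with no leading zero,
    followed by the PUMA code zero-padded to 5 digits."""
    return f"{int(statefip)}{int(puma):05d}"


# Hypothetical example values, not taken from the thread's data.
print(make_gismatch("06", 100))    # 1-digit state -> 6-digit key
print(make_gismatch(36, "03801"))  # 2-digit state -> 7-digit key
```

Converting both parts through `int()` first makes the result independent of whether the inputs arrive as padded strings or as numbers.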
Could you send me the GISMATCH codes you constructed from the microdata for the 1990 1% PUMAs? I want to compare them with the codes in the shapefile and with the codes I just generated from that sample so that I can troubleshoot the problem.
Sure, here are the codes I created from the IPUMS USA data. I combined the state and PUMA codes into a field called STPUMA to match the GISMATCH code. STPUMA joins to the GISMATCH codes in the 1990 5% shapefile, but only half of the codes join to the 1% shapefile. I also found that my 2010 1% data from IPUMS USA matched the 2000 PUMA boundaries (not the 2010 ones), so I wondered whether this is a similar problem: do the data carry codes that match the 5% boundaries because those are the older ones?
ipums90.csv (6.77 KB)
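A quick way to diagnose a partial join like this is to compare the two sets of codes directly before joining in GIS software. The sketch below is only an illustration under assumptions: `join_report` is a hypothetical helper, and the code lists are made-up values rather than the contents of the attached file or of any IPUMS shapefile.

```python
def join_report(data_codes, boundary_codes):
    """Summarize how well the data keys match the boundary keys."""
    data, bounds = set(data_codes), set(boundary_codes)
    matched = data & bounds
    return {
        "matched": sorted(matched),
        "data_only": sorted(data - bounds),      # would fail to join
        "boundary_only": sorted(bounds - data),  # polygons with no data
        "match_rate": len(matched) / len(data) if data else 0.0,
    }


# Hypothetical codes for illustration only.
stpuma = ["600100", "600200", "3603801", "3603802"]
gismatch_1pct = ["600100", "3603801", "3609999"]

report = join_report(stpuma, gismatch_1pct)
print(report["match_rate"])  # share of data codes found in the boundaries
print(report["data_only"])   # codes present in the data but not the shapefile
```

Running the same report against a second boundary file (for example the 5% file) shows at a glance which set of boundaries the data codes actually belong to.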
I can quickly answer your question about the 2010 1% ACS sample: it uses the 2000 5% PUMA boundaries. The 2012 1% ACS sample was the first ACS microdata sample to use the 2010 PUMA boundaries.
I will look into your 1990 mismatch now.
So, based on my understanding of the sample description here, the 1% unweighted sample is actually a state sample. Thus, the PUMA codes in it will match the codes in the 5% state sample. I believe the 1% unweighted sample is actually drawn from the 5% state sample.
If you want the metro sample, you should select the 1% sample for 1990 and not the 1% unweighted. The 1% sample will have PUMA codes that match the 1990 1% PUMA shapefile.
The 2000 1% sample is the equivalent of the 1% sample in 1990. The 2000 1% unweighted sample comes from the 5% state sample and will have the same PUMA codes as the 5% sample.
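The sample-to-boundary pairings mentioned in this thread can be collected into a small lookup so that each year's data is joined to the right file. This is a sketch only: the sample labels and the helper `boundary_for` are hypothetical names, and the table records just the pairings stated in this thread, so anything else returns `None` and should be checked against the sample descriptions.

```python
# (year, sample label) -> boundary file whose PUMA codes the sample uses.
# Labels are illustrative; only pairings stated in the thread are listed.
BOUNDARIES_FOR_SAMPLE = {
    (1990, "1%"): "1990 1% PUMA",
    (1990, "1% unweighted"): "1990 5% PUMA",
    (2000, "1% unweighted"): "2000 5% PUMA",
    (2010, "ACS 1%"): "2000 5% PUMA",
    (2012, "ACS 1%"): "2010 PUMA",
}


def boundary_for(year, sample):
    """Return the recorded boundary file, or None if the pairing is not listed."""
    return BOUNDARIES_FOR_SAMPLE.get((year, sample))


print(boundary_for(1990, "1% unweighted"))
print(boundary_for(2015, "ACS 1%"))  # not listed here, so None
```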
Finally, I noticed a problem with our 1990 1% PUMA shapefile. It looks like we forgot to dissolve the shapefile on GISMATCH. Right now, the shapefile contains multiple polygons with the same GISMATCH code (if the PUMA consists of multiple parts). I will get this corrected, but you may want to dissolve it before joining PUMA data to it.
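Before dissolving, it can help to confirm which GISMATCH codes appear on more than one polygon, since a one-to-one join assumes each code occurs once. The sketch below assumes the shapefile's attribute table has been exported to a list of row dictionaries with a `GISMATCH` field; the helper `repeated_codes` and the sample rows are hypothetical.

```python
from collections import Counter


def repeated_codes(rows, key="GISMATCH"):
    """Return {code: polygon count} for codes that occur on more than one row."""
    counts = Counter(row[key] for row in rows)
    return {code: n for code, n in counts.items() if n > 1}


# Hypothetical attribute rows: one PUMA split into two polygons.
rows = [
    {"GISMATCH": "600100"},
    {"GISMATCH": "600100"},
    {"GISMATCH": "600200"},
]
print(repeated_codes(rows))
```

An empty result after dissolving on GISMATCH would indicate that each PUMA is now represented by a single (possibly multi-part) feature.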