What is the procedure for merging census summary data with IPUMS USA?
If you’re looking to join summary data to IPUMS USA microdata by geography, you will need to match the geography codes across your two datasets. American Community Survey data from IPUMS USA provide geographic identifiers for state, county, PUMA, and metro area. Note that not all counties and metro areas are identified in the microdata; please refer to the description and codes tabs for these variables to determine which areas you will be able to join. If the two datasets use different coding systems, you may need to create a crosswalk that matches the corresponding codes using the name of each location. The list of IPUMS USA identified counties may be a good starting point, as it provides the codes that IPUMS uses to identify both states and counties. When joining on county or PUMA, you will need to include the state FIPS/ICP code among the variables you merge on, since both county and PUMA codes in IPUMS USA are state-dependent (the same county or PUMA code is reused in different states). Also be sure to keep leading zeros when they appear in geographic identifiers; otherwise codes such as 053 and 53 will fail to match.
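As a rough illustration of the merge described above, here is a minimal Python sketch that joins microdata records to a summary table on state plus county. STATEFIP and COUNTYFIP are IPUMS USA variable names; the records, the `median_income` field, and the helper names are hypothetical and only meant to show the shape of the join, so please adapt them to your own extract and summary file.

```python
def fips_key(state, county):
    """Build a (state, county) key as zero-padded strings.

    State FIPS codes are 2 digits and county FIPS codes are 3 digits,
    so 6 becomes "06" and 53 becomes "053".
    """
    return (str(state).zfill(2), str(county).zfill(3))


# Hypothetical microdata records; the values are illustrative only.
microdata = [
    {"SERIAL": 1, "STATEFIP": 27, "COUNTYFIP": 53},
    {"SERIAL": 2, "STATEFIP": 6, "COUNTYFIP": 53},
    {"SERIAL": 3, "STATEFIP": 1, "COUNTYFIP": 73},
]

# Hypothetical summary table keyed by string codes with leading zeros.
# County 053 appears under two different states, which is why the
# state code has to be part of the merge key.
summary = {
    ("27", "053"): {"median_income": 70000},
    ("06", "053"): {"median_income": 65000},
}


def merge_on_state_county(records, summary_by_key):
    """Attach summary values to each record, flagging unmatched records."""
    merged = []
    for rec in records:
        key = fips_key(rec["STATEFIP"], rec["COUNTYFIP"])
        match = summary_by_key.get(key)
        row = dict(rec)
        row["matched"] = match is not None
        if match is not None:
            row.update(match)
        merged.append(row)
    return merged


merged = merge_on_state_county(microdata, summary)
```

Checking the `matched` flag after the merge is a quick way to see which records found no counterpart in the summary data, for example because the county is not identified in the microdata.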
I hope these notes are helpful; please feel free to follow up with more details on the data that you’re looking to join and I can provide more specific feedback.
What do you mean by a “crosswalk”? And how do I create one?
A crosswalk is a table that matches equivalent items across different classification systems. For example, IPUMS provides state codes in both the variables STATEFIP and STATEICP. Each of these uses a different system to organize states (e.g., Alabama is coded as 1 in STATEFIP but as 41 in STATEICP). A crosswalk between these two systems would show you the different codes for the same state and allow you to translate values from one system to the other. You can generate a variety of crosswalks on the Geocorr website.
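To make the idea concrete, here is a small Python sketch of a crosswalk between STATEFIP and STATEICP. Only the Alabama pair is taken from the explanation above; the other two rows are included for illustration and should be checked against the codes tabs for these variables. The names `CROSSWALK_ROWS` and `translate` are hypothetical.

```python
# Each row pairs the two codes that refer to the same state:
# (state name, STATEFIP code, STATEICP code).
CROSSWALK_ROWS = [
    ("Alabama", 1, 41),      # from the example above
    ("California", 6, 71),   # illustrative; verify against the codes tabs
    ("Minnesota", 27, 33),   # illustrative; verify against the codes tabs
]

# Build one lookup per direction so values can be translated either way.
FIP_TO_ICP = {fip: icp for _, fip, icp in CROSSWALK_ROWS}
ICP_TO_FIP = {icp: fip for _, fip, icp in CROSSWALK_ROWS}


def translate(code, mapping):
    """Return the equivalent code, or None if the code is not in the crosswalk."""
    return mapping.get(code)


alabama_icp = translate(1, FIP_TO_ICP)   # STATEFIP 1 -> STATEICP 41
alabama_fip = translate(41, ICP_TO_FIP)  # STATEICP 41 -> STATEFIP 1
```

Returning None for unknown codes, rather than raising an error, makes it easy to list every code that still needs a row in the crosswalk.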
I am an older scholar. There’s no one in my department who is available to assist me in merging the census summary data with ACS. Is there someone on the IPUMS staff who can walk me through this process?
I would be happy to follow up with more specific instructions to help walk you through the process. Please let me know which datasets you’re trying to merge and the geographic level that you’re trying to merge on. IPUMS also provides a variety of data training exercises on our website. We are currently creating an exercise that specifically tackles creating crosswalks, which will be available in the fall.
Ivan:
Thanks. Are you available Monday May 15th at 2 pm? If so, I can set up a Zoom meeting for you to meet with my grad assistant and me.
I’m happy to provide additional feedback, but I’m unfortunately unable to set up a Zoom meeting to create a crosswalk. However, I noticed your other forum post here, which made me think that you are trying to merge summary data from the Census with US microdata at the city level; is this correct?
If so, you’ll notice that the IPUMS USA variable CITY has its own coding scheme, which is different from the place FIPS codes used by the Census for cities. However, we include both place FIPS and IPUMS CITY codes in the "PUMA Match Summary" files available in the CITY comparability section. Summary data use the FIPS codes, so those files enable linkages. Note too that 1980 was the last year in which the city of residence was directly identified for households in US microdata. Since 1990, the microdata have identified only the public use microdata area (PUMA) in which a given household resided. PUMAs are sometimes coterminous with city boundaries, but they also frequently encompass multiple cities and occasionally straddle city boundaries. Therefore, for most cities, and even for some very large cities, it is impossible to identify the exact set of records in the microdata file that correspond to a given city. CITY therefore infers the city of residence by assigning each PUMA to the city in which the majority of that PUMA’s population resided. A household might not in fact have resided in its identified city. This protocol yields errors of omission (where a CITY code is not assigned to some residents of the corresponding city) and errors of commission (where a CITY code is assigned to some non-residents). To ensure that CITY codes are generally representative of city populations, cities are identified only where the sum of match errors is less than 10% (see the comparability tab for the PUMA CITY match summary).
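To illustrate how omission and commission errors can arise from majority-based assignment, here is a small Python sketch. It is not the actual IPUMS procedure: the function names are hypothetical, and the denominators are my assumption (omission is measured against the city's population, commission against the population of the assigned PUMAs). The match summary files remain the authoritative source for the published error rates.

```python
def city_match_errors(pumas):
    """Compute (omission, commission) error rates for one city.

    pumas: list of (pop_in_city, pop_outside_city) pairs, one pair for
    every PUMA that overlaps the city.
    """
    city_pop = sum(inside for inside, _ in pumas)
    # A PUMA is assigned to the city only if most of its population lives there.
    assigned = [(inside, outside) for inside, outside in pumas if inside > outside]
    assigned_in_city = sum(inside for inside, _ in assigned)
    assigned_total = sum(inside + outside for inside, outside in assigned)

    # Omission: city residents living in PUMAs that were not assigned to the city.
    omission = (city_pop - assigned_in_city) / city_pop if city_pop else 0.0
    # Commission: non-residents living in PUMAs that were assigned to the city.
    commission = (
        (assigned_total - assigned_in_city) / assigned_total
        if assigned_total
        else 0.0
    )
    return omission, commission


def is_identifiable(pumas, threshold=0.10):
    """Apply the rule that the sum of match errors must be below the threshold."""
    omission, commission = city_match_errors(pumas)
    return omission + commission < threshold


# Hypothetical city: one PUMA lies mostly inside it, another mostly outside.
example = [(95000, 5000), (4000, 96000)]
omission, commission = city_match_errors(example)
```

In this made-up example the second PUMA is not assigned to the city, so its 4,000 city residents count toward omission, while the 5,000 non-residents in the first PUMA count toward commission.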
I hope this information helps you figure out how to best proceed.