Joining data on congressional districts

Hello, I downloaded a total population table for 2000 to the present. I selected options to get it by ZIP code, county, and congressional district, and I received several separate files, each covering only one of those geographic levels. I was hoping to have congressional district, ZIP code, and county all in one spreadsheet. Is there a way to do this? I am not sure how to join them. I would greatly appreciate your guidance. Thank you.

Hi Richard,

Do you want to have the data structured so that each record is a unique combination of ZIP code, county, and congressional district (CD)? We do not have a data file that provides information for that combination of geographic units.

It is possible to generate a dataset like that from the decennial census data. Census blocks will have codes for ZIP codes, counties and congressional districts. You can then collapse the data on those variables to create a file where each record is a unique combination of ZIP, county, and congressional district.
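To make "collapsing" concrete, here is a minimal Python sketch that assumes each block record already carries ZIP, county, and CD codes. The field layout and every value below are invented for illustration. In Stata, the equivalent step would be something like `collapse (sum) pop, by(zip county cd)`, where `pop`, `zip`, `county`, and `cd` are hypothetical variable names.

```python
from collections import defaultdict

# Invented block records: (block_id, zip_code, county_fips, cd, population).
blocks = [
    ("B1", "11111", "01001", "01", 120),
    ("B2", "11111", "01001", "01", 80),
    ("B3", "11111", "01003", "02", 45),
    ("B4", "22222", "01003", "02", 300),
]

def collapse(records):
    """Sum block populations for each unique (ZIP, county, CD) combination."""
    totals = defaultdict(int)
    for _block_id, zip_code, county, cd, pop in records:
        totals[(zip_code, county, cd)] += pop
    return dict(totals)

collapsed = collapse(blocks)
for key, pop in sorted(collapsed.items()):
    print(key, pop)
```

Each output row is one unique ZIP/county/CD combination; blocks B1 and B2 share a combination, so they collapse into a single row with population 200.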

You wouldn’t be able to create that type of file for the American Community Survey because the ACS is not available for census blocks.

If you have questions about collapsing, let me know.

Dave Van Riper

Hello Dave, thank you for the reply. Yes, I was hoping to obtain that unique combination. A ZIP code to congressional district combination would work just as well. However, I know that a single ZIP code can sometimes map to several congressional districts. Also, the tables I did download referenced the congressional districts of the 106th and 107th Congresses. If I wanted the ZIP code to congressional district combinations based on those districts, how would I go about that? The redistricting based on the 1990 census started in a subsequent Congress, with different states enacting redistricting during different Congresses. Your guidance on how to proceed would be much appreciated.

Hi Richard,

I’ve looked at our 2000 block-level census data, and we don’t provide a ZIP code or congressional district code on those files. Thus, the approach I suggested for generating ZIP code/CD/county records will not work!

Since we don’t have the codes in the data files, the fallback methodology would be obtaining census block shapefiles and overlaying them with ZIP codes and Congressional districts. The output of these overlay operations will assign a ZIP code and CD code to each 2000 census block.

If you then download the population count for each census block, you can merge the population counts onto the output of the overlay operation. Then you can roll up the block file by ZIP code and CD code to generate counts for each combination of ZIP and CD.
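A minimal sketch of the merge and roll-up, assuming the overlay output has already been reduced to a mapping from block ID to a (ZCTA, CD) code pair. The block IDs, codes, and counts are invented. Blocks that appear in only one of the two inputs are reported rather than silently dropped, which is a cheap sanity check on the merge.

```python
from collections import defaultdict

# Invented overlay output: block ID -> (ZCTA code, CD code).
crosswalk = {
    "B1": ("11111", "01"),
    "B2": ("11111", "02"),
    "B3": ("22222", "02"),
}

# Invented 2000 block population counts, keyed by the same block IDs.
block_pop = {"B1": 500, "B2": 250, "B3": 100, "B4": 0}

def merge_and_roll_up(crosswalk, block_pop):
    """Attach each block's population to its (ZCTA, CD) pair and sum by pair.

    Also returns the block IDs found in only one input, so gaps in either
    file can be inspected instead of silently dropped.
    """
    totals = defaultdict(int)
    for block_id, pair in crosswalk.items():
        if block_id in block_pop:
            totals[pair] += block_pop[block_id]
    unmatched = sorted(set(crosswalk) ^ set(block_pop))
    return dict(totals), unmatched

totals, unmatched = merge_and_roll_up(crosswalk, block_pop)
print(totals)     # {('11111', '01'): 500, ('11111', '02'): 250, ('22222', '02'): 100}
print(unmatched)  # ['B4']
```

In Stata, the analogous steps would be a `merge 1:1` on the block ID followed by `collapse (sum)` by the ZIP and CD variables (variable names would depend on your files).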

This method is more complex because it requires spatial analysis tools to generate the census block to ZIP code and census block to CD code crosswalks.


I will also check with a colleague (who’s on vacation this week) about whether we have any internal 2000 block data with the ZIP codes and CD codes on them.

Hello Dave, thank you for your response. I believe that the first congressional districts drawn based on the 2000 census did not roll out until the 108th Congress. So I believe that the 106th and 107th Congress districts were still based on the 1990 census, but there is a lag in how the CDs are updated because states differ in when they enact them. So I would need data that gives me the CDs of the 106th and 107th Congresses.

Also, I am not familiar with any geospatial software. I am aware that ArcGIS is commonly used. Perhaps this is unanswerable, but how straightforward is it to load the maps and carry out the process you describe to combine the data?

Unfortunately, the process is not straightforward if you’re not familiar with geospatial software. The basic outline is:

  1. Obtain the 51 census block shapefiles from 2000 (one for each state plus the District of Columbia)
  2. Obtain the CD and ZCTA (ZIP Code Tabulation Area) shapefiles
  3. Merge the state files into nationwide files
  4. Convert the blocks to centroids
  5. Overlay the centroids on the CD shapefile
  6. Overlay the centroids on the ZCTA shapefile
  7. Merge the results of (5) and (6) into a single data file
  8. Merge the results of (7) with the 2000 census block populations
  9. Roll up (8) by unique combinations of ZCTA and CD codes
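For readers curious what steps 4 through 7 do under the hood, here is a toy pure-Python sketch using simple polygons without holes. Real shapefiles would be handled by a GIS package; in QGIS, the Centroids and Join attributes by location tools cover these steps. All geometry and codes below are invented. One caveat: the true centroid of an irregularly shaped block can fall outside the block, which is why GIS packages also offer a "point on surface" alternative.

```python
# Toy sketch of steps 4-7. Polygons are lists of (x, y) vertices, no holes.

def centroid(poly):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)

def contains(poly, pt):
    """Ray-casting point-in-polygon test (points on an edge are ambiguous)."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

def assign(blocks, zones):
    """Map each block ID to the first zone whose polygon contains its centroid."""
    result = {}
    for block_id, poly in blocks.items():
        c = centroid(poly)
        result[block_id] = next(
            (zone_id for zone_id, zpoly in zones.items() if contains(zpoly, c)), None)
    return result

# Invented geometry: CDs split left/right at x = 10, ZCTAs split top/bottom at y = 5.
cds = {"01": [(0, 0), (10, 0), (10, 10), (0, 10)],
       "02": [(10, 0), (20, 0), (20, 10), (10, 10)]}
zctas = {"11111": [(0, 0), (20, 0), (20, 5), (0, 5)],
         "22222": [(0, 5), (20, 5), (20, 10), (0, 10)]}
blocks = {"B1": [(1, 1), (3, 1), (3, 3), (1, 3)],      # centroid (2, 2)
          "B2": [(12, 6), (14, 6), (14, 8), (12, 8)]}  # centroid (13, 7)

cd_of = assign(blocks, cds)       # step 5
zcta_of = assign(blocks, zctas)   # step 6
combined = {b: (zcta_of[b], cd_of[b]) for b in blocks}  # step 7
print(combined)  # {'B1': ('11111', '01'), 'B2': ('22222', '02')}
```

Assigning each block by a single point, rather than splitting block polygons, works because blocks are small enough that each one is treated as falling entirely within one CD and one ZCTA.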

You would also be handling large files because there were ~7 million census blocks in 2000.

Dave, thank you for the instructions. I will see if I can obtain ArcGIS, or perhaps free geospatial software if there is any.

Regarding which block shapefiles to grab: you mention the year 2000 again, but I am worried those will not align with the 106th and 107th CDs. Do you have information on this?

There are two possible free options for you to use. QGIS is a free, open-source GIS package. You can also do spatial analysis in R, if you use that for your other stats work.

With respect to the 2000 census blocks vs. the 106th/107th CDs, I need a little more clarification about what you’re trying to do! You’re looking to create population counts for combinations of ZIP codes and CDs. Do you want the Census 2000 populations for those combinations or the Census 1990 populations for the combinations?

When the Census Bureau delineated the 2000 census blocks, the blocks followed the congressional districts in existence at the time (i.e., the 106th Congress CDs). They also nest within the 107th Congress CDs (although, as far as I am aware, the 106th and 107th CDs are exactly the same). Thus, the 2000 blocks align perfectly with the 106th and 107th CDs.

We also have the 1990 census blocks (which would have been used as the building blocks for the 106th and 107th CDs). You will need to use those if you want to generate 1990 census-based population counts for the unique intersections.

Dave, thank you for the clarification.

I use Stata and SAS and only use R for limited tasks, so perhaps QGIS is my best bet.

Okay, as I understand it, the 2000 census blocks sound like what I want, then. Actually, all I really need is the CD to ZIP code matches for the 106th and 107th Congress CDs. I don’t need the population data.

I’ll try to fire up QGIS and see if I can give this a go.

Hi Richard,
I’m Dave’s colleague who was on vacation last week. I hope you’ve had some luck with your QGIS exploration. If not, I pulled together some data to simplify your work. In this zip file, there’s a fixed-width data file and a data dictionary. The data file is an extract from our internal NHGIS data for 2000 census blocks, which includes codes for the 106th-107th CD and the ZCTA where each block is located. If you sum up the block populations for each unique combination of CD and ZCTA codes, you’ll have a listing of all associations between CDs and ZCTAs, including the population of each area of intersection, which you can use as a weighting factor if you intend to allocate data from one to the other.
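A minimal Python sketch of the summing described above, plus one way to turn the intersection populations into weights for allocating ZCTA-level values to CDs. The column positions in `LAYOUT` are hypothetical placeholders; the real positions are given in the data dictionary in the zip file. Here the weight for a CD/ZCTA pair is the pair's population divided by the ZCTA's total population. Reading the file line by line keeps memory use flat even with millions of block records.

```python
import io
from collections import defaultdict

# HYPOTHETICAL column positions (start, end); take the real ones from the
# data dictionary that ships with the extract.
LAYOUT = {"block": (0, 6), "cd": (6, 10), "zcta": (10, 15), "pop": (15, 20)}

def field(line, name):
    """Slice one fixed-width field out of a record and strip padding."""
    start, end = LAYOUT[name]
    return line[start:end].strip()

def sum_by_cd_zcta(lines):
    """Sum block populations for each unique (CD, ZCTA) pair, one line at a time."""
    totals = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        totals[(field(line, "cd"), field(line, "zcta"))] += int(field(line, "pop"))
    return dict(totals)

def zcta_to_cd_weights(totals):
    """Share of each ZCTA's population that falls in each CD."""
    zcta_pop = defaultdict(int)
    for (_cd, zcta), pop in totals.items():
        zcta_pop[zcta] += pop
    # Zero-population ZCTAs get weight 0.0 here; other conventions are possible.
    return {key: (pop / zcta_pop[key[1]] if zcta_pop[key[1]] else 0.0)
            for key, pop in totals.items()}

def allocate(zcta_values, weights):
    """Apportion ZCTA-level values to CDs using the population weights."""
    cd_values = defaultdict(float)
    for (cd, zcta), w in weights.items():
        cd_values[cd] += w * zcta_values.get(zcta, 0.0)
    return dict(cd_values)

# Invented records written in the hypothetical layout above.
ROWS = [("B00001", "2701", "11111", 500), ("B00002", "2701", "22222", 300),
        ("B00003", "2702", "22222", 100), ("B00004", "2702", "22222", 0)]
SAMPLE = "".join(f"{b:<6}{cd:<4}{z:<5}{p:>5}\n" for b, cd, z, p in ROWS)

totals = sum_by_cd_zcta(io.StringIO(SAMPLE))
weights = zcta_to_cd_weights(totals)
print(totals)
print(weights)
print(allocate({"11111": 40, "22222": 80}, weights))
```

For the real extract, you would pass an open file instead of the sample, e.g. `with open("blocks_extract.dat") as f: totals = sum_by_cd_zcta(f)`, where the filename is hypothetical.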

Jonathan, this is great! I really appreciate you both going above and beyond to help me with this. It was going to be slow going getting started up with QGIS, so this is a big help.