Extracting Multiple Years of Aggregate Data

I’ve written a Python script for extracting NHGIS aggregate data through the API and creating variables with consistent names across years. I wrote it because there doesn’t seem to be a convenient process for extracting multiple years of ACS tract-level data, and because, for aggregate data files, NHGIS variable names aren’t consistent over time.
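
To illustrate the naming problem, here is a minimal sketch of the renaming step only. It is not the script itself and it does not call the NHGIS API. The dataset labels, the column codes (`AAAAE001`, `BBBBE001`), and the helper names `harmonize` and `stack` are made-up placeholders. The sketch assumes each extract has already been read into a list of row dictionaries keyed by a `GISJOIN` identifier column.

```python
# Hypothetical crosswalk: dataset label -> {dataset-specific column code: stable name}.
# The codes are placeholders for illustration, not real NHGIS variable codes.
CROSSWALK = {
    "2015_2019_ACS5a": {"AAAAE001": "total_pop"},
    "2016_2020_ACS5a": {"BBBBE001": "total_pop"},
}


def harmonize(rows, dataset):
    """Rename dataset-specific columns to stable names and tag rows with the dataset."""
    mapping = CROSSWALK[dataset]
    out = []
    for row in rows:
        # Columns without a crosswalk entry (e.g. GISJOIN) keep their original name.
        renamed = {mapping.get(col, col): val for col, val in row.items()}
        renamed["dataset"] = dataset
        out.append(renamed)
    return out


def stack(extracts):
    """Combine several extracts into one long list of rows with consistent names."""
    combined = []
    for dataset, rows in extracts.items():
        combined.extend(harmonize(rows, dataset))
    return combined


# Toy input standing in for two downloaded extracts of the same tract.
extracts = {
    "2015_2019_ACS5a": [{"GISJOIN": "G0100010020100", "AAAAE001": 1900}],
    "2016_2020_ACS5a": [{"GISJOIN": "G0100010020100", "BBBBE001": 1950}],
}
combined = stack(extracts)
```

After stacking, every row carries the same `total_pop` column plus a `dataset` tag, so the years can be compared or reshaped without tracking each year's codes by hand.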

Feedback welcome! I’d be especially interested to learn if there’s already some simple way to do this that I’ve missed.

The script can be found at: GitHub - ssusin270/Extract-NHGIS: Python script to extract variables from NHGIS iPUMS aggregate datasets

Thanks for sharing this, Scott. I haven’t tried it out, but I expect it could be very handy.

It’s generally true that there isn’t another simple way of doing this right now. The one exception is that NHGIS includes a range of ACS tract-level data in its time series tables, which are designed to provide multiple years of data with consistent naming, including both 5-year ACS data and census data. But lots of ACS tables aren’t covered in our time series tables, and the time series tables don’t directly reuse ACS table codes, so getting a particular ACS table for multiple years does require some special handling.

The other thing I’d note is that, as part of a current NHGIS grant, we plan to add an option for users to get variable names that are consistent with ACS table names, or at least more consistent across ACS datasets. We haven’t worked out the details yet, but we hope to implement this sometime in the next year or two.