Ipums2db: a program for using IPUMS data in relational databases

Hi all. (I believe I have permission to post this here, but please let me know if I’m mistaken.)

I recently built a program, ipums2db, which lets you analyze your IPUMS extracts in relational database systems. You can install the command-line tool with Homebrew, manually download the binary executable for your OS and architecture, or build and install the binary yourself with the Go compiler. It supports several database systems and a number of optional flags.

The program has two main use cases:

1. Generate a database schema from a DDI XML file, including reference tables for discrete variables (like “SEX” or “EMPSTAT”). If you have a CSV data file, you can load the schema/DDL file first, then load the data with your database-specific “COPY FROM <csv_file_path> …” command.

2. If you have a fixed-width data file instead, you can generate the schema and also convert the fixed-width file to database insert blocks. The conversion step is fairly optimized, so it shouldn’t take too long; running the insertions against the database will of course take far longer.
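For anyone curious what the fixed-width conversion in the second use case looks like in principle, here is a minimal Python sketch. It is not ipums2db's actual code (the tool is written in Go). The column layout, the table name `ipums_extract`, and the sample records are all made up for illustration; a real layout would come from the DDI XML file. It also assumes every column is an integer and uses a Postgres-style multi-row INSERT, which other database systems may not accept in this form.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    start: int  # 1-based start position, inclusive
    width: int


# Hypothetical layout; a real one would be read from the DDI XML file.
COLUMNS = [
    Column("YEAR", 1, 4),
    Column("SEX", 5, 1),
    Column("AGE", 6, 3),
]


def parse_line(line: str, columns: list[Column]) -> list[int]:
    """Slice one fixed-width record into integer values."""
    return [int(line[c.start - 1 : c.start - 1 + c.width]) for c in columns]


def insert_block(table: str, columns: list[Column], lines: list[str]) -> str:
    """Build one multi-row INSERT statement from fixed-width records."""
    names = ", ".join(c.name.lower() for c in columns)
    rows = []
    for line in lines:
        values = ", ".join(str(v) for v in parse_line(line, columns))
        rows.append(f"  ({values})")
    return f"INSERT INTO {table} ({names}) VALUES\n" + ",\n".join(rows) + ";"


# Two made-up records: YEAR (4 chars), SEX (1 char), AGE (3 chars).
sample = ["20201034", "20202007"]
sql = insert_block("ipums_extract", COLUMNS, sample)
print(sql)
```

Grouping many rows into one statement is what makes this an "insert block" rather than one INSERT per record, which usually cuts down on round trips when the file is loaded.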

Here are a few of the optional flags:

- Specify the database system syntax; options are Postgres, MySQL, MSSQL, and Oracle, with Postgres as the default

- Generate index creation statements for certain columns

- If converting a fixed-width file, you can specify a directory output format, so that there is one schema file and multiple insertion files. This can help break up a large dump into smaller steps.
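To give a rough idea of the directory output format in the last bullet, here is a small Python sketch that writes one schema file plus several numbered insertion files. It only illustrates the idea and is not how ipums2db does it; the file names (`schema.sql`, `inserts_0001.sql`), the `per_file` parameter, and the sample SQL are all assumptions.

```python
import tempfile
from itertools import islice
from pathlib import Path


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def write_dump_dir(out_dir, schema_sql, statements, per_file):
    """Write one schema file and several numbered insert files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    schema_path = out_dir / "schema.sql"  # hypothetical file name
    schema_path.write_text(schema_sql)
    paths = [schema_path]
    for i, chunk in enumerate(batched(statements, per_file), start=1):
        path = out_dir / f"inserts_{i:04d}.sql"  # hypothetical naming scheme
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths


# Five made-up insert statements, split two per file.
statements = [f"INSERT INTO ipums_extract (age) VALUES ({n});" for n in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    written = write_dump_dir(tmp, "CREATE TABLE ipums_extract (age INT);", statements, 2)
    names = [p.name for p in written]
    last_file = written[-1].read_text()
print(names)
```

With the files split like this, a failed load can be resumed from the file that failed instead of rerunning the whole dump.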

I’d be happy to provide any more information on request, or listen to any feedback. Thanks!
