Estimating Storage Requirements for Complete IPUMS Database

I am embarking on a project to build a system designed to handle the entirety of the IPUMS databases. This includes IPUMS USA (ACS and decennial Census data), IPUMS CPS, IPUMS International, and any other datasets that are part of the IPUMS collection.

My goal is to download all available years of data for each IPUMS dataset, including all available variables. I understand that this will result in a massive amount of data, and I am trying to estimate the storage requirements to accommodate this.

Could you please provide an estimate of the total file size for all years of uncompressed IPUMS data, with all variables included, across all the different IPUMS datasets? Also, do you have any recommendations for how much extra storage to set aside for data processing and analysis?

I appreciate your assistance and look forward to your guidance.

Thanks for your interest in IPUMS data. I don’t have a definitive answer to your question; because of the sheer volume of data we hold, we store it natively in compressed formats. Without knowing more about how you intend to use the data, I can offer two general thoughts. First, given the pace at which IPUMS releases new data and incremental updates, the database you describe would quickly fall out of sync with the actual IPUMS databases. Beyond missing new samples, your copy would become incompatible with the current versions, because variable codes are updated to accommodate new samples; it would not simply have fewer variables or observations. Second, I encourage you to read the terms of use for IPUMS data. Please pay particular attention to the redistribution clauses and to the restrictions that apply to a subset of data collections, particularly those around non-commercial use.