Can I read the .dat.gz files into R without extracting?


I have been using the IPUMS full count data (1900 to 1940) which I downloaded in .dat.gz format, and I load it into RStudio using read_ipums_micro_chunked(), and I get the sub-groups that I need using a filter function. I have not extracted the data on my PC. Do I need to extract the data before I use it in this way?


You do not need to extract the data onto your PC before reading it into R with the read_ipums_micro_chunked() function. This function lets you work with a large dataset without using too much RAM at one time by processing it in chunks (see the read_ipums_micro_chunked() vignette and this blog post on reading IPUMS data in chunks for more information). It sounds like you are going a step further and creating subsets of your data using the filter function; as long as you assign these subsets to objects, R keeps each one as its own data frame in memory.

For example:

Original dataframe:

library(ipumsr)
ddi <- read_ipums_ddi("usa_00001.xml")
data <- read_ipums_micro(ddi)

Subsetted dataframe of Minnesota cases only:

data_subset_mn <- dplyr::filter(data, STATEFIP == 27)

In case I misunderstood your question, I will add that .dat.gz files do not need to be uncompressed before they can be read into R using the read_ipums_micro_chunked() function.
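
For the chunked workflow you describe, a minimal sketch might look like the following. It assumes the ipumsr and dplyr packages are installed, that your extract's DDI file is named usa_00001.xml (a placeholder, as in the example above), and that the matching .dat.gz file sits in the same folder under the name recorded in the DDI. The helper name keep_mn and the chunk size are illustrative choices, not requirements.

```r
library(ipumsr)

# Callback applied to each chunk as it is read:
# keep only Minnesota records (STATEFIP == 27).
# `pos` is the starting row of the chunk and is unused here.
keep_mn <- function(x, pos) {
  dplyr::filter(x, STATEFIP == 27)
}

# Placeholder file name; substitute the DDI for your own extract.
ddi <- read_ipums_ddi("usa_00001.xml")

# The compressed .dat.gz file referenced by the DDI is read directly,
# one chunk at a time; the filtered chunks are row-bound into one data frame.
data_subset_mn <- read_ipums_micro_chunked(
  ddi,
  callback = IpumsDataFrameCallback$new(keep_mn),
  chunk_size = 100000
)
```

Because each chunk is filtered before the next one is read, only the Minnesota rows accumulate in memory. A larger chunk_size generally reads faster but uses more RAM per chunk.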