Can I read the .dat.gz files into R without extracting?

Rami_Zalfou · November 18, 2021, 4:28pm

Hi,

I have been using the IPUMS full count data (1900 to 1940), which I downloaded in .dat.gz format. I load it into RStudio using read_ipums_micro_chunked() and pull out the sub-groups I need with a filter function. I have not extracted the data on my PC. Do I need to extract the data before using it this way?

Thanks,
Rami

Grace_Cooper · November 24, 2021, 5:23pm

You do not need to extract the data onto your PC before reading it into R with the read_ipums_micro_chunked() function. This function lets you work with a large dataset without using too much RAM at one time by processing it in chunks (see the read_ipums_micro_chunked() vignette and this blog post on reading IPUMS data in chunks for more information). It sounds like you are going a step further and creating subsets of your data using the filter function; as long as you assign these subsets to objects, R keeps each one as its own dataframe in memory.

For example:

Original dataframe:

library(ipumsr)
library(dplyr)

ddi <- read_ipums_ddi("usa_00001.xml")
data <- read_ipums_micro(ddi)

Subsetted dataframe of Minnesota cases only:

data_subset_mn <- filter(data, STATEFIP == 27)

In case I misunderstood your question, I will add that .dat.gz files do not need to be uncompressed before they can be read into R using the read_ipums_micro_chunked() function.
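
For the chunked workflow specifically, here is a minimal sketch of filtering each chunk as it is read, so that only the matching rows are kept in memory. It assumes the DDI file is named usa_00001.xml (the name used in this thread) and that the compressed .dat.gz file sits in the same folder; the function name keep_minnesota and the chunk size are illustrative choices, not anything prescribed by ipumsr.

```r
library(ipumsr)
library(dplyr)

# Callback applied to every chunk: keep only Minnesota records (STATEFIP 27).
# `pos` is the starting row of the chunk; it is unused here.
keep_minnesota <- function(x, pos) {
  filter(x, STATEFIP == 27)
}

# "usa_00001.xml" is the DDI file name used earlier in this thread; the
# matching .dat.gz is assumed to sit next to it, still compressed.
ddi_path <- "usa_00001.xml"

if (file.exists(ddi_path)) {
  ddi <- read_ipums_ddi(ddi_path)

  # IpumsDataFrameCallback row-binds the filtered chunks into one dataframe.
  data_subset_mn <- read_ipums_micro_chunked(
    ddi,
    callback = IpumsDataFrameCallback$new(keep_minnesota),
    chunk_size = 100000  # rows per chunk; an arbitrary illustrative value
  )
}
```

The file.exists() check is only there so the snippet runs harmlessly on a machine without the extract; with your own files you can drop it and adjust the chunk size to what your RAM allows.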
