How do I read in select columns from IPUMS data with ipumsr?

I’m trying to read in only select columns from an IPUMS USA dataset with the ipumsr R package. The following code is not working: it reads in all the columns in the dataset. Any ideas?

test <- read_ipums_micro(ipumsi_ddi_file, vars = c(YEAR, MET2013, PERWT), verbose = FALSE, n_max = 10)

The code you have should work. I would just make sure that “ipumsi_ddi_file” points to the DDI file of your IPUMS USA extract. If the issue persists, you could use the following to create a subset of the full data set with your columns of interest:

data <- subset(test, select = c(YEAR, MET2013, PERWT))

Can you check to see if the following works for reading in a subset of the example extract included in the ipumsr package?

library(ipumsr)
read_ipums_micro(
  ipums_example("cps_00006.xml"),
  vars = c(YEAR, SERIAL),
  n_max = 10
)

If not, you should try reinstalling ipumsr from CRAN using: install.packages("ipumsr")

Hi Michelle,

Your code works fine for me, so the issue seems to be the particular DDI file that I am using (i.e. test below has 55 columns while test2 has 2 columns). I am trying to limit the number of columns for memory considerations on the initial load; otherwise a select or subset statement like you suggested would work fine.

The other thing that would be helpful (given memory limitations) is if I could read in only a specific selection of rows at a time (such as one metropolitan area or state), but I’m not sure how to do that within ipumsr functions.

test <- read_ipums_micro(
  ipumsi_ddi_file,
  vars = c(YEAR, SERIAL),
  n_max = 10
)

test2 <- read_ipums_micro(
  ipums_example("cps_00006.xml"),
  vars = c(YEAR, SERIAL),
  n_max = 10
)

Hi there,

This appears to be a bug in ipumsr: currently you cannot select columns when reading from a CSV file. As an immediate fix, you could change your extract to be a fixed-width (.dat) file instead of a CSV.

Otherwise, I hope to fix this in ipumsr soon. You can track progress here: https://github.com/mnpopcenter/ipumsr…

As for subsetting large extracts by row, you can use the Select Cases feature in our extract engine. I would like to add better support into ipumsr, but that is probably further off in the future.

https://cps.ipums.org/cps-action/faq#…

Thanks for reporting!

Greg

Hi Greg,

Thanks for making a ticket and for the suggestion on using a .dat file. To clarify why I was interested in reading in a selection of rows with ipumsr instead of requesting a smaller data extract: I am analyzing ~40 metropolitan statistical areas (MSAs), so it would be a bit of a pain to request them in separate data extracts.

However, if I had the ability to download the entire ~40 MSA dataset but only read in one MSA at a time into memory with ipumsr, then I could easily iterate through this dataset and not run up against memory limitations on a laptop. For the time being I think I will just use a smaller data extract for testing and then run the entire ~40 MSA dataset on a desktop with more memory when I have my code in a good state.

Thanks for your help, Greg and Michelle.

Best,

Jesse

Returning to this old question to mention some brand new functionality in ipumsr.

The newest version of ipumsr (released over the weekend to CRAN) has a function, read_ipums_micro_chunked(), which lets you work with the data in chunks without having to store the whole thing in memory. This could allow for filtering out the unwanted MSAs as you describe. For more details, see the “big data” vignette.
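
As a rough illustration of the chunked approach, here is a minimal sketch that reads a large extract chunk by chunk and keeps only the rows for one MSA. It assumes a fixed-width (.dat) extract containing YEAR, MET2013 and PERWT, as in the original question. The helper names keep_msa() and read_one_msa() are made up for this example and are not part of ipumsr, and the chunk size is arbitrary.

```r
# Keep only the rows of one chunk that belong to a single metro area.
# `x` is the chunk (a data frame); `pos` is the chunk's starting row,
# which the callback interface passes in but this filter does not use.
# which() drops rows where MET2013 is NA instead of returning NA rows.
keep_msa <- function(x, pos, msa_code) {
  x[which(x$MET2013 == msa_code), , drop = FALSE]
}

# Hypothetical helper (not an ipumsr function): read one MSA from a
# large extract without holding the full file in memory.
read_one_msa <- function(ddi_file, msa_code, chunk_size = 10000) {
  # IpumsDataFrameCallback row-binds whatever each chunk returns.
  cb <- ipumsr::IpumsDataFrameCallback$new(
    function(x, pos) keep_msa(x, pos, msa_code)
  )
  ipumsr::read_ipums_micro_chunked(
    ddi_file,
    callback = cb,
    chunk_size = chunk_size,
    vars = c(YEAR, MET2013, PERWT),
    verbose = FALSE
  )
}
```

With something like this, a call such as read_one_msa(ipumsi_ddi_file, 33460) should return just the rows for that MET2013 code (33460 is used here for Minneapolis-St. Paul-Bloomington; check the codes in your own extract), and you could loop over your ~40 codes one at a time. Each call re-reads the whole file, so it trades speed for memory.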