Dear Folks
I am having some difficulty splitting my extracts containing replicate weights into more manageable chunks using read_ipums_micro_chunked. For example, this:
library(ipumsr)
# f: callback function applied to each chunk (definition not shown)
PR_RE1 <- read_ipums_micro_chunked(
  read_ipums_ddi("./CPS_1962-2018/cps_00177.xml"),
  IpumsDataFrameCallback$new(f),
  vars = REPWT1:REPWT40)
Gave me the following error:
Error: Error in read_tokens_chunked_(data, callback, chunk_size, tokenizer, col_specs, : Evaluation error: values must be length 1,
but FUN(X[[1]]) result is length 2.
Getting this to work is my top priority.
But what I would really like to do ultimately is something more like this. I am confident that this is not only inelegant but also broken in multiple ways, but I hope it conveys what I am trying to do:
peel <- function(ddi_names, vars_lst, human_names,
                 suf = seq_along(vars_lst[[1]]), path = "./") {
  if (length(ddi_names) != length(human_names)) {
    stop("Length of ddi_names and human_names must be equal")
  }
  for (i in seq_along(ddi_names)) {
    # Read the DDI once per extract
    ddi <- read_ipums_ddi(paste0(path, ddi_names[[i]], ".xml"))
    # One output file per variable subset, e.g. "./HH_RE1.rds"
    out_files <- paste0(path, human_names[[i]], suf, ".rds")
    for (j in seq_along(vars_lst[[i]])) {
      # Read only the j-th subset of variables and save it on its own
      saveRDS(
        read_ipums_micro_chunked(
          ddi,
          IpumsDataFrameCallback$new(f),
          vars = vars_lst[[i]][[j]]),
        file = out_files[[j]])
    }
  }
}
That is, for each set of replicate weight variables, I want to read, convert, and reassemble a series of small subsets of the variables, then save each subset as an RDS file with a distinct name:
peel(ddi_names = c("cps000177", "cps000180"),
     vars_lst = list(
       HH_REPS = list(paste0("REPWT", 1:40), paste0("REPWT", 41:80),
                      paste0("REPWT", 81:120), paste0("REPWT", 121:160)),
       PR_REP = list(paste0("REPWTP", 1:40), paste0("REPWTP", 41:80),
                     paste0("REPWTP", 81:120), paste0("REPWTP", 121:160))),
     human_names = c("HH_RE", "PR_RE"))
This approach is a bit of a Rube Goldberg contraption, and I bet you have some two-line way of doing this with ipumsr. Well, maybe longer for the distinct names.
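To make the chunking part concrete, here is a minimal base-R sketch of how the variable subsets and their file names might be generated rather than typed out by hand. The helper name make_chunks, the chunk size of 40, and the ".rds" naming pattern are only illustrative assumptions on my part, and it does not touch ipumsr at all:

```r
# Hypothetical helper (not part of ipumsr): split a numbered family of
# variables into consecutive chunks, named by the file each would go to.
make_chunks <- function(prefix, n_vars, chunk_size, out_stub) {
  idx <- seq_len(n_vars)
  # ceiling(idx / chunk_size) gives group labels 1, 1, ..., 2, 2, ...
  groups <- split(idx, ceiling(idx / chunk_size))
  # Turn each group of indices into variable names, e.g. "REPWT41"
  vars <- lapply(groups, function(i) paste0(prefix, i))
  # Name each chunk after its assumed output file, e.g. "HH_RE2.rds"
  names(vars) <- paste0(out_stub, seq_along(vars), ".rds")
  vars
}

# 160 household replicate weights in chunks of 40 (assumed sizes)
hh_chunks <- make_chunks("REPWT", 160, 40, "HH_RE")
names(hh_chunks)
hh_chunks[[4]][1]
```

Each element of hh_chunks would then be one set of variable names to request in a single read, with names(hh_chunks) supplying the file to save it to.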