Error in R reading data: "Line is too short for rectype."

Hi there,

I’m getting an error after running the following code:

library(ipumsr)
ddi <- read_ipums_ddi("ipumsi_00006.xml")
data <- read_ipums_micro(ddi)

"Error: Line is too short for rectype."

Interestingly, I do not encounter this error when executing the same code on a different computer. However, I need this code to work on a computer with more RAM, which has proven challenging so far. I suspect that it might be related to package conflicts, but I haven’t been able to identify a fix yet. Also, this error usually appears when the data to be loaded is very large.

Any assistance or insights into resolving this issue would be greatly appreciated!

Hi Rodrigo, thanks for posting your issue! With the help of our IT folks here, I created an extract matching the specifications of your IPUMSI extract #6, but I wasn’t able to replicate your error with the latest version of ipumsr (0.6.0).

One possible reason for the error you’re seeing is that the data file didn’t finish downloading and you have an incomplete data file on the computer where you’re getting the error. This seems plausible given the size of your extract – it looks like the extracted data contain 453 million records! You could test this theory by redownloading the extract data file, or copying the version of the data file from the computer where you are able to read the data to the computer where it’s not working.
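As a rough way to check for an incomplete download, you could compare the size and MD5 checksum of the data file on both computers using base R. This is only a sketch: the helper name is made up for illustration, and the data file name below is an assumption based on your DDI file name, so substitute the actual path to your data file.

```r
# Hypothetical helper: report the size and MD5 checksum of a file so the
# values can be compared across two computers.
file_fingerprint <- function(path) {
  stopifnot(file.exists(path))
  data.frame(
    file = basename(path),
    size_bytes = file.size(path),
    md5 = unname(tools::md5sum(path)),
    stringsAsFactors = FALSE
  )
}

# Assumed file name; replace with the path to your own data file
data_path <- "ipumsi_00006.dat.gz"
if (file.exists(data_path)) {
  print(file_fingerprint(data_path))
}
```

If the size or checksum differs between the two machines, the files are not identical, and redownloading the data file is the likely fix.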

If that approach doesn’t solve the problem, the error could result from something specific to your R installation. In that case, could you share the output of devtools::session_info() so that we can try to replicate your setup to trace the source of the issue?

Hi @Derek_Burk, I have the same issue. I have tried loading the data on different computers and reducing my sample size to one year only, but the error persists. Any thoughts on what might be happening?

My session info on computer 1:

─ Session info ─
 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 22631)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.utf8
 ctype    English_United States.utf8
 tz       America/Los_Angeles
 date     2024-04-17
 rstudio  2022.07.2+576 Spotted Wakerobin (desktop)
 pandoc   NA

─ Packages ─
 package     * version date (UTC) lib source
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.2.1)
 cachem        1.0.6   2021-08-19 [1] CRAN (R 4.2.1)
 callr         3.7.3   2022-11-02 [1] CRAN (R 4.2.2)
 cli           3.5.0   2022-12-20 [1] CRAN (R 4.2.2)
 crayon        1.5.1   2022-03-26 [1] CRAN (R 4.2.1)
 DBI           1.1.3   2022-06-18 [1] CRAN (R 4.2.1)
 devtools      2.4.5   2022-10-11 [1] CRAN (R 4.2.2)
 digest        0.6.29  2021-12-01 [1] CRAN (R 4.2.1)
 dplyr         1.0.10  2022-09-01 [1] CRAN (R 4.2.2)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.1)
 fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.1)
 fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.1)
 forcats       0.5.1   2021-01-27 [1] CRAN (R 4.2.1)
 fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.1)
 generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.1)
 glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.1)
 haven         2.5.0   2022-04-15 [1] CRAN (R 4.2.1)
 hipread       0.2.4   2023-11-30 [1] CRAN (R 4.2.3)
 hms           1.1.2   2022-08-19 [1] CRAN (R 4.2.2)
 htmltools     0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
 htmlwidgets   1.5.4   2021-09-08 [1] CRAN (R 4.2.1)
 httpuv        1.6.7   2022-12-14 [1] CRAN (R 4.2.2)
 ipumsr      * 0.7.2   2024-03-12 [1] CRAN (R 4.2.3)
 janitor       2.1.0   2021-01-05 [1] CRAN (R 4.2.1)
 later         1.3.0   2021-08-18 [1] CRAN (R 4.2.2)
 lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.2)
 lubridate     1.8.0   2021-10-07 [1] CRAN (R 4.2.1)
 magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.1)
 memoise       2.0.1   2021-11-26 [1] CRAN (R 4.2.1)
 mime          0.12    2021-09-28 [1] CRAN (R 4.2.0)
 miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.2.2)
 pillar        1.8.1   2022-08-19 [1] CRAN (R 4.2.2)
 pkgbuild      1.4.0   2022-11-27 [1] CRAN (R 4.2.2)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.1)
 pkgload       1.3.2   2022-11-16 [1] CRAN (R 4.2.2)
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.2.1)
 processx      3.7.0   2022-07-07 [1] CRAN (R 4.2.1)
 profvis       0.3.7   2020-11-02 [1] CRAN (R 4.2.2)
 promises      1.2.0.1 2021-02-11 [1] CRAN (R 4.2.2)
 ps            1.7.1   2022-06-18 [1] CRAN (R 4.2.1)
 purrr         0.3.4   2020-04-17 [1] CRAN (R 4.2.1)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.1)
 Rcpp          1.0.9   2022-07-08 [1] CRAN (R 4.2.1)
 readr         2.1.2   2022-01-30 [1] CRAN (R 4.2.1)
 remotes       2.4.2   2021-11-30 [1] CRAN (R 4.2.1)
 rlang         1.0.6   2022-09-24 [1] CRAN (R 4.2.2)
 rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.2.1)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.2)
 shiny         1.7.4   2022-12-15 [1] CRAN (R 4.2.2)
 snakecase     0.11.0  2019-05-25 [1] CRAN (R 4.2.1)
 stringi       1.7.8   2022-07-11 [1] CRAN (R 4.2.1)
 stringr       1.5.0   2022-12-02 [1] CRAN (R 4.2.2)
 tibble        3.1.8   2022-07-22 [1] CRAN (R 4.2.1)
 tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.2)
 tzdb          0.3.0   2022-03-28 [1] CRAN (R 4.2.1)
 urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.2.2)
 usethis       2.1.6   2022-05-25 [1] CRAN (R 4.2.2)
 utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.1)
 vctrs         0.5.1   2022-11-16 [1] CRAN (R 4.2.2)
 xml2          1.3.3   2021-11-30 [1] CRAN (R 4.2.1)
 xtable        1.8-4   2019-04-21 [1] CRAN (R 4.2.2)
 zeallot       0.1.0   2018-01-28 [1] CRAN (R 4.2.3)

 [1] C:/Users/<me>/AppData/Local/R/win-library/4.2
 [2] C:/Program Files/R/R-4.2.1/library

and computer 2:

- Session info --------------------------------------------------------------------------------------------------------------
 setting  value
 version  R version 4.2.2 (2022-10-31 ucrt)
 os       Windows Server x64 (build 17763)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.1252
 ctype    English_United States.1252
 tz       America/Los_Angeles
 date     2024-04-17
 rstudio  2022.07.2+576 Spotted Wakerobin (desktop)
 pandoc   NA

- Packages ------------------------------------------------------------------------------------------------------------------
 package     * version date (UTC) lib source
 cachem        1.0.8   2023-05-01 [1] CRAN (R 4.2.3)
 cli           3.6.2   2023-12-11 [1] CRAN (R 4.2.3)
 devtools      2.4.5   2022-10-11 [1] CRAN (R 4.2.3)
 digest        0.6.35  2024-03-11 [1] CRAN (R 4.2.3)
 dplyr         1.1.4   2023-11-17 [1] CRAN (R 4.2.3)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.3)
 fansi         1.0.6   2023-12-08 [1] CRAN (R 4.2.3)
 fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.2.3)
 forcats       1.0.0   2023-01-29 [1] CRAN (R 4.2.3)
 fs            1.6.3   2023-07-20 [1] CRAN (R 4.2.3)
 generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.3)
 glue          1.7.0   2024-01-09 [1] CRAN (R 4.2.3)
 haven         2.5.4   2023-11-30 [1] CRAN (R 4.2.3)
 hipread       0.2.4   2023-11-30 [1] CRAN (R 4.2.3)
 hms           1.1.3   2023-03-21 [1] CRAN (R 4.2.3)
 htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.2.3)
 htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.2.3)
 httpuv        1.6.15  2024-03-26 [1] CRAN (R 4.2.3)
 ipumsr      * 0.7.2   2024-03-12 [1] CRAN (R 4.2.3)
 later         1.3.2   2023-12-06 [1] CRAN (R 4.2.3)
 lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.2.3)
 magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.3)
 memoise       2.0.1   2021-11-26 [1] CRAN (R 4.2.3)
 mime          0.12    2021-09-28 [1] CRAN (R 4.2.0)
 miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.2.3)
 pillar        1.9.0   2023-03-22 [1] CRAN (R 4.2.3)
 pkgbuild      1.4.4   2024-03-17 [1] CRAN (R 4.2.3)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.3)
 pkgload       1.3.4   2024-01-16 [1] CRAN (R 4.2.3)
 profvis       0.3.8   2023-05-02 [1] CRAN (R 4.2.3)
 promises      1.3.0   2024-04-05 [1] CRAN (R 4.2.3)
 purrr         1.0.2   2023-08-10 [1] CRAN (R 4.2.3)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.3)
 Rcpp          1.0.12  2024-01-09 [1] CRAN (R 4.2.3)
 readr         2.1.5   2024-01-10 [1] CRAN (R 4.2.3)
 remotes       2.5.0   2024-03-17 [1] CRAN (R 4.2.3)
 rlang         1.1.3   2024-01-10 [1] CRAN (R 4.2.3)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.3)
 shiny         1.8.1.1 2024-04-02 [1] CRAN (R 4.2.3)
 stringi       1.8.3   2023-12-11 [1] CRAN (R 4.2.3)
 stringr       1.5.1   2023-11-14 [1] CRAN (R 4.2.3)
 tibble        3.2.1   2023-03-20 [1] CRAN (R 4.2.3)
 tidyselect    1.2.1   2024-03-11 [1] CRAN (R 4.2.3)
 tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.2.3)
 urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.2.3)
 usethis       2.2.3   2024-02-19 [1] CRAN (R 4.2.3)
 utf8          1.2.4   2023-10-22 [1] CRAN (R 4.2.3)
 vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.2.3)
 xml2          1.3.6   2023-12-04 [1] CRAN (R 4.2.3)
 xtable        1.8-4   2019-04-21 [1] CRAN (R 4.2.3)
 zeallot       0.1.0   2018-01-28 [1] CRAN (R 4.2.3)

 [1] C:/Users/<me>/AppData/Local/R/win-library/4.2
 [2] C:/Program Files/R/R-4.2.2/library

I tested both the largest and smallest samples on each computer. Thanks!

Hi Sarah, thanks for posting your issue! Without more details, I don’t have a good idea of what’s going on, but I’d be happy to help investigate.

Is this an IPUMS USA extract that’s having this issue, or an extract from a different IPUMS project?

Hi Derek, it’s from IPUMS CPS. Thanks!

Okay got it, thanks! The reason I ask is that I wanted to suggest using the IPUMS API to try to debug your issue, and luckily, IPUMS CPS is one of the projects that has API support! If you don’t mind doing a little setup, using the API will be helpful in double-checking that all your extract files are being downloaded fully and correctly, and in allowing me to reproduce your extract.

To use the API, you first need to grab and store your API key by following the instructions on the ipumsr website. Make sure you’ve called library(ipumsr) before calling set_ipums_api_key().

If that goes smoothly, you’ll next want to get the extract number of one of the extracts that is failing to load. Then you can try running this code – filling in your extract number in place of <num> – to resubmit the extract, wait for it to process, then download and load the data into R:

library(ipumsr)

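my_data <- get_extract_info("cps:<num>") |>  # look up the existing extract definition
  submit_extract() |>                         # resubmit it as a new extract
  wait_for_extract() |>                       # wait until processing finishes
  download_extract() |>                       # download the extract files
  read_ipums_micro()                          # load the data into R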

If that doesn’t work, please post the error you get back here. You can also use the API to share your extract definition with me so that I can try to reproduce your issue. To create a JSON file that you can upload to this thread on the forum, run (changing the file path as you see fit):

library(ipumsr)

get_extract_info("cps:<num>") |>
  save_extract_as_json(file = "my_extract.json")

If you upload the resulting JSON file here, I can use the API to submit a matching extract and see if I run into the same issues.

One caveat is that if your extract uses the “Adjust monetary values” feature available for CPS, the API won’t be able to reproduce that extract exactly.

Let me know if you run into any issues trying to follow these steps – I might have made a mistake or not explained something clearly.

Alternatively, if you’d rather not use the API, you could just share your extract number(s) with me and our IT staff can pull up the extract definitions for me.

Thanks so much for the quick reply. The first code chunk, using the API, worked great! I don’t think I need anything else on my end, but if you’re curious about the issue I’m happy to share my extract info.

Great! No need to share more info unless you run into a similar issue again. Hopefully it was just something transitory, like the data file not downloading completely.

Hello Sarah,

If I remember correctly, the “Line is too short for rectype.” error was solved on my end by using another computer with more RAM, so I think it was a matter of computing power.
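
For anyone who suspects memory limits, one rough check is to read only the first rows of the extract using the n_max argument of read_ipums_micro(). This is a sketch, assuming the DDI file name from my original post; the row count is arbitrary.

```r
library(ipumsr)

ddi <- read_ipums_ddi("ipumsi_00006.xml")

# Read only the first 1,000 records (an arbitrary number) instead of the full file
data_small <- read_ipums_micro(ddi, n_max = 1000)
```

If the small read succeeds but the full read fails, limited memory or an incomplete data file seems a more likely culprit than the code itself.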