Census tract shapefile download fails to complete

I’ve been following the basic walkthrough for downloading NHGIS data with R. I’m trying to download the 2010 census tract shapefiles, but in the final step, where the file is downloaded from the provided download link, the download stops at 80-90%. R returns this warning message:

Warning message:
In download.file(des_df$download_links$gis_data, zip_file, headers = c(Authorization = my_key)) :
  downloaded length 430096407 != reported length 503138568

and I’m unable to unzip the downloaded file, either manually or with unzip(). I’ve tried this several times over several days, and as far as I know there’s no way to split the download by state. Can someone provide some insight into what’s happening here?
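One quick way to tell whether a download like this was cut short is to try listing the archive’s contents. ZIP files store their table of contents (the central directory) at the end of the file, so a truncated archive usually can’t be listed at all. A minimal sketch in R; `zip_is_readable()` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: returns TRUE if R can read the ZIP archive's table of
# contents. A truncated download is missing the central directory stored at
# the end of the archive, so listing its contents fails.
zip_is_readable <- function(path) {
  tryCatch(
    {
      utils::unzip(path, list = TRUE)  # reads only the table of contents
      TRUE
    },
    error = function(e) FALSE,
    warning = function(w) FALSE
  )
}

# Usage with the destination file from the code in this post:
# zip_is_readable("NHGIS_2010tracts.zip")
# Compare the size on disk with the "reported length" from the warning:
# file.size("NHGIS_2010tracts.zip") == 503138568
```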

For additional context, here’s the rest of my code:

library(tidyverse)
library(sf)
library(httr)
library(jsonlite)
library(ipumsr)

# This is my key -- a new one can be obtained from the IPUMS website
my_key <- c("MYKEY")
url <- "https://api.ipums.org/extracts/?product=nhgis&version=v1"

# writing metadata for json to be extracted
mybody <- 
'
{
  "shapefiles": [
    "us_tract_2010_tl2010"
  ],
  "description": "2010 tract shapefiles",
  "breakdown_and_data_type_layout": "single_file"
}
'

mybody_json <- fromJSON(mybody, simplifyVector = FALSE)
result <- POST(url, add_headers(Authorization = my_key), body = mybody_json, encode = "json", verbose())
res_df <- content(result, "parsed", simplifyDataFrame = TRUE)
my_number <- res_df$number

data_extract_status_res <- GET(paste0("https://api.ipums.org/extracts/", my_number, "?product=nhgis&version=v1"), add_headers(Authorization = my_key))
des_df <- content(data_extract_status_res, "parsed", simplifyDataFrame = TRUE)
des_df$download_links

# Download the shapefile extract (GIS data, not table data)
# Destination file
zip_file <- "NHGIS_2010tracts.zip"
# Download extract to destination file
download.file(des_df$download_links$gis_data, zip_file, headers=c(Authorization=my_key))
# List extract files in ZIP archive
unzip(zip_file, list=TRUE)
# Read the 2010 tract shapefile into an sf object (this extract contains no CSV table data)
tracts2010 <- read_ipums_sf(zip_file)

Hi @Clyde_Schwab and welcome to the forum! I attempted to duplicate the issue but was unable to - using your code I was able to successfully download and unzip the 2010 tract files.

When you say you tried multiple times, do you mean the entire workflow or just the download step? Your extract may have been corrupted somehow, in which case generating a new extract would resolve the issue. We did have an extract generation problem yesterday morning, and although it seems unlikely to be the cause here, it’s worth a try if you haven’t already.

If that doesn’t resolve the issue, we’ll keep brainstorming.

I just had another thought that seems more likely: is it timing out after about a minute? The download.file() function uses a default timeout of 60 seconds, which is controlled by R’s global timeout option. I’m on a fast connection, so this download takes only a few seconds for me, which may be why I didn’t see the same behavior you did.

I am not a native R programmer, but I believe you can try something like this:

options(timeout=180)

to raise it to 3 minutes before calling download.file().
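Because options() returns the previous settings, the timeout can also be raised just for the download and restored afterwards, so the rest of the session keeps the default. A minimal sketch; `with_timeout_option()` is a hypothetical helper, and the 600 seconds in the usage example is an arbitrary generous value, not an IPUMS recommendation:

```r
# Hypothetical helper (not from any package): evaluate `expr` with the
# global timeout option temporarily set to `seconds`, then restore it.
with_timeout_option <- function(seconds, expr) {
  old <- options(timeout = seconds)  # options() returns the previous value
  on.exit(options(old), add = TRUE)  # restore even if expr fails
  expr                               # lazy evaluation: runs after the option is set
}

# Usage with the objects from the question's code:
# with_timeout_option(600, download.file(
#   des_df$download_links$gis_data, zip_file,
#   headers = c(Authorization = my_key),
#   mode = "wb"  # binary mode; matters on Windows
# ))
```

Relying on lazy evaluation keeps the call site readable: the download expression is not run until after the option has been changed.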

Setting the timeout option totally solved this, thank you! I had been rerunning the entire workflow, and I believe R’s default timeout was causing the issue.
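The numbers in the original warning are consistent with this explanation. Assuming the transfer ran at a roughly steady rate and was cut off exactly at the 60-second default (an assumption, since the elapsed time wasn’t measured), about 85% of the file arrived and the full file would have needed roughly 70 seconds, just over the default:

```r
received_bytes <- 430096407  # "downloaded length" from the warning
reported_bytes <- 503138568  # "reported length" from the warning
cutoff_seconds <- 60         # assumed: the default getOption("timeout")

fraction_received <- received_bytes / reported_bytes  # share of the file that arrived
rate <- received_bytes / cutoff_seconds               # about 7.2 million bytes per second
seconds_needed <- reported_bytes / rate               # time for the whole file at that rate

round(c(fraction_received, seconds_needed), 2)        # about 0.85 and 70.19
```

Under that assumption, a 180-second timeout leaves comfortable headroom, which fits the fix working.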
