Census tract shapefile download fails to complete

Clyde_Schwab · February 16, 2022, 8:37pm

I’ve been following the basic walkthrough for using R to download NHGIS data. I’m trying to download the 2010 census tract shapefiles, but in the final step, when I actually download the extract from the provided download link, the download stops at 80-90%. R returns this warning message:

Warning message:
In download.file(des_df$download_links$gis_data, zip_file, headers = c(Authorization = my_key)) :
  downloaded length 430096407 != reported length 503138568

and I’m unable to unzip the downloaded file, either manually or with unzip(). I’ve tried this multiple times over multiple days, and as far as I know there’s no way to split the download by state. Can someone provide some insight into what’s happening here?
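The warning reports 430,096,407 of 503,138,568 bytes received, roughly 85% of the archive, which matches the 80-90% progress and explains why unzipping fails: the end-of-central-directory record a ZIP reader needs is stored at the very end of the file, so a truncated download is missing it. A minimal sketch for checking an archive before using it, assuming only base R; `zip_is_readable()` is a hypothetical helper name, not part of ipumsr:

```r
# Hypothetical helper: returns TRUE only if `path` exists and R can read the
# ZIP archive's file listing. A download cut short loses the end-of-central-
# directory record at the end of the file, so listing its contents fails.
zip_is_readable <- function(path) {
  if (!file.exists(path)) {
    return(FALSE)
  }
  tryCatch(
    {
      utils::unzip(path, list = TRUE)  # errors if the archive cannot be opened
      TRUE
    },
    error = function(e) FALSE,
    warning = function(w) FALSE
  )
}
```

Calling `zip_is_readable(zip_file)` right after the download makes a partial file obvious before any read step runs.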

For additional context, here’s the rest of my code:

library(tidyverse)
library(sf)
library(httr)
library(jsonlite)
library(ipumsr)

# This is my key -- a new one can be obtained from the IPUMS website
my_key <- c("MYKEY")
url <- "https://api.ipums.org/extracts/?product=nhgis&version=v1"

# writing metadata for json to be extracted
mybody <- 
'
{
  "shapefiles": [
    "us_tract_2010_tl2010"
  ],
  "description": "2010 tract shapefiles",
  "breakdown_and_data_type_layout": "single_file"
}
'

mybody_json <- fromJSON(mybody, simplifyVector = FALSE)
result <- POST(url, add_headers(Authorization = my_key), body = mybody_json, encode = "json", verbose())
res_df <- content(result, "parsed", simplifyDataFrame = TRUE)
my_number <- res_df$number

data_extract_status_res <- GET(paste0("https://api.ipums.org/extracts/", my_number, "?product=nhgis&version=v1"), add_headers(Authorization = my_key))
des_df <- content(data_extract_status_res, "parsed", simplifyDataFrame = TRUE)
des_df$download_links

# Download the shapefile extract and read it into R
# Destination file
zip_file <- "NHGIS_2010tracts.zip"
# Download extract to destination file
download.file(des_df$download_links$gis_data, zip_file, headers=c(Authorization=my_key))
# List extract files in ZIP archive
unzip(zip_file, list=TRUE)
# Read the 2010 tract shapefile (packaged inside the extract ZIP) into an sf object
tracts2010 <- read_ipums_sf(zip_file)
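The status request in the code above runs once, right after the extract is submitted; if the extract has not finished processing yet, `download_links` may not be available. A minimal polling sketch, assuming the status response carries a `status` field that eventually reads `"completed"`, with `"failed"` and `"canceled"` as terminal failures (assumptions to check against the IPUMS API documentation). `wait_for_extract()` is a hypothetical helper that accepts any zero-argument function returning the current status, so it can be tried without network access:

```r
# Hypothetical helper: call `get_status()` until it returns "completed",
# sleeping `interval` seconds between checks and giving up after `max_tries`.
# The status strings below are assumptions about the API, not confirmed values.
wait_for_extract <- function(get_status, interval = 10, max_tries = 60) {
  for (i in seq_len(max_tries)) {
    status <- get_status()
    if (is.null(status)) status <- NA_character_  # field missing from response
    if (identical(status, "completed")) {
      return(invisible(status))
    }
    if (status %in% c("failed", "canceled")) {
      stop("Extract ended with status: ", status, call. = FALSE)
    }
    Sys.sleep(interval)
  }
  stop("Extract not completed after ", max_tries, " checks", call. = FALSE)
}

# Example use with the objects from the code above (requires httr and a key):
# wait_for_extract(function() {
#   res <- GET(paste0("https://api.ipums.org/extracts/", my_number,
#                     "?product=nhgis&version=v1"),
#              add_headers(Authorization = my_key))
#   content(res, "parsed", simplifyDataFrame = TRUE)$status
# })
```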

fran · February 16, 2022, 10:12pm

Hi @Clyde_Schwab and welcome to the forum! I attempted to duplicate the issue but was unable to - using your code I was able to successfully download and unzip the 2010 tract files.

When you say you tried multiple times, do you mean the entire workflow or just the download step? Perhaps your extract got corrupted somehow, in which case generating a new extract would resolve the issue. We did have an extract generation issue yesterday morning, and although it seems unlikely to be the cause here, it’s worth a try if you haven’t already.

If that doesn’t resolve the issue, we’ll keep brainstorming.

fran · February 16, 2022, 10:22pm

I just had another thought which seems more likely - is it timing out after about a minute? The download.file function has a default timeout of 60 seconds, which can be set with the timeout option. I’m on a fast connection so this download only requires a few seconds for me, which may be why I did not see the same behavior you did.

I am not a native R programmer, but I believe you can try something like this:

options(timeout=180)

to set it to 3 minutes before you try the download.file() call.
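A slightly more defensive variant of the same fix, as a sketch: `download_with_timeout()` is a hypothetical wrapper that raises R's `timeout` option only for the duration of the call (restoring the previous value with `on.exit()`), writes the file in binary mode, and turns any warning from `download.file()`, such as the "downloaded length != reported length" message in the original post, into an error so the script stops before trying to unzip a partial file. The 600-second default is an arbitrary assumption; pick a value that fits your connection speed and the extract size:

```r
# Hypothetical wrapper: download `url` to `destfile` with a larger timeout,
# restore the previous `timeout` option afterwards, and treat any warning
# raised by download.file() as a failed download.
download_with_timeout <- function(url, destfile, key, timeout_secs = 600) {
  # options() returns the previous values, so they can be restored on exit
  old <- options(timeout = max(timeout_secs, getOption("timeout")))
  on.exit(options(old), add = TRUE)
  tryCatch(
    utils::download.file(url, destfile, mode = "wb", quiet = TRUE,
                         headers = c(Authorization = key)),
    warning = function(w) {
      stop("Download incomplete: ", conditionMessage(w), call. = FALSE)
    }
  )
  invisible(destfile)
}
```

For the extract in this thread, the call would be `download_with_timeout(des_df$download_links$gis_data, zip_file, my_key)`.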

Clyde_Schwab · February 23, 2022, 5:29pm

Setting the timeout option totally solved this, thank you! I had been rerunning the entire workflow, and I believe R’s default timeout was causing this issue.
