camelCase
I suppose I haven’t looked at the Forum in a while. I had no idea that IPUMS changed many of the option names to camelCase with Version 2 of the API. Surprise! Given that change, the example below is written using Version 2 conventions.
A loop to check for status=“completed”
Yes, I know we receive an email when the data are ready; however, I wanted to take it a step further and have my code do that checking for me, then continue on to download the data (and do all the post-processing [not shown in this post]).
The original code to check the status of a request is below. (Scroll down to the GET A REQUEST’S STATUS section of the page.) This assumes that you’ve already created your request body, stored your API key, etc.
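library(httr)  # provides POST(), GET(), add_headers(), content(), verbose()

# Submit the extract request
url <- "https://api.ipums.org/extracts/?collection=nhgis&version=2"
result <- POST(url,
               add_headers(Authorization = my_key),
               body = mybody_json,
               encode = "json",
               verbose())
res_df <- content(result,
                  "parsed",
                  simplifyDataFrame = TRUE)
# Keep the extract number so later calls can refer to this request
my_number <- res_df$number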
# Old code to check status of request
# (the extract number, 6 here, is typed into the URL by hand)
data_extract_status_res <- GET(
  "https://api.ipums.org/extracts/6?collection=nhgis&version=2",
  add_headers(Authorization = my_key))
des_df <- content(
  data_extract_status_res,
  "parsed",
  simplifyDataFrame = TRUE)
des_df
My modified code uses a while() loop and the Sys.sleep() function to pause between checks on the status of the data request. I also use a paste0() call to build the URL from the value stored in my_number, rather than manually typing in the value.
# New code to check status of request
data_extract_status_res <- GET(
  paste0("https://api.ipums.org/extracts/",
         my_number,
         "?collection=nhgis&version=2"),
  add_headers(Authorization = my_key))
des_df <- content(data_extract_status_res,
                  "parsed",
                  simplifyDataFrame = TRUE)
while(des_df$status != "completed"){
  Sys.sleep(20)
  data_extract_status_res <- GET(
    paste0("https://api.ipums.org/extracts/",
           my_number,
           "?collection=nhgis&version=2"),
    add_headers(Authorization = my_key))
  des_df <- content(data_extract_status_res,
                    "parsed",
                    simplifyDataFrame = TRUE)
}
After the data are available to download, I run the lines of code from the user pages to grab the file. My further code (not shown) parses the files in preparation for analysis.
A few final notes:
- I’m sure there are places to make the code more efficient. I’m no expert in R.
- It would also be possible to write a function that takes arguments for the Sys.sleep() time (which I currently have set at 20 seconds), the name of the zip_file, or the directory where the downloaded data file should be stored. I haven’t done that (yet!).
- Leaving your instance of R in a while() loop while waiting for the data to be ready for download may not be ideal for you; however, I don’t mind (especially if I’m running this program at the end of the day and want the prepared data waiting for me when I return to my desk the next morning).
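For anyone who wants to go the function route, here is a rough sketch of how the polling loop might be wrapped up. The function names get_extract_status() and wait_for_extract(), and the arguments wait, max_wait, and fail_states, are my own inventions for illustration; they are not part of httr or the IPUMS API. I am also assuming that "failed" and "canceled" are the statuses that mean the extract will never complete, so please check that against the IPUMS documentation. The point of the sketch is that the loop gives up after a maximum wait, or when the extract fails, instead of spinning all night.

```r
# Hypothetical helper (not from httr or IPUMS): returns the status string
# for one extract. Uses the same URL pattern as the code above.
get_extract_status <- function(number, key, collection = "nhgis") {
  url <- paste0("https://api.ipums.org/extracts/", number,
                "?collection=", collection, "&version=2")
  res <- httr::GET(url, httr::add_headers(Authorization = key))
  httr::stop_for_status(res)  # turn HTTP 4xx/5xx responses into R errors
  httr::content(res, "parsed", simplifyDataFrame = TRUE)$status
}

# Hypothetical wrapper: calls check_status() until it returns "completed".
#   check_status  a function with no arguments that returns a status string
#   wait          seconds to sleep between checks
#   max_wait      total seconds of sleeping allowed before giving up
#   fail_states   statuses treated as "will never complete" (an assumption)
wait_for_extract <- function(check_status, wait = 20, max_wait = 3600,
                             fail_states = c("failed", "canceled")) {
  waited <- 0
  repeat {
    status <- check_status()
    if (is.null(status)) stop("No status field in the response")
    if (identical(status, "completed")) return(invisible(status))
    if (status %in% fail_states) stop("Extract ended with status: ", status)
    if (waited >= max_wait) {
      stop("Gave up after ", waited, " seconds; last status: ", status)
    }
    Sys.sleep(wait)
    waited <- waited + wait
  }
}

# Example call (needs my_number and my_key from the code above):
# wait_for_extract(function() get_extract_status(my_number, my_key))
```

Passing the status check in as a function keeps the waiting logic separate from the HTTP call, so the loop can be tried out with a stand-in function before pointing it at the real API.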