Camel case, and a better way to "wait for data"

camelCase

I suppose I haven’t looked at the Forum in a while. I had no idea that IPUMS changed many of the options calls to camelCase with Version 2 of the API. Surprise! Given that change, the example below is written using Version 2 conventions.

A loop to check for status = "completed"

Yes, I know we receive an email when the data are ready; however, I wanted to take it a step further and have the code do that checking for me, then continue on to download the data (and do all the post-processing [not shown in this post]).

The original code to check the status of a request is below. (Scroll down to the GET A REQUEST’S STATUS section of the page.) This assumes that you’ve already created your request body, stored your API key, etc.

library(httr)  # provides POST(), GET(), add_headers(), content(), verbose()

url <- "https://api.ipums.org/extracts/?collection=nhgis&version=2"
result <- POST(url, 
               add_headers(Authorization = my_key), 
               body = mybody_json, 
               encode = "json", 
               verbose())
res_df <- content(result, 
                  "parsed", 
                  simplifyDataFrame = TRUE)
my_number <- res_df$number

# Old code to check status of request
data_extract_status_res <- GET(
  "https://api.ipums.org/extracts/6?collection=nhgis&version=2", 
  add_headers(Authorization = my_key))
des_df <- content(
  data_extract_status_res, 
  "parsed", 
  simplifyDataFrame = TRUE)
des_df

My modified code uses a while() loop and the Sys.sleep() function to pause between checks on the status of the data request. I also use a paste0() call to insert the value stored in my_number into the URL, rather than manually typing in the value.

# New code to check status of request
data_extract_status_res <- GET(
  paste0("https://api.ipums.org/extracts/",
         my_number,
         "?collection=nhgis&version=2"), 
  add_headers(Authorization = my_key))
des_df <- content(data_extract_status_res, 
                  "parsed", 
                  simplifyDataFrame = TRUE)
# Keep re-checking every 20 seconds until the extract is ready
while (des_df$status != "completed") {
  Sys.sleep(20)
  data_extract_status_res <- GET(
    paste0("https://api.ipums.org/extracts/",
           my_number,
           "?collection=nhgis&version=2"), 
    add_headers(Authorization = my_key))
  des_df <- content(data_extract_status_res, 
                    "parsed", 
                    simplifyDataFrame = TRUE)
}

After the data are available to download, I run the lines of code from the user pages to grab the file. My further code (not shown) parses the files in preparation for analysis.

A few final notes:

  • I’m sure there are places to make the code more efficient. I’m no expert in R.
  • It would also be possible to create a function that takes arguments for the Sys.sleep() time (which I currently have set at 20 seconds), the name of the zip_file, or the directory where the downloaded data file should be stored. I haven’t done that (yet!).
  • Leaving your instance of R in a while() loop while waiting for the data to be ready for download may not be ideal for you; however, I don’t mind (especially if I’m running this program at the end of the day and want to have prepared data for me once I return to my desk the next morning).
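
As a rough sketch of the function idea in the second note above, something like the following might work. The helper name wait_until_completed(), its arguments, and the choice to stop on "failed" or "canceled" are all my own assumptions for illustration; none of it comes from the IPUMS documentation. It takes the status check as a function argument, so it can be tried offline with a stand-in before pointing it at the real API, and it gives up after a set number of checks instead of looping forever.

```r
# Hypothetical helper (not part of httr or ipumsr): call `get_status()`
# repeatedly until it returns "completed".
# ASSUMPTION: "failed" and "canceled" are the statuses worth stopping on;
# adjust `stop_on` to whatever the API actually reports.
wait_until_completed <- function(get_status,
                                 wait_seconds = 20,
                                 max_checks = 180,
                                 stop_on = c("failed", "canceled")) {
  status <- NA_character_
  for (i in seq_len(max_checks)) {
    status <- get_status()
    if (identical(status, "completed")) {
      return(invisible(status))
    }
    if (isTRUE(status %in% stop_on)) {
      stop("Extract ended with status: ", status)
    }
    Sys.sleep(wait_seconds)
  }
  stop("Gave up after ", max_checks, " checks; last status: ", status)
}

# Offline demonstration with a stand-in status function, so the sketch
# runs without an API key or a network connection.
fake_statuses <- c("queued", "started", "completed")
n_calls <- 0
fake_status <- function() {
  n_calls <<- n_calls + 1
  fake_statuses[n_calls]
}
final_status <- wait_until_completed(fake_status, wait_seconds = 0)
print(final_status)
```

With the real API, get_status would be a small function wrapping the GET() and content() calls shown earlier and returning the status field of the parsed response.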

Thanks for sharing this handy workflow! In fact, it’s handy enough that we provide some similar options for creating and monitoring extract requests in the ipumsr library, which contains several tools for interacting with the IPUMS API. You can define extract requests, submit them for processing, wait for their completion, and download them without having to worry about some of the API handling going on under the hood.

A sample workflow when using ipumsr might be:

# Install ipumsr if it's not already installed:
# install.packages("ipumsr")

library(ipumsr)

# Set your API key for use throughout your R session (or save for future sessions)
set_ipums_api_key("insert-your-api-key-here")

# Use a `define_extract_*()` function to specify the parameters of an extract request:
my_extract <- define_extract_usa(
  description = "Demo extract",
  samples = "us2013a",
  variables = c("SEX", "AGE", "YEAR")
)

# Submit extract request to the IPUMS API for processing
submitted_extract <- submit_extract(my_extract)

# Wait until the extract request is complete
completed_extract <- wait_for_extract(submitted_extract)

# Download the completed extract's files
file <- download_extract(completed_extract)

# Load data
read_ipums_micro(file)

By default, wait_for_extract() will add time to each wait interval to prevent excess requests for very large extracts, but you can adjust these parameters yourself if you like. Take a look at the documentation in R by using ?wait_for_extract.
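
To illustrate the general idea of growing wait intervals, here is a small sketch in base R. It is not how wait_for_extract() is implemented: the function name growing_delays(), the 10-second starting delay, the 1.5 multiplier, and the 300-second cap are all made up for illustration.

```r
# Hypothetical illustration (not ipumsr code): build a schedule of wait
# times where each wait is longer than the last, up to a cap.
# ASSUMPTION: the starting delay, multiplier, and cap are arbitrary values.
growing_delays <- function(n_checks,
                           initial = 10,
                           multiplier = 1.5,
                           max_delay = 300) {
  delays <- initial * multiplier^(seq_len(n_checks) - 1)
  pmin(delays, max_delay)  # never wait longer than `max_delay` seconds
}

# Wait times (in seconds) before each of the first five status checks
print(growing_delays(5))
```

Short extracts get checked quickly, while long-running ones settle into infrequent checks rather than sending a request every few seconds.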

A note on API versions: ipumsr currently still supports only version 1 of the IPUMS API, but version 2 support is in active development, so stay tuned for updates coming in the near future! Using the library will also help you avoid having to deal with some of the details you mention regarding snake_case vs. camelCase, as our plan is to stick to snake_case for the ipumsr interface.

The package contains plenty of other useful tools (both API-related and not) as well. You can find more information about the current version of ipumsr on its website here.


Great to hear. Thanks, @Finn_Roberts!

I was revisiting this recently, and was curious if there are plans to add support for other IPUMS collections beyond the USA and CPS.

Is there a workaround to get something similar to define_extract_usa() or define_extract_cps() for NHGIS (i.e., define_extract_nhgis())?

Hi John,

Yes, the team is working on updating ipumsr to support IPUMS API v2 (which adds IPUMS International) and adding NHGIS support at the same time. That will hopefully be released next month. If you are really interested in digging into the code, you can preview progress on the api-v2 branch of the ipums/ipumsr repo on GitHub.

More generally, there are plans to add additional collections beyond what the IPUMS API already supports (e.g. Health Surveys, Time Use) but that support needs to be added into the core API first, and then support added to ipumsr, so that is farther away (possibly not until sometime in 2024).

Thanks, Fran.

Yes, I was able to get the updated version with the NHGIS command from GitHub. Seems to work quite well!