Download many many samples sequentially

I would like to start downloading datasets for many countries and years, more than 100 in total. I intend to use the same set of 66 variables each time.

I want to make sure I’m not breaking any rules or stressing the servers too much. My plan is, in the absence of an API, to have a “base” extract and then iteratively modify that extract for each country-year I wish to download.

I have read that it’s best to download only one country-year at a time. Is this true?

What happens if I start, say, 15 downloads simultaneously? Will they work in parallel? Will they be sequential? Will I be flagged for downloading too many things too quickly?

Please let me know,

Peter Deffebach

This shouldn’t be a problem for the IPUMS servers. You can submit multiple extracts that will be processed in parallel, though there is a limit, after which new ones will be queued up. Also, there is no reason to submit one country-year at a time. You can certainly include multiple countries and years in a single extract as long as your computer can handle the file size. I will also note that there is in fact an API for IPUMS International, which would be well suited to your case. It is fully operational, though not currently supported by the ipumsr or ipumspy clients, so you will need to do more of the programming yourself.
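For anyone taking the "base extract" route through the API, here is a minimal sketch of generating one request body per sample, or per small batch of samples, from a single template. It only builds the definitions and does not talk to any server. The dictionary keys, the sample identifiers, and the short variable list are illustrative assumptions, not the documented IPUMS request schema, so check them against the API documentation before submitting anything.

```python
from copy import deepcopy

# Hypothetical base definition. The key names are assumptions made for
# illustration and are not taken from the IPUMS API documentation.
BASE_EXTRACT = {
    "description": "base extract",
    "samples": {},
    "variables": {name: {} for name in ["AGE", "SEX", "MARST"]},
}


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_definitions(base, samples, samples_per_extract=1):
    """Return one extract definition per batch of samples.

    The base definition is deep-copied, so it is never modified.
    """
    definitions = []
    for batch in chunk(samples, samples_per_extract):
        definition = deepcopy(base)
        definition["samples"] = {sample: {} for sample in batch}
        definition["description"] = "extract for " + ", ".join(batch)
        definitions.append(definition)
    return definitions


# Placeholder sample identifiers, not real IPUMS sample codes.
SAMPLES = ["sample_a", "sample_b", "sample_c", "sample_d", "sample_e"]

per_sample = build_definitions(BASE_EXTRACT, SAMPLES)
combined = build_definitions(BASE_EXTRACT, SAMPLES, samples_per_extract=2)
```

With `samples_per_extract=1` you get one definition per country-year. Raising it trades fewer, larger extracts against easier bookkeeping.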

Thanks!

I have written a small wrapper around the curl API in Julia that you can find here.

It would be nice if there were a way to cancel extract requests. I just messed up submitting a qsub job, so I now have lots of extracts in my queue that I won’t download.

But if the servers are used to lots of throughput I guess I feel less bad about it.

I definitely prefer making separate requests, as it makes it easier to keep track of what I have downloaded. It also seems like it’s faster to have N different samples than one combined sample. If there were an option to have separate .dta files per sample, that would be excellent.

@Matthew_Bombyk

Actually, I have a question. I need to download 50 variables for each country, but not all countries have all 50 variables.

When I try to submit a request and a country is missing a variable, the JSON object I get back reports an error saying the variable doesn’t exist.

There seems to be no list anywhere of which countries have which variables. Am I just not finding it? I’m afraid I can’t move forward without either a) access to this list or b) an option to ignore missing variables.

Best,

Peter

Thanks for sharing your code!

Currently there is no metadata API, so no programmatic way to determine whether a given variable is available for a given sample. If any of the samples in your request contain the variable of interest, it should work properly, so that’s one argument for including multiple samples in your extract. You can see the availability of each variable by visiting its availability tab (for example MARST) or looking at the variable groups in the extract builder site (for example the demographics group).
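Until a metadata API exists, one workaround is to record availability by hand from the availability tabs and filter the variable list locally before submitting. The sketch below assumes you have typed that information into a dictionary yourself. The sample identifiers, the variable names other than MARST, and both function names are hypothetical.

```python
def split_by_availability(requested, availability):
    """For each sample, split the requested variables into present and missing."""
    result = {}
    for sample, available in availability.items():
        present = [v for v in requested if v in available]
        missing = [v for v in requested if v not in available]
        result[sample] = (present, missing)
    return result


def variables_for_combined_extract(requested, availability, samples):
    """Keep the requested variables found in at least one of the samples."""
    union = set().union(*(availability[s] for s in samples))
    return [v for v in requested if v in union]


# Hand-compiled from the website; every entry here is a placeholder.
AVAILABILITY = {
    "sample_a": {"AGE", "SEX", "MARST"},
    "sample_b": {"AGE", "SEX"},
}
REQUESTED = ["AGE", "SEX", "MARST", "VAR_X"]

by_sample = split_by_availability(REQUESTED, AVAILABILITY)
combined_vars = variables_for_combined_extract(
    REQUESTED, AVAILABILITY, ["sample_a", "sample_b"]
)
```

For separate per-sample requests, submit only the "present" list for each sample. For a combined extract, the second function reflects the point above that a variable only needs to exist in one of the included samples.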

I will share your requests with the API team. In the future, I encourage you to post your question directly to the IPUMS API channel on the forum. While the User Support team can provide guidance on interpreting the documentation and basic troubleshooting, that channel is monitored by our IT group that developed the API and can help with more detailed questions or suggestions.