[python] Asynchronous Requests with Python requests

I tried the sample provided within the documentation of the requests library for Python.

With async.map(rs), I get the response codes, but I want to get the content of each page requested. This, for example, does not work:

out = async.map(rs)
print out[0].content

Tags: python, asynchronous, python-requests, httprequest

Answers:


If you want to use asyncio, then requests-async provides async/await functionality for requests - https://github.com/encode/requests-async
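A minimal sketch of what that looks like, assuming the requests-async API mirrors requests (the URL here is just a placeholder):

import asyncio
import requests_async as requests  # drop-in async port of the requests API

async def main():
    # the awaitable counterpart of requests.get()
    response = await requests.get("https://example.org")  # placeholder URL
    print(response.status_code)
    print(response.content[:80])

asyncio.run(main())  # Python 3.7+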


DISCLAIMER: The following code creates a separate thread for each function call.

This might be useful in some cases, as it is simpler to use. But be aware that it is not async; it only gives the illusion of async by using multiple threads, even though the decorator name suggests otherwise.

You can use the following decorator to invoke a callback once the function finishes executing; the callback must handle the processing of the data returned by the function.

Please note that once the function is decorated, calling it will return a Future object immediately.

import asyncio

## Decorator implementation of async runner !!
def run_async(callback, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()

    def inner(func):
        def wrapper(*args, **kwargs):
            def __exec():
                out = func(*args, **kwargs)
                callback(out)
                return out

            # run __exec in the loop's default thread pool executor;
            # a Future is returned immediately, without blocking
            return loop.run_in_executor(None, __exec)

        return wrapper

    return inner

Example usage:

import requests

urls = ["https://google.com", "https://facebook.com", "https://apple.com", "https://netflix.com"]
loaded_urls = []  # OPTIONAL, used for showing in real time which urls have loaded !!


def _callback(resp):
    print(resp.url)
    print(resp)
    loaded_urls.append((resp.url, resp))  # OPTIONAL, used for showing in real time which urls have loaded !!


# A callback function must be provided; it is executed after the decorated
# function completes and receives the value returned by that function.
@run_async(_callback)
def get(url):
    return requests.get(url)


for url in urls:
    get(url)

If you wish to see which URLs are loaded in real time, you can add the following code at the end as well:

while True:
    print(loaded_urls)
    if len(loaded_urls) == len(urls):
        break

I have been using Python requests for async calls against GitHub's gist API for some time.

For an example, see the code here:

https://github.com/davidthewatson/flasgist/blob/master/views.py#L60-72

This style of Python may not be the clearest example, but I can assure you that the code works. Let me know if this is confusing to you and I will document it.


async is now an independent module: grequests.

See here: https://github.com/kennethreitz/grequests

And there: Ideal method for sending multiple HTTP requests over Python?

installation:

$ pip install grequests

usage:

build a stack:

import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)

send the stack:

grequests.map(rs)

the result looks like:

[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]

grequests doesn't seem to set a limit on concurrent requests by default, i.e. when multiple requests are sent to the same server.
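If you do need to throttle, grequests.map takes a size argument that caps the underlying gevent pool; a short sketch (the URL list is just a placeholder):

import grequests

urls = ['http://httpbin.org/get'] * 20  # placeholder URLs

rs = (grequests.get(u) for u in urls)

# size=5 means at most 5 requests are in flight at any moment
responses = grequests.map(rs, size=5)

# each item is a plain requests.Response, so .content works as usual
for r in responses:
    print(r.status_code, len(r.content))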


A plain-threads approach also works: spawn one thread per request, then join them all.

import urllib2  # Python 2; use urllib.request in Python 3
from threading import Thread

threads = list()

for requestURI in requests:  # here `requests` is a list of URIs, not the library
    t = Thread(target=self.openURL, args=(requestURI,))
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

...

def openURL(self, requestURI):
    o = urllib2.urlopen(requestURI, timeout=600)
    o...

I have issues with most of the answers posted: they either use deprecated libraries that have been ported over with limited features, or they provide a solution with too much magic around the execution of the request, making error handling difficult. If they do not fall into one of the above categories, they're third-party libraries or deprecated.

Some of the solutions work alright purely for HTTP requests, but they fall short for any other kind of request, which is ludicrous. A highly customized solution is not necessary here.

Simply using the Python built-in library asyncio is sufficient to perform asynchronous requests of any type, while also providing enough flexibility for complex, use-case-specific error handling.

import asyncio
import requests

loop = asyncio.get_event_loop()

def do_thing(params):
    async def get_rpc_info_and_do_chores(id):
        # do things
        response = perform_grpc_call(id)  # placeholder helper from the surrounding app
        do_chores(response)

    async def get_httpapi_info_and_do_chores(id):
        # do things
        # note: requests.get is blocking, so it runs on the event loop thread
        response = requests.get(URL)  # URL is a placeholder from the surrounding app
        do_chores(response)

    async_tasks = []
    for element in list(params.list_of_things):
        async_tasks.append(loop.create_task(get_rpc_info_and_do_chores(element)))
        async_tasks.append(loop.create_task(get_httpapi_info_and_do_chores(element)))

    loop.run_until_complete(asyncio.gather(*async_tasks))

How it works is simple: you create a series of tasks you'd like to run asynchronously, then ask a loop to execute those tasks and exit upon completion. No extra libraries subject to lack of maintenance, and no loss of functionality.


Unfortunately, as far as I know, the requests library is not equipped for performing asynchronous requests. You can wrap async/await syntax around requests, but that will make the underlying requests no less synchronous. If you want true async requests, you must use other tooling that provides it. One such solution is aiohttp (Python 3.5.3+). It works well in my experience using it with the Python 3.7 async/await syntax. Below I write three implementations of performing n web requests using:

  1. Purely synchronous requests (sync_requests_get_all) using the Python requests library
  2. Synchronous requests (async_requests_get_all) using the Python requests library wrapped in Python 3.7 async/await syntax and asyncio
  3. A truly asynchronous implementation (async_aiohttp_get_all) with the Python aiohttp library wrapped in Python 3.7 async/await syntax and asyncio

import time
import asyncio
import requests
import aiohttp

from types import SimpleNamespace

durations = []


def timed(func):
    """
    records approximate durations of function calls
    """
    def wrapper(*args, **kwargs):
        start = time.time()
        print(f'{func.__name__:<30} started')
        result = func(*args, **kwargs)
        duration = f'{func.__name__:<30} finished in {time.time() - start:.2f} seconds'
        print(duration)
        durations.append(duration)
        return result
    return wrapper


async def fetch(url, session):
    """
    asynchronous get request
    """
    async with session.get(url) as response:
        response_json = await response.json()
        return SimpleNamespace(**response_json)


async def fetch_many(loop, urls):
    """
    many asynchronous get requests, gathered
    """
    async with aiohttp.ClientSession() as session:
        tasks = [loop.create_task(fetch(url, session)) for url in urls]
        return await asyncio.gather(*tasks)

@timed
def sync_requests_get_all(urls):
    """
    performs synchronous get requests
    """
    # use session to reduce network overhead
    session = requests.Session()
    return [SimpleNamespace(**session.get(url).json()) for url in urls]


@timed
def async_requests_get_all(urls):
    """
    asynchronous wrapper around synchronous requests
    """
    loop = asyncio.get_event_loop()
    # use session to reduce network overhead
    session = requests.Session()

    async def async_get(url):
        return session.get(url)

    async_tasks = [loop.create_task(async_get(url)) for url in urls]
    return loop.run_until_complete(asyncio.gather(*async_tasks))


@timed
def async_aiohttp_get_all(urls):
    """
    performs asynchronous get requests
    """
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(fetch_many(loop, urls))


if __name__ == '__main__':
    # this endpoint takes ~3 seconds to respond,
    # so a purely synchronous implementation should take
    # little more than 30 seconds and a purely asynchronous
    # implementation should take little more than 3 seconds.
    urls = ['https://postman-echo.com/delay/3']*10

    sync_requests_get_all(urls)
    async_requests_get_all(urls)
    async_aiohttp_get_all(urls)
    print('----------------------')
    for duration in durations:
        print(duration)

On my machine, this is the output:

sync_requests_get_all          started
sync_requests_get_all          finished in 30.92 seconds
async_requests_get_all         started
async_requests_get_all         finished in 30.87 seconds
async_aiohttp_get_all          started
async_aiohttp_get_all          finished in 3.22 seconds
----------------------
sync_requests_get_all          finished in 30.92 seconds
async_requests_get_all         finished in 30.87 seconds
async_aiohttp_get_all          finished in 3.22 seconds

You can use httpx for that.

import asyncio
import httpx

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

urls = ["http://google.com", "http://wikipedia.org"]

# Note that you need an async context to use `await`.
await asyncio.gather(*map(get_async, urls))
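If you aren't already inside an async context (for example, in a plain script rather than a notebook), a minimal sketch is to wrap the gather in a coroutine and hand it to asyncio.run:

async def main():
    responses = await asyncio.gather(*map(get_async, urls))
    for r in responses:
        print(r.status_code)

asyncio.run(main())  # Python 3.7+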

If you want a functional syntax, the gamla lib wraps this into get_async.

Then you can do


await gamla.map(gamla.get_async(10))(["http://google.com", "http://wikipedia.org"])

The 10 is the timeout in seconds.

(disclaimer: I am its author)


I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.

list_of_requests = ['http://moop.com', 'http://doop.com', ...]

from simple_requests import Requests
for response in Requests().swarm(list_of_requests):
    print response.content

The docs are here: http://pythonhosted.org/simple-requests/


Maybe requests-futures is another choice.

from requests_futures.sessions import FuturesSession

session = FuturesSession()
# first request is started in background
future_one = session.get('http://httpbin.org/get')
# second request is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')
# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)

It is also recommended in the official documentation. If you don't want to involve gevent, it's a good option.


I have also tried some things using the asynchronous methods in Python; however, I have had much better luck using Twisted for asynchronous programming. It has fewer problems and is well documented. Here is a link to something similar to what you are trying in Twisted.

http://pythonquirks.blogspot.com/2011/04/twisted-asynchronous-http-request.html
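For flavor, here is a minimal sketch of a non-blocking GET with Twisted's Agent; the URL is a placeholder and error handling is omitted:

from twisted.internet import reactor
from twisted.web.client import Agent, readBody

agent = Agent(reactor)

def got_response(response):
    # readBody returns a Deferred that fires with the complete body bytes
    d = readBody(response)
    d.addCallback(lambda body: print(response.code, len(body)))
    return d

d = agent.request(b'GET', b'http://httpbin.org/get')  # placeholder URL
d.addCallback(got_response)
d.addBoth(lambda _: reactor.stop())  # stop the reactor on success or failure
reactor.run()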


I tested both requests-futures and grequests. grequests is faster, but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own solution and simply wrapped requests in a ThreadPoolExecutor; it was almost as fast as grequests, but without the external dependencies.

import requests
import concurrent.futures

def get_urls():
    return ["url1", "url2"]

def load_url(url, timeout):
    return requests.get(url, timeout=timeout)

resp_ok = 0   # count of successful responses
resp_err = 0  # count of failed responses

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1
