Async HTTP Request Example
An example showing async, concurrent HTTP requests
Recently, I had to perform some HTTP requests in Python, mainly for FastAPI and the Mistral Python SDK. This Jupyter notebook is a record of the work I did while testing the three most popular libraries, Requests, AIOHTTP, and httpx, for my personal projects. To simplify this example, I made use of httpbin.org to receive random bytes following each HTTP request.
First of all, we need some way of recording the execution time. I usually like to write a simple decorator that wraps the target function, and use time.perf_counter() to measure the exact time spent executing the inner function.
from rich.jupyter import print
import time


def _time(func):
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        res = func(*args, **kwargs)
        t1 = time.perf_counter()
        print(f"Time: {(t1 - t0):.4f} [sec]")
        return res
    return wrapper
Requests
import requests
Sending an HTTP request using the Requests library is rather straightforward. We construct the URL as a string and send the HTTP request.
url = "https://httpbin.org/bytes/2"
with requests.Session() as session:
response = session.get(url)
print(int.from_bytes(response.content, "little"))
I have converted the response bytes to an integer, but that is just for the sake of presentation. The response gives me some random bytes, so the actual content of the response is not important (other than that it is random).
We can also make multiple HTTP requests. However, because Requests only supports synchronous operations, there is no way around it but to do it in a loop. We will at least make use of session persistence, because that is always the right thing to do.
To do this, we will simplify the construction of the URL a little bit. We can turn it into a function and use a list comprehension to obtain a list of 10 URLs. These URLs are all identical in this simplified example, but of course you can get creative as needed.
def url(num=2):
    return f"https://httpbin.org/bytes/{num}"


urls = [url() for _ in range(10)]
print(urls[:])
We can measure the total execution time to send and receive 10 HTTP requests using the following code.
@_time
def serial_req():
    l_res = []
    with requests.Session() as session:
        for _url in urls:
            response = session.get(_url)
            l_res += [int.from_bytes(response.content, "big")]
    return l_res
print(serial_req())
This is pretty slow, even for 10 simple requests. Because of the synchronous nature of the code, it has to wait until each request is fulfilled before moving on to the next iteration. For that reason, the longer the list of URLs you need to go through, the slower it becomes.
This is of course not a very efficient way to obtain data from the web, but surprisingly, I found that more often than not I had to resort to HTTP requests and had no other way to quickly obtain large chunks of data. So learning about the HTTP libraries turned out to be quite useful. Still, I would like a way to work through HTTP requests more efficiently.
AIOHTTP
Instead, we can test concurrent operation using the AIOHTTP library, introduced in 2014. The library has been used for high-performance, concurrent applications, and is still widely used today. AIOHTTP, however, is purely asynchronous, so we need to change our timer slightly. The wrapper function within the decorator needs to be an async function, and the target function, which is also async, needs to be awaited. Still, the changes are minimal and unobtrusive.
import aiohttp
import asyncio


def _time(func):
    async def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        res = await func(*args, **kwargs)
        t1 = time.perf_counter()
        print(f"Time: {(t1 - t0):.4f} [sec]")
        return res
    return wrapper
Making a single HTTPS request is still more or less the same.
async with aiohttp.ClientSession() as session:
    async with session.get(url()) as response:
        print(int.from_bytes(await response.read(), "little"))
To make multiple, concurrent requests, we can make use of the asyncio.gather() function. To simplify things a bit more, we can define an async _fetch function that sends a request and receives the response within an open session.
async def _fetch(session, url):
    async with session.get(url) as response:
        return int.from_bytes(await response.read(), "little")
And because creating coroutines (or futures, for that matter) is cheap, we can declare multiple instances of the _fetch coroutine and construct a list of tasks that need to be performed concurrently in the future.
tasks = [_fetch(session, _url) for _url in urls]
print(tasks[:])
async with aiohttp.ClientSession() as session:
    tasks = [_fetch(session, _url) for _url in urls]
    responses = await asyncio.gather(*tasks)

print(responses[:])
The above example works, but we can clean it up a bit more.
When this fetch function is called, the HTTP session is opened as a context manager. The session needs to stay open over the course of the function call, so we can take advantage of this and define the session handling as a decorator, much like what we did for the timer above. I found this to be a useful way to define multiple fetch functions; the session management is more or less the same across the operations that need to be done, and it can be delegated to the decorator with a few additional options (a sketch of such options follows the code below).
So we define the session as a decorator that handles the context management, and the actual fetch function just performs asyncio.gather() on the list of coroutines we defined above. The following code shows the chain of functions that are used to make concurrent HTTP requests, which I have been using as a basis in my projects.
def _session(func):
    async def wrapper(*args):
        try:
            async with aiohttp.ClientSession() as session:
                return await func(session, *args)
        except aiohttp.ClientConnectorError as e:
            print("Connection Error", str(e))
    return wrapper


async def _fetch(session, url):
    async with session.get(url) as response:
        return int.from_bytes(await response.read(), "little")


@_time
@_session
async def async_req(session):
    tasks = [_fetch(session, _url) for _url in urls]
    responses = await asyncio.gather(*tasks)
    return responses
print(await async_req())
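As for the "few additional options" mentioned above, here is a minimal sketch of how the decorator could be parameterized. The names _session_with and async_req_timed are hypothetical, not part of the original code; the sketch simply forwards keyword arguments such as a timeout to aiohttp.ClientSession.

# Hypothetical variation: forward ClientSession options (e.g. a timeout)
# through the decorator. The names here are illustrative only.
def _session_with(**session_kwargs):
    def decorator(func):
        async def wrapper(*args):
            try:
                async with aiohttp.ClientSession(**session_kwargs) as session:
                    return await func(session, *args)
            except aiohttp.ClientConnectorError as e:
                print("Connection Error", str(e))
        return wrapper
    return decorator


@_time
@_session_with(timeout=aiohttp.ClientTimeout(total=10))
async def async_req_timed(session):
    return await asyncio.gather(*[_fetch(session, _url) for _url in urls])


print(await async_req_timed())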
So here it is; for concurrent HTTP requests, AIOHTTP performs well, and the asynchronous nature of the library serves me well most of the time. Unless you need to make a single HTTP request and close the session, I found that AIOHTTP is always more performant than the Requests library.
So what does this all mean for httpx? I have also tested the httpx library, but in my use cases it was not better than Requests for serial operations or AIOHTTP for concurrent requests. As concurrency tended to be important for my projects, I decided to stick with AIOHTTP for the foreseeable future. I can see the merit of httpx when mixing synchronous and asynchronous operations, but I have yet to come across that need personally.
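For reference, here is a minimal sketch of what the equivalent concurrent fetch could look like with httpx, reusing the urls list and the async _time decorator from above. This is my own illustration rather than the exact code I benchmarked.

import httpx


async def _fetch_httpx(client, url):
    # httpx exposes the response body as bytes via .content
    response = await client.get(url)
    return int.from_bytes(response.content, "little")


@_time
async def httpx_req():
    async with httpx.AsyncClient() as client:
        return await asyncio.gather(*[_fetch_httpx(client, _url) for _url in urls])


print(await httpx_req())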
Of course, the above code is a simplified version of the actual thing I wrote. There are a few considerations that usually need to be made before launching hundreds of concurrent HTTP requests. For both web applications and databases, one must consider the number of concurrent operations that the system can handle. It is also easy to get IP banned if you are not careful with the number of requests you send. And if you need to go through a large table or a dataset via HTTP requests, there usually is a better way to go about it. However, if you run out of options, chunking alleviates a lot of the issues involved with such an endeavour.
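To illustrate that chunking idea, here is a minimal sketch that splits the URL list into small batches and gathers one batch at a time, so the number of in-flight connections stays bounded. The batch size of 5 and the function name chunked_req are arbitrary choices of mine, not values from the original project.

# Illustrative chunking: gather a small batch of requests at a time so only
# a bounded number of connections is open at once. Batch size is arbitrary.
@_time
@_session
async def chunked_req(session, batch_size=5):
    responses = []
    for i in range(0, len(urls), batch_size):
        batch = urls[i:i + batch_size]
        responses += await asyncio.gather(*[_fetch(session, _url) for _url in batch])
    return responses


print(await chunked_req())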