
Performance testing HTTP/1.1 vs HTTP/2 vs HTTP/2 + Server Push for REST APIs

When building web services, a common wisdom is to try to reduce the number of HTTP requests to improve performance.

There are a variety of benefits to this, including fewer total bytes being sent, but the predominant reason is that traditionally browsers will only make 6 HTTP requests in parallel for a single domain. Before 2008, most browsers limited this to 2.

When this limit is reached, it means that browsers will have to wait until earlier requests are finished before starting new ones. One implication is that the higher the latency is, the longer it will take until all requests finish.

Take a look at an example of this behavior. In the following simulation we’re fetching a ‘main’ document. This could be the index of a website, or some JSON collection.

After getting the main document, the simulator grabs 99 linked items. These could be images, scripts, or other documents from an API.

100 requests via HTTP/1.1

HTTP/1.1 is limited to 6 concurrent requests. The big box is the initial index or collection.

The 6 connection limit has resulted in a variety of optimization techniques. Scripts are combined and compressed, and graphics are often combined into ‘sprite maps’.

The limit and ‘cost’ of a single HTTP connection have also had an effect on web services. Instead of creating small, specific API calls, designers of REST (and other HTTP-based) services are incentivized to pack many logical ‘entities’ into a single HTTP request/response.

For example, when an API client needs a list of ‘articles’ from an API, usually they will get this list from a single endpoint instead of fetching each article by its own URI.
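
As a purely illustrative example (the field names here are made up, mirroring the link-style response shown later in this article), a compound response embeds the articles directly in the collection:

GET /articles HTTP/1.1
Host: api.example.org

HTTP/1.1 200 OK
Content-Type: application/json

{
  "total": 2,
  "items": [
    { "href": "/articles/1", "title": "First article", "body": "Lorem ipsum" },
    { "href": "/articles/2", "title": "Second article", "body": "Dolor sit amet" }
  ]
}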

The savings are massive. The following simulation is similar to the last, except now we’ve combined all entities in a single request.

Compounding 100 items in a collection

Combining many logical entities in 1 bulky response has major speed benefits.

If an API client needs a specific (large) set of entities from a server, then in order to reduce HTTP requests, API developers are compelled to either build more API endpoints, each tailored to the specific use-case of the client, or deploy systems that can take arbitrary queries and return all the matching entities.

The simplest form of this is perhaps a collection with many query parameters. A much more complex version is GraphQL, which effectively uses HTTP as a pipe for its own request/response mechanism and allows for a wide range of arbitrary queries.
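
A hypothetical example of the query-parameter flavor (every parameter name here is invented for illustration):

GET /articles?author=12&embed=comments&fields=title,body HTTP/1.1
Host: api.example.org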

Drawbacks of compounding documents

There are a number of drawbacks to this. Systems that require compounding of entities typically need additional complexity on both server and client.

Instead of treating a single entity as some object that has a URI, which can be fetched with GET and subsequently cached, a new layer is required on both server and client-side that’s responsible for teasing these entities apart.

Re-implementing logic that HTTP already provides has the nasty side-effect that other HTTP features must also be reimplemented. The most common example is caching.

On the REST-side of things, examples of compound-documents can be seen in virtually any standard. JSON:API, HAL and Atom all have this notion.

If you look at most full-featured JSON:API client implementations, you will usually see that these clients ship with some kind of ‘entity store’, allowing them to keep track of which entities they have received, effectively maintaining an equivalent of an HTTP cache.

Another issue is that it’s typically harder for clients of these systems to request just the data they need. Since entities are combined into compound documents it’s all-or-nothing, or significant complexity on both client and server (see GraphQL).

A more lofty drawback is that API designers may have trended towards systems that are more opaque, and are no longer part of the web of information due to a lack of the interconnectedness that linking affords.

HTTP/2 and HTTP/3

HTTP/2 is now widely available. In HTTP/2 the cost of HTTP requests is significantly lower. Whereas HTTP/1.1 required opening 1 TCP connection per parallel request, with HTTP/2 a single connection is opened per domain. Many requests can flow through it in parallel, and potentially out of order.

100 parallel requests via HTTP/2

HTTP/2 can fire off many parallel requests over 1 TCP connection

Instead of delegating parallelism to compound documents, we can now actually rely on the protocol itself to handle this.

Using many HTTP/2 requests instead of compound HTTP/1.1 requests has many advantages:

  • It’s no longer required for (browser) applications to tease out many entities from a single response. Everything can just be fetched with GET. Instead of collections embedding their items, they can just point to them.
  • If a browser has a cached copy of (some of) the items in a collection, it can intelligently skip the request or quickly get a 304 Not Modified back (see the example after this list).
  • It’s possible for some items to arrive faster than others if they were done earlier, allowing interfaces to render items as they arrive, instead of waiting for everything to arrive at once.
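
For example, revalidating a single cached item is a cheap exchange (the ETag value below is illustrative):

GET /articles/1 HTTP/2
Host: api.example.org
If-None-Match: "5d41402a"

HTTP/2 304 Not Modified
ETag: "5d41402a"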

HTTP/2 Push

There are still benefits that combined requests have over many responses.

Let’s use a real example. We’re building a blog API that has a list of articles. When we request the list, instead of returning every article, we’re now just returning a list of links:

GET /articles HTTP/1.1
Host: api.example.org

HTTP/1.1 200 OK
Content-Type: application/json

{
  "_links": {
    "item": [
      { "href": "/articles/1" },
      { "href": "/articles/2" },
      { "href": "/articles/3" },
      { "href": "/articles/4" },
      { "href": "/articles/5" },
      { "href": "/articles/6" },
      { "href": "/articles/7" }
    ]
  },
  "total": 7
}

For a client to get the full list of articles, it first needs to fetch the collection, wait for the response and then fetch every item in parallel. This doubles the latency.
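
Expressed as a minimal client-side sketch (assuming the link-style response above, running in an async/module context with fetch available), the two round trips look like this:

// Round trip 1: fetch the collection, which only contains links.
const collection = await fetch('https://api.example.org/articles')
  .then((res) => res.json());

// Round trip 2: fetch every linked article in parallel.
const articles = await Promise.all(
  collection._links.item.map((link) =>
    fetch(new URL(link.href, 'https://api.example.org/')).then((res) => res.json())
  )
);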

Another issue is that the server now needs to process 8 requests: one for the collection, and then 1 per item. It’s often much cheaper to generate the entire list at once; having to do a separate lookup per item is sometimes referred to as the N+1 Query problem.
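
Sketched against a hypothetical data layer (the db API is made up), the difference looks like this:

// N+1: one query for the collection, then one extra query per item.
const ids = await db.query('SELECT id FROM articles');
const articles = [];
for (const row of ids) {
  articles.push(await db.query('SELECT * FROM articles WHERE id = ?', [row.id]));
}

// Versus generating the entire list with a single query.
const allArticles = await db.query('SELECT * FROM articles');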

This problem could potentially be eliminated with HTTP/2 Server Push. Server Push is a new feature in HTTP/2 that allows the server to take the initiative to send additional responses before the client has actually requested them.
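
In Node.js, which the tests later in this article also use, pushing an item alongside the collection response might look roughly like this (paths, payloads and certificate file names are illustrative):

const http2 = require('http2');
const fs = require('fs');

const server = http2.createSecureServer({
  key: fs.readFileSync('server.key'),
  cert: fs.readFileSync('server.crt'),
});

server.on('stream', (stream, headers) => {
  if (headers[':path'] !== '/articles') return;

  if (stream.pushAllowed) {
    // Take the initiative: send an item before the client asks for it.
    stream.pushStream({ ':path': '/articles/1' }, (err, pushStream) => {
      if (err) return;
      pushStream.respond({ ':status': 200, 'content-type': 'application/json' });
      pushStream.end(JSON.stringify({ title: 'Pushed article' }));
    });
  }

  // Respond to the original request with the collection of links.
  stream.respond({ ':status': 200, 'content-type': 'application/json' });
  stream.end(JSON.stringify({ _links: { item: [{ href: '/articles/1' }] } }));
});

server.listen(8443);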

HTTP/2 Server Push

With HTTP/2 Server Push, entities can arrive earlier because the client doesn't have to wait

Unfortunately this method also has a drawback. The server does not know which resources a client already has cached. It can only assume it must send everything, or try to intelligently guess what the client might need.

There was a proposal in the works to resolve this, by letting the browser inform the server of its cache contents via a Bloom filter. I believe this is unfortunately now abandoned.

So you can either fully eliminate the initial latency, or you can have a reduced amount of traffic due to caching, but not both.

The ideal might be a mixture of the two. I’ve been working on a specification for allowing HTTP clients to specify which link relationships they would like to receive via an HTTP header. It’s called Prefer Push, and a request looks a little bit like this:

GET /articles HTTP/2
Prefer-Push: item
Host: api.example.org

If a server supports this header, it knows that the client will want all the resources linked with the ‘item’ relationship, and it can start pushing them as early as possible.

On the server-side, a fictional controller in a fictional framework might handle this request as follows:

function articlesIndex(request, response, connection) {

  // Respond to the collection request with just the links.
  const articles = articleServer.getIndex();
  response.body = articles.toLinks();

  // If the client sent "Prefer-Push: item", push every linked article.
  if (request.prefersPush('item')) {

    for (const article of articles) {
      connection.push(
        article.url,
        article.toJson()
      );
    }

  }

}

The CORS problem

A major drawback that’s worth pointing out is CORS. CORS opened the door to doing HTTP requests from a web application that’s hosted on one domain to an API hosted on another domain.

It does this with a few different facilities, but one that specifically kills performance is the preflight request.

When doing ‘unsafe’ cross-domain requests, the browser will start off by doing an OPTIONS request, allowing the server to explicitly opt in to these requests.

In practice most API requests are ‘unsafe’. The implication is that the latency of each individual HTTP request at least doubles.
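
One partial mitigation is letting the browser cache the preflight result with Access-Control-Max-Age, although that cache is per URL, so it doesn’t fix the ‘many small endpoints’ case. A minimal Node.js sketch (the allowed origin and header values are illustrative):

const http = require('http');

const corsHeaders = {
  'Access-Control-Allow-Origin': 'https://app.example.org',
  'Access-Control-Allow-Methods': 'GET, PUT, POST, DELETE',
  'Access-Control-Allow-Headers': 'Content-Type, Authorization',
  // Let the browser cache the preflight result (browsers cap this value).
  'Access-Control-Max-Age': '86400',
};

http.createServer((req, res) => {
  if (req.method === 'OPTIONS') {
    // The preflight request: answer it without doing any real work.
    res.writeHead(204, corsHeaders);
    res.end();
    return;
  }
  res.writeHead(200, { ...corsHeaders, 'Content-Type': 'application/json' });
  res.end('{}');
}).listen(8080);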

Every request needs an OPTIONS pre-flight

The lack of a domain-wide cross-domain policy slows everything down.

What’s interesting is that Macromedia Flash also had this issue, and they solved it by creating a domain-wide cross-origin request policy. All you had to do was create a crossdomain.xml file at the root of your domain, and once Flash read the policy it would remember it.

Every few months I search to see if someone is working on a modern version of this for JavaScript, and this time I’ve found a W3C Draft Specification. Here’s hoping browser vendors pick this up!

A less elegant workaround is to host a ‘proxy script’ on the API’s domain. Embedded via an <iframe>, it has unrestricted access to its own ‘origin’, and the parent web application can communicate with it via window.postMessage().
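
A rough sketch of such a proxy script, hosted on the API’s domain and embedded via an <iframe> (the application origin and message shape are made up for illustration):

// Runs inside the iframe, on the API's own origin, so fetch() is same-origin
// and never triggers a CORS preflight.
window.addEventListener('message', async (event) => {
  if (event.origin !== 'https://app.example.org') return; // only trust the app

  const { id, path, init } = event.data;
  const response = await fetch(path, init);
  const body = await response.text();

  // Relay the result back to the parent web application.
  event.source.postMessage({ id, status: response.status, body }, event.origin);
});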

The perfect world

In a perfect world, HTTP/3 is already widely available, improving performance even further; browsers have a standard mechanism to send cache digests; clients inform the server of the link relationships they want, allowing API servers to push any resources clients may need as early as possible; and domain-wide origin policies are a thing.

This last simulation shows an example of how that might look. In the below example the browser has a warmed-up cache, and an ETag for every item.

When doing a request to find out if the collection has new entries or updated items, the client includes a cache digest and the server responds by pushing just the resources that have changed.
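
There is no standard server API for this yet, so the following is purely speculative: a sketch of a Node.js handler that checks each linked item against the client’s cache digest and only pushes the misses (parseCacheDigest and digestContains are hypothetical helpers):

function pushChangedItems(requestHeaders, stream, articles) {
  // Hypothetical: parse whatever cache digest header the client sent.
  const digest = parseCacheDigest(requestHeaders);

  for (const article of articles) {
    if (digestContains(digest, article.url, article.etag)) {
      continue; // the client already has a fresh copy, skip it
    }
    stream.pushStream({ ':path': article.url }, (err, pushStream) => {
      if (err) return;
      pushStream.respond({
        ':status': 200,
        'content-type': 'application/json',
        'etag': article.etag,
      });
      pushStream.end(JSON.stringify(article.body));
    });
  }
}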

Push, Cache Digests and a warm cache

The client has a generic way to inform the server of their state. The server pushes just the things that the client needs.

Real-world performance testing

We’re lacking some of these ‘perfect world’ features, but we can still work with what we’ve got. We have access to HTTP/2 Server Push, and requests are cheap.

Since HTTP/2, ‘many small HTTP endpoints’ has felt to me like the most elegant design, but does the performance hold up? Some evidence could really help.

My goal for this performance test is to fetch a collection of items in the following different ways:

  1. h1 - Individual HTTP/1.1 requests.
  2. h1-compound - An HTTP/1.1 compound collection.
  3. h2 - Individual HTTP/2 requests.
  4. h2-compound - An HTTP/2 compound collection.
  5. h2-cache - An HTTP/2 collection + every item individually fetched. Warm cache.
  6. h2-cache-stale - An HTTP/2 collection + every item individually fetched. Warm cache, but needs revalidation.
  7. h2-push - HTTP/2, no cache, but every item is pushed.

My prediction

In theory, the same amount of information is sent and work is done for a compound request vs. HTTP/2 pushed responses.

However, I think there’s still enough overhead to HTTP requests in HTTP/2 that doing compound requests probably still has a leg up.

The real benefit will show when caching comes into play. For a given collection in a typical API I think it’s fair to assume that many items may be cached.

It seems logical to assume that the tests that skip 90% of the work are also the fastest.

So from fastest to slowest, this is my prediction.

  1. h2-cache - An HTTP/2 collection + every item individually fetched. Warm cache.
  2. h2-cache-stale - An HTTP/2 collection + every item individually fetched. Warm cache, but needs revalidation.
  3. h2-compound - An HTTP/2 compound collection.
  4. h1-compound - An HTTP/1.1 compound collection.
  5. h2-push - HTTP/2, no cache, but every item is pushed.
  6. h2 - Individual HTTP/2 requests.
  7. h1 - Individual HTTP/1.1 requests.

First test setup and initial observations

I initially started testing with a local Node.js service, version 12. All HTTP/1.1 tests are done over SSL, and HTTP/2 tests run over a different port.

To simulate latency, I added a delay to every HTTP request between 40 and 80 milliseconds.
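
A minimal sketch of how that artificial delay could be added, assuming a generic middleware-style server (the exact hook differs per framework):

// Delay every request by a random 40-80ms before handing it off.
function randomDelay(min = 40, max = 80) {
  const ms = min + Math.random() * (max - min);
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function latencyMiddleware(request, response, next) {
  await randomDelay();
  return next();
}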

Here’s what my first testing tool looks like:

Testing HTTP/2 with cache

I ran into a number of issues right away. Chrome disables the cache for self-signed certificates. I was not really able to figure out how to get Chrome to accept my self-signed certificate on localhost, so I initially gave up on this and tested with Firefox.

On Firefox, Server Push seems unreliable. It often only worked the second time I ran the Push test.

But, the most surprising thing was that in Firefox, serving items from the local cache was only marginally faster than serving fresh responses with my artificial latency. When I ran these tests several times, in many cases serving items from cache was actually slower than going to the server and requesting a new copy.

Given these results, I had to improve my test setup.

Better tests

This is the basic set-up for the second test:

  1. I’m repeating tests 50 times.
  2. I’m running the server on AWS, on a t2.medium instance in us-west-2.
  3. My testing is done over residential internet. The fake latency has been removed.
  4. I’m using LetsEncrypt SSL certificates.
  5. For each browser I’m running the test twice:
    • Once with a collection containing 25 items
    • A second time with a collection containing 500 items.

Test 1: 25 requests

Relative time for browsers fetching a collection and 25 entities

A few things are interesting in this graph.

First, we expected HTTP/1.1 separate requests to be the slowest, so no surprise there. It’s really there to provide a baseline.

The second slowest is individual HTTP/2 requests.

This only gets marginally improved by HTTP/2 push or caching.

Chrome and Firefox mostly have the same results. Let’s zoom in on the Chrome results:

Test                   Median Time   %
h1, no cache           0.490         100%
h1, compound           0.147         30%
h1, 90% cached         0.213         43%
h2, no cache           0.276         56%
h2, compound           0.147         30%
h2, 90% cached         0.221         45%
h2, 90% not modified   0.243         49%
h2, push               0.215         44%
Relative time for Chrome fetching a collection and 25 entities

Compound requests are by far the fastest. This indicates that my original guess was wrong. Even when caching comes into play, it still can’t beat just re-sending the entire collection again in a single compounded response.

Caching does marginally improve on not caching.

Test 2: 500 requests

So let’s do the big test. In this test we expect the differences to increase in some areas due to more requests simply taking longer, and we’re expecting some differences to decrease. In particular, the effect of the ‘initial’ request should be deemphasized.

Relative time for browsers fetching a collection with 500 entities

500 entities, but with the slowest test removed

These graphs suggest that:

  • Chrome is the slowest browser for the tests that have the most requests.
  • Firefox is the slowest for the tests that use Push, and the test that’s mostly served from the browser cache.

This kinda matched my own observations. Push on Firefox seemed a little unreliable, and using the cache seemed slow.

Test                   Chrome %   Firefox %
h1, no cache           100.0%     84.51%
h1, compound           5.57%      5.61%
h1, 90% cached         14.60%     11.30%
h2, no cache           18.20%     10.55%
h2, compound           5.91%      5.73%
h2, 90% cached         7.86%      8.52%
h2, 90% not modified   12.78%     11.22%
h2, push               9.02%      10.68%
Relative time in Chrome and Firefox for requesting 500 entities.

What we can tell here is that at 500 requests, doing compound requests is around 1.8x faster on Firefox, and 3.26x faster on Chrome.

The biggest surprise is the speed of browser caches. Our ‘normal’ test will do 501 HTTP requests. The tests that warm the cache only do 51 requests.

These results show that doing 501 requests takes around 2.3x as long as doing 51 requests with Chrome. In Firefox this is only 1.2x.

In other words, the total time needed for Firefox to request something from its cache is only marginally faster than getting that resource from the other side of the continent. I was very surprised by this.

This made me wonder if Firefox’s cache is just slow in general, or especially bad at highly concurrent access. I have no evidence for this, but it felt as if Firefox’s cache might have some bad global locking behavior.

Another thing that stands out here is that Chrome appears to perform especially badly when firing off all 500 requests in parallel: more than twice as slow as Firefox. The massive difference made me doubt my results, but I re-ran the tests later and got similar outcomes every time.

We also see that the benefit of using Push becomes less pronounced, as we only really save time by reducing the latency of the first request.

Conclusions

My tests are imperfect. They measure the HTTP/2 implementation in Node as much as they measure HTTP/2 in general. To get real proof, I think it’s important to test more situations.

My server-implementation might also not have been the best one. My service served files from the filesystem, but a system under real load might behave differently.

So treat these results as evidence, but not proof.

The evidence tells me that if speed is the most important requirement, you should continue to use compound responses.

I do believe, though, that the results are close enough that it might be worth taking the performance hit in exchange for a potentially simpler system design.

It also appears that caching does not really make a significant difference. Potentially due to poor browser optimization, doing a fresh HTTP request can often take just as long as serving the response from cache. This is especially true with Firefox.

I also believe that the effect of Push was visible but not massive. The biggest benefits of Push are on the first load of a new collection, and it will also become more important for avoiding the N+1 Query problem. Pushing responses earlier is mostly useful if:

  • The server can really benefit from generating the responses all at once.
  • If there are multiple hops needed in the API to get all its data, an intelligent push mechanism can be very effective at reducing the compound latency.

Short summary:

  • If speed is the overriding requirement, keep using compound documents.
  • If a simpler, more elegant API is most important, having many smaller-scoped endpoints is definitely viable.
  • Caching only makes a bit of difference.
  • Optimizations benefit the server more than the client.

However, I still doubt some of these results. If I had more time, I would try to test this with a server written in Go, and more realistically simulate conditions of the server-side.

My wishlist for 2020

I’m ending this post with a wish list for 2020 and beyond:

  • Widespread HTTP/3 support.
  • A standard mechanism for browsers to send cache digests.
  • Adoption of something like Prefer Push, so clients can tell servers which linked resources they want pushed.
  • A domain-wide cross-origin policy, so APIs no longer pay for a preflight on every request.

It’s an ambitious wish list. When we do arrive at this point, I believe it will be a major step forward towards making our REST clients simpler again, letting our server implementations make fewer trade-offs between performance and simplicity, and treating our browsers and servers as a reliable, fast engine for synchronizing URI-indexed resource state.
