Benchmark of Python WSGI Servers

Nicholas Piël | March 15, 2010

It has been a while since the Socket Benchmark of Asynchronous Servers. That benchmark looked specifically at the raw socket performance of various frameworks, measured by firing a regular HTTP request at the TCP server. The server itself was dumb and did not actually understand the headers being sent to it. In this benchmark I will be looking at how different WSGI servers perform at exactly that task: the handling of a full HTTP request.

I should immediately start with a word of caution. I tried my best to present an objective benchmark of the different WSGI servers, and I truly believe that a benchmark is one of the best methods to present an unbiased comparison. However, a benchmark measures performance in a very specific domain, and it could very well be that this domain is slanted towards certain frameworks. If we keep that in mind, we can still put some measurements behind all those ‘faster than’ or ‘lighter than’ claims you will find everywhere. In my opinion, comparison claims without any detailed description of how they were measured are worse than a biased but detailed benchmark. The specific domain of this benchmark is, yet again, the PingPong benchmark used earlier in my Async Socket Benchmark. However, there are some differences:

  • We will fire multiple requests over a single connection, when possible, by using an HTTP 1.1 keep-alive connection
  • It is a distributed benchmark with multiple clients
  • We will use an identical WSGI application for all servers instead of specially crafted code to return the reply
  • We expect the server to understand our HTTP request and reply with the correct error codes

This benchmark is conceptually simple and you could argue that it is not representative of the most common web applications, which rely heavily on blocking database connections. I agree with that to some extent. However, the push towards HTML5’s websockets and highly interactive web applications will require servers that are capable of serving lots of concurrent connections with low latency.

The benchmark

We will run the following WSGI application ‘pong.py’ on all servers.

def application(environ, start_response):
    status = '200 OK'
    output = 'Pong!'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]

We will also tune both client and server by running the following commands. This basically enables the server to open LOTS of concurrent connections.

echo "10152 65535" > /proc/sys/net/ipv4/ip_local_port_range
sysctl -w fs.file-max=128000
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.core.somaxconn=250000
sysctl -w net.ipv4.tcp_max_syn_backlog=2500
sysctl -w net.core.netdev_max_backlog=2500
ulimit -n 10240

The server is a virtual machine with only one assigned processor. I have explicitly limited the number of available processors to make sure that it is a fair comparison. Whether or not a server scales over multiple processors is an interesting and useful feature, but it is not something I will measure in this benchmark. The reason for this is that it isn't that difficult to scale your application over multiple processors by using a reverse proxy and multiple server processes (this can even be managed for you by special applications such as Spawning or Grainbows). The server and clients run Debian Lenny with Python 2.6.4 on the amd64 architecture. I made sure that all WSGI servers have a backlog of at least 500 and that (connection/error) logging is disabled; when this was not directly possible from the callable, I modified the library. The server and the clients have 1GB of RAM.

I benchmarked the HTTP/1.0 request rate of all servers and the HTTP/1.1 request rate of the subset of servers that support pipelining multiple requests over a single connection. While the lack of HTTP 1.1 keep-alive support is most likely a non-issue in current deployment situations, I expect it to become an important feature in applications that depend heavily on low-latency connections, such as comet-style web applications or applications that use HTML5 websockets.

I categorize a server as HTTP/1.1 capable by its behaviour, not by its specs. For example, the Paster server says that it has some support for HTTP 1.1 keep-alives, but I was unable to pipeline multiple requests. This reported bug might be relevant to this situation and might also apply to some of the other “HTTP 1.0 servers”.

The benchmark will be performed by running a recompiled httperf (which bypasses the statically compiled file-descriptor limit in the Debian package) on 3 specially set up client machines. To step through the different request rates and aggregate the results I will use a tool called autobench. Note: this is not ApacheBench (ab).

The command to benchmark HTTP/1.0 WSGI servers is:

httperf --hog --timeout=5 --client=0/1 --server=tsung1 --port=8000 --uri=/ --rate=<RATE> --send-buffer=4096 --recv-buffer=16384 --num-conns=400 --num-calls=1

And the command for HTTP/1.1 WSGI servers is:

httperf --hog --timeout=5 --client=0/1 --server=tsung1 --port=8000 --uri=/ --rate=<RATE> --send-buffer=4096 --recv-buffer=16384 --num-conns=400 --num-calls=10

The Contestants

Python is really rich in WSGI servers; I have made a selection of different servers, which are listed below.

Name        | Version     | HTTP 1.1 | Flavour             | Repo      | Blog              | Community
------------|-------------|----------|---------------------|-----------|-------------------|--------------
Gunicorn    | 0.6.4       | No       | processor/thread    | Git       | ?                 | #gunicorn
uWSGI       | Trunk (253) | Yes      | processor/thread    | repo      | ?                 | Mailing list
FAPWS3      | 0.3.1       | No       | processor/thread    | Git       | William Os4y      | Google Groups
Aspen       | 0.8         | No       | processor/thread    | SVN       | Chad Whitacre     | Google Groups
Mod_WSGI    | 3.1         | Yes      | processor/thread    | SVN       | Graham Dumpleton  | Google Groups
wsgiref     | Py 2.6.4    | No       | processor/thread    | SVN       | None              | Mailing list
CherryPy    | 3.1.2       | Yes      | processor/thread    | SVN       | Planet CherryPy   | Planet, IRC
Magnum Py   | 0.2         | No       | processor/thread    | SVN       | Matt Gattis       | Google Groups
Twisted     | 10.0.0      | Yes      | processor/thread    | SVN       | Planet Twisted    | Community
Cogen       | 0.2.1       | Yes      | callback/generator  | SVN       | Maries Ionel      | Google Groups
Gevent      | 0.12.2      | Yes      | lightweight threads | Mercurial | Denis Bilenko     | Google Groups
Tornado     | 0.2         | Yes      | callback/generator  | Git       | Facebook          | Google Groups
Eventlet    | 0.9.6       | Yes      | lightweight threads | Mercurial | Eventlet          | Mailing list
Concurrence | tip         | Yes      | lightweight threads | Git       | None              | Google Groups

Most of the information in this table should be rather straightforward: I specify the version benchmarked and whether or not the server has been found capable of HTTP 1.1. The flavour of the server specifies the concurrency model it uses, and I identify 3 different flavours:

Processor / Thread model

The p/t model is the most common flavour. Every request runs in its own cleanly separated thread. A blocking request (such as a synchronous database call or a function call in a C extension) will not influence other requests. This is convenient as you do not need to worry about how everything is implemented, but it does come at a price: the maximum number of concurrent connections is limited by your number of workers or threads, and this is known to scale badly when you need to serve lots of concurrent users.
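
As a minimal sketch of this flavour (for illustration only, not one of the benchmarked servers), the stock wsgiref server can be made threaded by mixing in the standard library's ThreadingMixIn:

# Sketch of the processor/thread flavour (Python 2.x standard library).
# Each incoming request gets its own thread; a blocking call in one request
# does not stall the others, but the thread count caps concurrency.
from SocketServer import ThreadingMixIn
from wsgiref.simple_server import WSGIServer, make_server
from pong import application

class ThreadedWSGIServer(ThreadingMixIn, WSGIServer):
    pass

httpd = make_server('', 8000, application, server_class=ThreadedWSGIServer)
httpd.serve_forever()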

Callback / Generator model

The callback/generator model handles multiple concurrent connections in a single thread, thereby removing the thread barrier. However, a single blocking call will block the whole event loop and has to be avoided. Servers of this flavour usually provide a threadpool to integrate blocking calls into their async framework, or offer alternative non-blocking database connectors. To provide flow control, this flavour uses callbacks or generators. Some think this is a beautiful way to do event-driven programming; others think it is a snake pit that quickly turns your clean code into an entangled mess of callbacks or yield statements.
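
As a rough sketch of the callback style, here is what a handler that talks to another service looks like in Tornado's native (non-WSGI) API; this is an illustration only, not the code used in the benchmark:

import tornado.httpclient
import tornado.web

class ProxyHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        # the event loop keeps serving other clients while this fetch is pending
        client = tornado.httpclient.AsyncHTTPClient()
        client.fetch("http://example.com/", callback=self.on_response)

    def on_response(self, response):
        # resumed later by the event loop; flow control lives in callbacks
        self.write(response.body)
        self.finish()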

Lightweight Threads

The lightweight flavour uses greenlets to provide concurrency. This also works by providing concurrency from a single thread, but in a less obtrusive way than the callback or generator approach. Of course, one still has to be careful with blocking calls, as these will stop the event loop. To prevent this from happening, Eventlet and Gevent can monkey-patch the socket module so that it no longer blocks; when you are using a pure Python database connector this should then never block the loop. Concurrence provides an asynchronous database adapter.
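
A minimal sketch of the monkey-patching approach with Gevent (Eventlet offers an equivalent monkey_patch helper); after patching, ordinary blocking socket code cooperatively yields to the event loop instead of stalling it:

from gevent import monkey
monkey.patch_socket()          # replace the blocking socket module

import urllib2                 # pure Python, now uses the patched socket

def application(environ, start_response):
    # this network call no longer blocks the whole event loop
    body = urllib2.urlopen('http://example.com/').read()
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]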

Implementation specifics for each WSGI server

Aspen

Ruby might be full of all kinds of rockstar programmers (whatever that might mean), but if I had to nominate just one Python programmer for some sort of ‘rockstar award’ I would definitely nominate Chad Whitacre. It is not only the great tools he created (Testosterone, Aspen, Stephane), but mostly how he promotes them with the most awesome screencasts I have ever seen.

Anyway, Aspen is a neat little web server which is also able to serve WSGI applications. It can be easily installed with ‘pip install aspen’ and uses a special directory structure for configuration; if you want more information I am going to point you to his screencasts.

CherryPy

CherryPy is actually an object-oriented Python framework, but it features an excellent WSGI server. Installation can be done with a simple ‘pip install cherrypy’. I ran the following script to test the performance of the WSGI server:

from cherrypy import wsgiserver
from pong import application

# Here we set our application to the script_name '/'
wsgi_apps = [('/', application)]

server = wsgiserver.CherryPyWSGIServer(('0.0.0.0', 8070), wsgi_apps,
                                       request_queue_size=500,
                                       server_name='localhost')

if __name__ == '__main__':
    try:
        server.start()
    except KeyboardInterrupt:
        server.stop()

Cogen

The code to have Cogen run a WSGI application is as follows:

from cogen.web import wsgi
from cogen.common import *
from pong import application

m = Scheduler(default_priority=priority.LAST, default_timeout=15)
server = wsgi.WSGIServer(
            ('0.0.0.0', 8070),
            application,
            m,
            server_name='pongserver')
m.add(server.serve)
try:
    m.run()
except (KeyboardInterrupt, SystemExit):
    pass

Concurrence

Concurrence is an asynchronous framework under development by Hyves (you might call it the Dutch Facebook), built upon libevent (I used the latest stable version, 1.4.13). I fired up the pong application as follows:

from concurrence import dispatch
from concurrence.http import WSGIServer
from pong import application
server = WSGIServer(application)
# Concurrence has a default backlog of 512
dispatch(server.serve(('0.0.0.0', 8080)))

Eventlet

Eventlet is a full-featured asynchronous framework which also provides WSGI server functionality. It is developed by Linden Lab (makers of Second Life). To run the application I used the following code:

import eventlet
from eventlet import wsgi
from pong import application
wsgi.server(eventlet.listen(('', 8090), backlog=500), application, max_size=8000)

FAPWS3

FAPWS3 is a WSGI server built around the libev library (I used version 3.43-1.1). Once libev is installed, FAPWS3 can be easily installed with pip. The philosophy behind FAPWS3 is to stay the simplest and fastest webserver. The script I used to start up the WSGI application is as follows:

import fapws._evwsgi as evwsgi
from fapws import base
from pong import application

def start():
    evwsgi.start("0.0.0.0", 8080)
    evwsgi.set_base_module(base)

    evwsgi.wsgi_cb(("/", application))

    evwsgi.set_debug(0)
    evwsgi.run()

if __name__=="__main__":
    start()

Gevent

Gevent was one of the best performing async frameworks in my previous socket benchmark. Gevent extends libevent and uses its HTTP server functionality extensively. To install Gevent you need libevent installed, after which you can pull in Gevent with pip.

from gevent import wsgi
from pong import application
wsgi.WSGIServer(('', 8088), application, spawn=None).serve_forever()

The above code runs the pong application without spawning a greenlet on every request. If you leave out the argument ‘spawn=None’, Gevent will spawn a greenlet for every new request, as shown in the sketch below.
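
For reference, a sketch of the spawning variant, which shows up later in the graphs as ‘Gevent-Spawn’:

from gevent import wsgi
from pong import application

# A greenlet is spawned for every request, giving thread-like flow control.
wsgi.WSGIServer(('', 8088), application).serve_forever()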

Gunicorn

Gunicorn stands for ‘Green Unicorn’. Everybody knows that a unicorn is a mix of the awesome narwhal and the magnificent pony; the ‘green’ does, however, have nothing to do with the great greenlets, as Gunicorn really has a processor/thread flavour. Installation is easy and can be done with a simple ‘pip install gunicorn’. Gunicorn provides you with a simple command to run WSGI applications; all I had to do was:

gunicorn -b :8000 -w 1 pong:application

Update: I had some suggestions in the comment section that using a single worker and having the client connect to the naked server is not the correct way to work with Gunicorn. So I took those suggestions, moved Gunicorn behind NGINX and increased the worker count to the suggested number of workers, 2*N+1 where N is the number of processors; with N = 1 that makes 3. The result of this is depicted in the graphs as gunicorn-3w.

Running Gunicorn with more workers can be done as follows:

gunicorn -b unix:/var/nginx/uwsgi.sock -w 3 pong:application

MagnumPy

MagnumPy has to be the server with the most awesome name. It is still a very young project, but its homepage makes some strong statements about its performance, so it is worth testing. It does not feel as polished as the other contestants: installing is basically putting the ‘magnum’ directory on your PYTHONPATH and editing ‘./magnum/config.py’, after which you can start the server by running ‘./magnum/serve.py start’.

#config.py
import magnum
import magnum.http
import magnum.http.wsgi
from pong import application

WORKER_PROCESSES = 1
WORKER_THREADS_PER_PROCESS = 1000
HOST = ('', 8050)
HANDLER_CLASS = magnum.http.wsgi.WSGIWrapper(application)
DEBUG = False
PID_FILE = '/tmp/magnum.pid'

Mod_WSGI

Mod_WSGI is the successor of Mod_Python; it allows you to easily integrate Python code with the Apache server. My first Python web app experience was with mod_python and PSP templates; WSGI and cool frameworks such as Pylons have really made life a lot easier.

Mod_WSGI is a great way to get your application deployed quickly. Installing Mod_WSGI is really easy with most Linux distributions. For example:

aptitude install libapache2-mod-wsgi

is all you need to do on a pristine Debian install to get a working Apache (MPM worker) server with Mod_WSGI enabled. To point Apache to your WSGI app, just add a single line to ‘/etc/apache2/httpd.conf’:

WSGIScriptAlias / /home/nicholas/benchmark/wsgibench/pong.py

The problem is that most people already have Apache installed and are using it for *shudder* serving PHP. PHP is not thread safe, meaning that you are forced to use a pre-forking Apache server. In this benchmark I am using the threaded Apache version and use mod_wsgi in embedded mode (as it gave me the best performance).

I disabled all unnecessary modules, configured Apache to provide a single worker process with lots of threads, and disabled logging (note: I tried various settings):

<IfModule mpm_worker_module>
    ServerLimit         1
    ThreadLimit         1000
    StartServers          1
    MaxClients          1000
    MinSpareThreads     25
    MaxSpareThreads     75
    ThreadsPerChild     1000
    MaxRequestsPerChild   0
</IfModule>
CustomLog /dev/null combined
ErrorLog /dev/null

Paster

The Paster webserver is the webserver provided with Python Paste; it is the default webserver of Pylons. You can run a WSGI application as follows:

from pong import application
from paste import httpserver
httpserver.serve(application, '0.0.0.0', request_queue_size=500)

Tornado

Tornado is the non-blocking webserver that powers FriendFeed. It provides some WSGI server functionality, which can be used as described below. In the previous benchmark I showed that it provides excellent raw-socket performance.

import sys
sys.path.append('/home/nicholas/benchmark/wsgibench/')

import tornado.httpserver
import tornado.ioloop
import tornado.wsgi
from pong import application

def main():
    container = tornado.wsgi.WSGIContainer(application)
    http_server = tornado.httpserver.HTTPServer(container)
    http_server.listen(8000)
    tornado.ioloop.IOLoop.instance().start()
if __name__ == "__main__":
    main()

Twisted

After installing Twisted with pip you get a tool, ‘twistd’, which allows you to easily serve WSGI applications, for example:

twistd --pidfile=/tmp/twisted.pid -no web --wsgi=pong.application --logfile=/dev/null

But you can also run a WSGI application as follows:

from twisted.web.server import Site
from twisted.web.wsgi import WSGIResource
from twisted.internet import reactor
from pong import application

resource = WSGIResource(reactor, reactor.getThreadPool(), application)
reactor.listenTCP(8000,Site(resource))
reactor.run()

uWSGI

uWSGI is a server written in C; it is not meant to run stand-alone but has to be placed behind a webserver. It provides modules for Apache, NGINX, Cherokee and Lighttpd. I have placed it behind NGINX, which I configured as follows:

worker_processes  1;

events {
    worker_connections  30000;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    keepalive_timeout  65;

    upstream pingpong {
        ip_hash;
        server unix:/var/nginx/uwsgi.sock;
    }

    server {
        listen       9090;
        server_name  localhost;

        location / {
            uwsgi_pass  pingpong;
            include     uwsgi_params;
        }

        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

    }

}

This configuration has NGINX forward requests to a unix socket; now all I needed to do was have uWSGI serve on that same unix socket, which I did with the following command:

./uwsgi -s /var/nginx/uwsgi.sock -i -H /home/nicholas/benchmark/wsgibench/ -M -p 1 -w pong -z 30 -l 500 -L

WsgiRef

WsgiRef is the default WSGI server included with Python since version 2.5. To have this server run my application I used the following code, which disables logging and increases the backlog:

from pong import application
from wsgiref import simple_server

class PimpedWSGIServer(simple_server.WSGIServer):
    # To increase the backlog
    request_queue_size = 500

class PimpedHandler(simple_server.WSGIRequestHandler):
    # to disable logging
    def log_message(self, *args):
        pass

httpd = PimpedWSGIServer(('',8000), PimpedHandler)
httpd.set_app(application)
httpd.serve_forever()

Results

Below you will find the results as plotted with Highcharts; the line thickens when hovered over and you can enable or disable plotted results by clicking on the legend.

HTTP 1.0 Server results

[Chart: Reply Rate on an increasing number of requests (more is better). Series: aspen, cherrypy, eventlet, fapws3, gevent, gunicorn, modwsgi, tornado, twisted, uwsgi, gunicorn-3w.]

Disqualified servers

From the above graph it should be clear that some of the web servers are missing; the reason is that I was unable to benchmark them completely, as they stopped replying once the request rate passed a certain critical value. The missing servers are:

  • MagnumPy: I was able to obtain a reply rate of 500 RPS, but when the request rate passed the 700 RPS mark, MagnumPy crashed.
  • Concurrence: I was able to obtain a successful reply rate of 700 RPS, but it stopped replying when we fired more than 800 requests per second at the server. However, since Concurrence does support HTTP/1.1 keep-alive connections and behaves correctly when benchmarked at a lower connection rate but higher request rate, you can find its results in the HTTP/1.1 benchmark.
  • Cogen: was able to obtain a reply rate of 800 per second, but stopped replying when the request rate rose above 1500 per second. It does have a complete benchmark under the HTTP/1.1 test though.
  • WSGIRef: I obtained a reply rate of 352 RPS, but it stopped reacting when we passed the 1900 RPS mark.
  • Paster: obtained a reply rate of 500 RPS, but it failed when we passed the 2000 RPS mark.

Interpretation

All the servers that passed the benchmark show admirable performance. At the bottom we have Twisted and Gunicorn; the performance of Twisted is somewhat expected, as it isn't really tuned for WSGI performance. I find the performance of Gunicorn somewhat disappointing, also because, for example, Aspen, a pure Python server from a few years back, shows significantly better performance. We can see, however, that increasing the worker count does in fact improve performance, as Gunicorn is then able to obtain a reply rate competitive with Aspen.

The other pure Python servers, CherryPy and Tornado, seem to perform on par with ModWSGI. It looks like CherryPy has a slight performance edge over Tornado. So, if you are thinking of changing from ModWSGI or CherryPy to Tornado for increased performance, you should think again. Not only does this benchmark show that there isn't that much to gain, you will also abandon the process/thread model, meaning that you should be cautious with code blocking your interpreter.

The top performers are clearly FAPWS3, uWSGI and Gevent. FAPWS3 has been designed to be fast and lives up to expectations; this has been noted by others as well, as it looks like it is being used in production at eBay. uWSGI is used successfully in production at (and developed by) the Italian ISP Unbit. Gevent is a relatively young project but already very successful. Not only did it perform great in the previous async server benchmark, but its reliance on the libevent HTTP server gives it performance beyond the other asynchronous frameworks.

I should note that the difference between these top 3 is too small to declare a clear winner of the ‘reply rate contest’. However, I want to stress that with almost all servers I had to be careful to keep the number of concurrent connections low, since threaded servers aren't that fond of lots of concurrent connections. The async servers (Gevent, Eventlet and Tornado) were happy to work on whatever was thrown at them. This really gives a great feeling of stability, as you do not have to worry about settings such as pool size, worker count, etc.

[Chart: Response Time (ms) on an increasing number of requests (less is better). Same series as above.]

Most of the servers have an acceptable response time. Twisted and Eventlet are somewhat on the slow side, but Gunicorn unfortunately shows a dramatic increase in latency when the request rate passes the 1000 RPS mark. Increasing the Gunicorn worker count lowers this latency by a lot, but it is still on the high side compared with, for example, Aspen or CherryPy.

[Chart: Error Rate on an increasing number of requests (less is better). Same series as above.]

The low error rates for CherryPy, ModWSGI, Tornado and uWSGI should give everybody confidence in their suitability for a production environment.

HTTP 1.1 Server results

In the HTTP/1.1 benchmark we have a different list of contestants, as not all servers were able to pipeline multiple requests over a single connection. In this test the connection rate is relatively low; for example, a request rate of 8000 per second is about 800 connections per second with 10 requests per connection. This means that some servers that were not able to complete the HTTP/1.0 benchmark (with connection rates up to 5000 per second) are able to complete the HTTP/1.1 benchmark (Cogen and Concurrence, for example).

[Chart: Successful Reply Rate on an increasing number of requests (more is better). Series: uwsgi, modwsgi, cherrypy, twisted, cogen, gevent-spawn, gevent, tornado, eventlet, concurrence.]

This graph shows the achieved request rate of the servers, and we can clearly see that it is higher than in the HTTP/1.0 test. We could increase the total request rate even more by increasing the number of pipelined requests, but this would lower the connection rate. I think that 10 pipelined requests is an acceptable approximation of a web browser opening an average page.

The graph shows a huge gap in performance: with the fastest server, Gevent, we are able to obtain about 9000 replies per second, while with Twisted, Concurrence and Cogen we get about 1000. In the middle we have CherryPy and ModWSGI, with which we obtain a reply rate of around 4000. It is interesting that Tornado, while being close to CherryPy and ModWSGI, seems to have an edge in this benchmark, compared to the edge CherryPy had in the HTTP/1.0 benchmark. This is in line with our expectations, as pipelined requests are cheaper in Tornado (since it is async) than in ModWSGI or CherryPy. We expect this gap to widen if we increase the number of pipelined requests. However, it remains to be seen how much of a performance boost this would provide in a deployment setup, as Tornado and CherryPy will then probably be sitting behind a reverse proxy, for example NGINX. In such a setting the connection between the proxy and the upstream is usually limited to HTTP/1.0; NGINX, for example, does not even support HTTP/1.1 keep-alive connections to its upstreams.

The best performers are clearly uWSGI and Gevent. I benchmarked Gevent with the ‘spawn=None’ option to prevent Gevent from spawning a greenlet, which seems fair in a benchmark like this. However, when you want to do something interesting with lots of concurrent connections you want each request to have its own greenlet, as this allows you to have thread-like flow control. Thus I also benchmarked that version, which can be seen in the graph under the name ‘Gevent-Spawn’; from its results we can see that the performance penalty is small.

[Chart: Response Time (ms) on an increasing number of requests (less is better). Same series as above.]

Cogen's latency rises sharply after about 2000 requests per second; Eventlet and Twisted show increased latency fairly early as well.

[Chart: Error Rate on an increasing number of requests (less is better). Same series as above.]

The error rate shows that Twisted, Concurrence and Cogen have some trouble keeping up; I think all the other error rates are acceptable.

Memory Usage

I also monitored the memory usage of the different frameworks during the benchmark. The value reported below is the peak memory usage of all accumulated processes. As this benchmark does not really benefit from additional processes (there is only one available processor), I limited the number of workers where possible.

[Chart: Accumulated Peak Memory Usage per WSGI server, in megabytes. Servers: Aspen, CherryPy, Cogen, Concurrence, Eventlet, FAPWS3, Gevent, Gunicorn, Gunicorn-3w, MagnumPy, ModWSGI, Paster, Tornado, Twisted, uWSGI, WsgiRef.]

From these results one thing really stands out: the remarkably low memory usage of uWSGI, Gevent and FAPWS3, especially if we take their performance into account. It looks like Cogen is leaking memory, but I haven't really looked into that. Gunicorn-3w shows a relatively high memory usage compared with Gunicorn, but it should be noted that this is mainly caused by the switch from the naked deployment to the deployment behind NGINX, as we now also have to add the memory usage of NGINX. A single Gunicorn worker only takes about 7.5MB of memory.
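
The exact measurement method is not spelled out here; the sketch below shows one possible way to sample accumulated peak RSS with a standard ‘ps’ call (an assumption for illustration, not necessarily how these numbers were collected):

import subprocess
import time

def total_rss_kb(pattern):
    # sum the resident set size (KB) of every process whose command line
    # contains `pattern`; relies only on a plain `ps` invocation
    out = subprocess.Popen(['ps', '-eo', 'rss=,args='],
                           stdout=subprocess.PIPE).communicate()[0]
    return sum(int(line.split(None, 1)[0])
               for line in out.splitlines() if pattern in line)

peak = 0
while True:
    peak = max(peak, total_rss_kb('gunicorn'))   # 'gunicorn' is just an example pattern
    print 'peak so far: %d KB' % peak
    time.sleep(1)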

Let’s Kick it up a notch

The first part of this post focused purely on the RPS performance of the different frameworks under a high load. As long as the WSGI server was working fast enough, it could simply answer all requests from a certain user and move on to the next user. This keeps the number of concurrent connections relatively low, making such a benchmark suitable for threaded web servers.

However, if we increase the number of concurrent connections we will quickly run into system limits, as explained in the introduction. This is commonly known as the C10K problem. Asynchronous servers use a single thread to handle multiple connections and, when efficiently implemented with for example epoll or kqueue, are perfectly able to handle a large number of concurrent connections.
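
As an illustration of what a single thread handling many connections looks like at the lowest level, here is a bare-bones epoll loop (Python 2.6+, Linux only); it is a toy that ignores partial reads and writes and HTTP parsing, not how any of the benchmarked servers are actually implemented:

import select
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('', 8000))
sock.listen(500)
sock.setblocking(0)

epoll = select.epoll()
epoll.register(sock.fileno(), select.EPOLLIN)
conns = {}
reply = 'HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!'

while True:
    for fd, event in epoll.poll(1):
        if fd == sock.fileno():                  # new connection
            conn, _ = sock.accept()
            conn.setblocking(0)
            epoll.register(conn.fileno(), select.EPOLLIN)
            conns[conn.fileno()] = conn
        elif event & select.EPOLLIN:             # request data readable
            conn = conns[fd]
            if conn.recv(4096):
                conn.send(reply)                 # toy: assumes the whole reply is sent
            else:                                # client closed the connection
                epoll.unregister(fd)
                conn.close()
                del conns[fd]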

So that is what we are going to do: we are going to take the 3 top-performing WSGI servers, namely Tornado, Gevent and uWSGI (FAPWS3's lack of HTTP/1.1 support made it unsuitable for this benchmark), and give them 5 minutes of ping-pong mayhem.

You see, ping-pong is a simple game; it isn't really the complexity that makes it interesting, it is the speed and the reactions of the players. Now, what is 5 minutes of ping-pong mayhem? Imagine that for 5 minutes, every second an Airbus loaded with ping-pong players lands (500 clients), and each of those players is going to slam you exactly 12 balls (with a 5 second interval). This would mean that after 5 seconds you would already have to return the volleys of 2000 different players at once.

Tsung Benchmark Setup

To perform this benchmark I am going to use Tsung, a multi-protocol distributed load-testing tool written in Erlang. I will have 3 different machines simulating the ping-pong rampage. I used the following Tsung script:

<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd" []>
<tsung loglevel="warning">

    <clients>
        <client host="tsung2" use_controller_vm="false" maxusers="800"/>
        <client host="tsung3" use_controller_vm="false" maxusers="800"/>
        <client host="bastet" use_controller_vm="false" maxusers="800"/>
    </clients>
    <servers>
        <server host="tsung1" port="8000" type="tcp"/>
    </servers>
    <monitoring>
        <monitor host="tsung1" type="erlang"/>
    </monitoring>

    <load>
        <arrivalphase phase="1" duration="5" unit="minute">
            <users interarrival="0.002" unit="second"/>
        </arrivalphase>
    </load>

    <sessions>
        <session name='wsgitest' probability='100'  type='ts_http'>
            <for from="0" to="12" incr="1" var="counter">
                <request>
                    <http url='http://tsung1:8000/' version='1.1' method='GET'/>
                </request>
                <thinktime random='false' value='5'/>
            </for>
        </session>
    </sessions>

</tsung>

Tsung Benchmark Results

[Chart: Concurrent Connections measured over time in seconds. Series: tornado, uwsgi, gevent.]

[Charts: System Load, CPU Usage and Free Memory on the server, measured over time during the benchmark.]

Let me first state that all three frameworks are perfectly capable of handling this kind of load; none of them dropped connections or ignored requests. Which, I must say, is already quite an achievement, considering that they each had to handle about 2 million requests.

Below the concurrent connection graph we can see the system load, the CPU usage and the free memory on the system during the benchmark. We can clearly see that Gevent put less strain on the system, as the CPU and load graphs indicate. In the memory graph we can see that all frameworks used a consistent amount of memory.

Readers who are still paying close attention will note that the memory graph displays 4 lines instead of 3. The fourth line is Gevent compiled against libevent 2.0.4a; the new release of libevent is said to show considerable performance improvements in its HTTP server. But it is still an alpha version, and the memory graph shows that this version is leaking memory. Not something you want on your production site.

[Chart: Server Latency (response time in ms) measured over time in seconds. Series: tornado, uwsgi, gevent.]

The final graph shows the latency of the 3 frameworks. We can see a clear difference between Tornado and its competitors, as Tornado's response time hovers around 100ms, while uWSGI sits around 5ms and Gevent around 3ms. This is quite a difference, and I am really amazed by the low latency of both Gevent and uWSGI during this onslaught.

Summary and Remarks

The above results show that as Python web developers we have lots of different ways to deploy our applications. Some of these perform better than others, but by focusing only on server performance I would not do justice to most of the tested servers, as they differ greatly in functionality. Also, if you are going to take some stock web framework and won't do any optimization or caching, the performance of your webserver is not going to matter, as it will not be the bottleneck. If there is one thing this benchmark made clear, it is that most Python web servers offer great performance; if you feel things are slow, the first thing to look at is really your own application.

When you are just interested in quickly hosting your threaded application, you really can't go wrong with Apache ModWSGI. Even though Apache ModWSGI might put a little more strain on your memory requirements, there is a lot to be said for it in terms of functionality. For example, protecting part of your website with an LDAP server is as easy as enabling a module. Standalone CherryPy also shows great performance and functionality and is really a viable (fully Python) alternative that can lower memory requirements.

When you are a little more adventurous you can look at uWSGI and FAPWS3. They are relatively new compared to CherryPy and ModWSGI, but they show a significant performance increase and have lower memory requirements.

Concerning Tornado and performance: I do not think Tornado is an alternative to CherryPy or even ModWSGI. Not only does it hardly show any increase in performance, it also requires you to rethink your code. But Tornado can be a great option if you do not have any code using blocking connections, or if you just want to look at something new.

And then there is Gevent. It really showed amazing performance at a low memory footprint. It might need some adjustments to your legacy code, but then again the monkey patching of the socket module could help, and I really love the cleanness of greenlets. There have already been some reports of deploying Gevent successfully, even with SQLAlchemy.

And if you want to dive into high-performance websockets with lots of concurrent connections, you really have to go with an asynchronous framework. Gevent seems like the perfect companion for that; at least, that is what we are going to use.

Tags
async, performance, programming, Python, wsgi

114 Responses to “Benchmark of Python WSGI Servers”

  1. Comparing gevent to eventlet « Concurrency in Python says:
    March 16, 2010 at 2:41 pm

    [...] WSGI server is based on the libevent’s built-in HTTP server, making it super fast. [...]

    • brianm says:
      March 7, 2011 at 5:13 pm

      The use of a virtualization environment during your benchmarks will not produce accurate results.

  2. Jean-Paul Calderone says:
    March 16, 2010 at 3:57 pm

    Hi Nicholas,

    I’m curious if you verified that the threadpools used in each server were of the same size (for those servers using threadpools. This could make a significant difference in the results. It might also be interesting to learn beyond what point increasing the threadpool size no longer helps performance. It’s also worth noting that several of the top performers, by not using threads, are not actually implementing a general purpose, scalable WSGI server. They take a valuable shortcut which aids performance, but this should be considered when selecting a server, since it could lead to disastrous performance for certain applications.

    I’m also curious if you did any analysis of the errors some of the servers encountered. From my investigations, I’ve commonly found that this is closely tied to request throughput rate; when a server begins to lag behind the request rate, if it does not continue to accept new connections, many TCP/IP stacks will begin to reject incoming TCP connections themselves. This is somewhat useful to know, but I think it’s worth separating from a failure that actually occurs within the software being tested, particularly since in this case it’s mostly redundant with the information about the number of requests/second each server can respond to.

    One last thing. :) I wonder if you have any information on the distribution of response times. The graphs of mean (I assume) times are interesting, but knowing what the raw data looks like is also important (and actually necessary in order to correctly interpret the rest of the data you’ve presented here).

    Great work so far. I hope you keep it up. I also hope that at some point there’s something downloadable that people can use to reproduce your results, as well as extend the analysis done on them.

    • Nicholas Piël says:
      March 16, 2010 at 5:31 pm

      I tried to maximize the performance of the various frameworks by optimizing the threadpool, but this is kinda painful to perform because when a pool gets too big it can crash the server. So i assume that some gains are possible, but i suspect that those gains will be relatively small though.

      Concerning the errors, you can see a difference in the kind of errors between the servers that are able to complete the benchmark and those who aren’t. But yes i could have specified wether it was a connection reset or a timeout, but the article was getting really long already.

      The mean values are indeed depicted in the graph, while I agree that the STD would be interesting to show as well the graphs are already very crowded and i find that the curve can give me some indication of the stability of the mean. For example compare uWSGI against FAPWS3 after the 6000 RPS mark with each other.

      • Itamar Turner-Trauring says:
        March 19, 2011 at 4:01 pm

        I checked, and increasing thread pool size does have significant impact on Twisted (YMMV), so I suspect the same would be true for other thread pool-based frameworks.

  3. yml says:
    March 16, 2010 at 4:03 pm

    Very interesting benchmark it confirms my personal experience with the WSGI webserver i have tested mod_wsgi cherrypy and uwsgi.

    I would be interesting to know which version for each web server you have used.

    Regards,
    –yml

    • Richard Shea says:
      March 16, 2010 at 9:57 pm

      The versions are given under the heading ‘Contestants’ towards the top of the article.

  4. Aigars Mahinovs says:
    March 16, 2010 at 4:30 pm

    Please use more distinct colors for your graphs and also make the legend lines thicker, 10 px thick at least – it is impossible to understand what line belongs to what server.

    • James says:
      March 16, 2010 at 5:15 pm

      Hover over the server names in the legend of each chart – the lines in the chart become more prominent.

  5. Paul J. Davis says:
    March 16, 2010 at 4:32 pm

    Nicholas,

    Just a note on the gunicorn numbers. Running the server with a single worker process is constraining any ability for concurrency in its responses. Its meant to run with 2-4x the number of processors you have on the machine.

    There’s also a slight gotchya in the motivation for implementing HTTP/1.1. Nginx’s proxy is only HTTP/1.0, so if you need to use it to scale out multiple python server processes there’s no benefit from HTTP/1.1 which is why gunicorn didn’t bother to implement it. (As its designed to be proxied by nginx).

    If I get some time later I’ll try and rerun some of these numbers on bigger hardware. I’ve personally seen gunicorn run that HTTP/1.0 benchmark 10K req/s faster than gevent does.

    Thanks for the writeup,
    Paul Davis

    • Nicholas Piël says:
      March 16, 2010 at 5:07 pm

      I understand these gotcha and i mentioned it in the article. If i could be of any help please let me know.

      I could rerun the bench somewhere in the future with more assigned processors and workers to specifically test out that case, if you want.

      • Paul J. Davis says:
        March 16, 2010 at 7:55 pm

        Nicholas,

        Remember that gunicorn isn’t like the rest of web servers in terms of its process utilization. Even when its only got a single core allocated for use it will still benefit from an increase in the number of workers allocated. Configuring gunicorn with a single worker is like configuring all the threaded servers to use a single thread, its just not how it was intended to be run.

        Also, in your httperf invocation, did you keep the number of connections a constant for every test? Ie, were the 4K r/s tests taking 1/10th of a second? That might explain some of the noise in the graphs.

        HTH,
        Paul

        • Nicholas Piël says:
          March 16, 2010 at 9:55 pm

          Ok,

          I am running Gunicorn right now with 3 workers, thus i have a master processes and 3 workers (and ofcourse the NGINX processes).

          gunicorn -b unix:/var/nginx/uwsgi.sock -w 3 pong:application
          

          I’ll add it to the benchmark when its done, the main reason why i did not try multiple workers was because this would have a negative influence on the memory statistics and I did not expect any performance increase. From the initial results i’m getting back from the current benchmark it does seem to improve the results for Gunicorn moving it to a more respectable position.

          Cheers,
          Nicholas

          • Paul J. Davis says:
            March 16, 2010 at 10:46 pm

            Nicholas,

            Excellent!

            Feel free to report the sum of the process memory. There are some oddities with copy-on-write semantics but I’ve never heard of a good way to tease those apart for proper usage reports.

            Thanks,
            Paul

  6. Steve Losh says:
    March 16, 2010 at 4:36 pm

    You mentioned putting uWSGI behind nginx but didn’t say anything about doing the same for gunicorn. Does that mean you ran the benchmarks without nginx proxying for gunicorn?

    Gunicorn isn’t designed to be used like that — it’s supposed to live behind nginx (or something similar) just like uWSGI.

    http://gunicorn.org/deployment.html describes the way you’re supposed to deploy gunicorn.

    • Nicholas Piël says:
      March 16, 2010 at 5:03 pm

      Thats correct, however, i did try to put it behind NGINX (via a Unix socket) and that did not give me any performance increase.

      I am also having a difficult time how that would improve performance as I only use a single worker.

      • Paul J. Davis says:
        March 16, 2010 at 8:12 pm

        In this specific case it doesn’t matter whether you run gunicorn behind nginx as the wsgi app and the clients are both super duper fast. Gunicorn depends on having a buffering proxy to deal with client load as described at [1]. Slowloris is obviously an extreme example of slow client behavior but a public facing server will obviously be exposed to the entire spectrum of client speed between super fast and super slow.

        HTH,
        Paul

        [1] http://ha.ckers.org/slowloris/

  7. pau freixes says:
    March 16, 2010 at 4:43 pm

    Twisted is a processor/thread flavor ?

    • Nicholas Piël says:
      March 16, 2010 at 5:04 pm

      Twisted is Async, however its WSGI server uses a threadpool.

      • Anton says:
        March 28, 2010 at 1:11 pm

        Wow, I didnt know.
        Too bad for Twisted :(

  8. Idan Gazit says:
    March 16, 2010 at 5:10 pm

    Yeah, seconding a request to make the charts legible.

    A 1-pixel-thick legend makes it impossible to pick out which color belongs to which server, making all your hard work practically useless as I’m unable to read the chart.

    • Nicholas Piël says:
      March 16, 2010 at 5:28 pm

      Did you try to hover with your mouse over the graph?

  9. Peter Shinners says:
    March 16, 2010 at 5:55 pm

    Fantastic information, thanks.

  10. Daniel Hahler says:
    March 16, 2010 at 6:18 pm

    I noticed that you’ve used “from hello import application” (instead of “from pong”) for FAPWS3 and Paster.
    Is it the same application though?

    • Nicholas Piël says:
      March 16, 2010 at 6:24 pm

      Sharp eye, thanks! I modified the code examples.
      It is indeed the same application.

  11. Name says:
    March 16, 2010 at 8:40 pm

    Can you add Rocket to the tests?

    https://launchpad.net/rocket

  12. Richard Shea says:
    March 16, 2010 at 9:52 pm

    Great article. Enjoyed reading it. Thanks for all your work writing it.

  13. Passy says:
    March 16, 2010 at 10:23 pm

    Really, really impressive post. Thanks for sharing your research. It’s been about time for a comprehensive comparison like that.

  14. Bram Cohen says:
    March 16, 2010 at 10:38 pm

    It looks like in your last set of tests tornado was just barely able to handle the load from a CPU standpoint, which might account for its high delays. If you run a slightly less difficult test, or on a faster machine, so that the CPU load of tornado is more like 70% than 90%, does the server latency drop to being similar to the others, or is that endemic?

    • Nicholas Piël says:
      March 16, 2010 at 11:52 pm

      Yes, very likely. I’ll keep that in mind if i plan to do an update, could be interesting to investigate.

  15. Graham Dumpleton says:
    March 16, 2010 at 11:15 pm

    Using ‘ThreadsPerChild 16000′ for Apache/mod_wsgi is just plain stupid. It is directly because of that that it had such a large memory footprint. If you drop that value down to below 100 you will probably find the same performance and yet a lot less memory being used. If your test program is merely a hello world application that returns a few bytes, you could possibly get away with a lot less threads than 100. Some high performance sites with a lot of requests, but where Apache and the application has been tuned properly, get away with 2 or 3 threads in single daemon mode process.

    When using Apache/mod_wsgi, forcing use of a single process is also going to make performance suffer due to limitations of the GIL. The strength of Apache/mod_wsgi is that you can fire up multiple processes and avoid these GIL issues, especially where using a multi processor/core system.

    I suggest you go back and redo your Apache/mod_wsgi tests starting with the Apache default of 25 threads in one process for embedded. If you see it start to suffer under high number of concurrent requests, then add more processes as well and not just threads, with more processes and dropping threads actually better.

    • Nicholas Piël says:
      March 16, 2010 at 11:42 pm

      Thanks, for your remarks Graham.

      I obviously did not have the amount of threads set to that insane amount of 16k as this will invoke the OOM killer on my machine. The 16k setting is a left over from when i tried to have Apache competing in the Tsung benchmark, that didn’t work.

      As noted i experimented with some of the settings to obtain an optimal balance of not increasing the error rate (in the HTTP 1.1 benchmark which can force a lot of concurrent connections). For the benchmark i used a thread setting of 1000, lowering this number would raise the error rate. With this setting the memory usage starts at a relatively low of 21Mb but as the benchmark progresses it reaches 64Mb.

      I could try splitting the amount of threads over multiple processes because indeed the issues you mention could indeed hold back ModWSGI its performance. But I suppose that this would increase the memory usage, at least this was the main reason why I decided to limit it to one process.

      I will see, when and if I re-benchmark it. Btw, your other comment just popped in. I did not find out how to disable logging on Apache, can you give me a pointer?

      • isaac says:
        March 17, 2010 at 12:20 am

        Comment out all the logging directives; that will turn logging off in Apache.

        • Graham Dumpleton says:
          March 17, 2010 at 12:33 am

          Yep, isaac has it right from memory, just comment out the CustomLog directives completely.

  16. Graham Dumpleton says:
    March 16, 2010 at 11:31 pm

    Oh, and turn off Apache access logging, don’t just send it to /dev/null. Turning it off completely is better than sending it to /dev/null as don’t then have to do the actual processing and writing of the messages.

  17. RJ Ryan says:
    March 17, 2010 at 12:11 am

    Thank you so much for putting the time into writing this. It was very interesting and informative to read.

  18. Ian Bicking: a blog :: The Web Server Benchmarking We Need says:
    March 17, 2010 at 12:23 am

    [...] WSGI web server benchmark was published. It’s a decent benchmark, despite some criticisms. But it benchmarks what [...]

  19. Kyle says:
    March 17, 2010 at 12:45 am

    Unless Twisted has changed recently, you need to specifically import the epoll reactor, otherwise you get the select reactor which is significantly slower.

    http://twistedmatrix.com/documents/current/core/howto/choosing-reactor.html#epoll

    • Alex says:
      March 17, 2010 at 1:09 am

      This is still true as of the latest twisted.

  20. Sylvain Hellegouarch says:
    March 17, 2010 at 9:08 am

    I find interesting folks blaming Nicholas for not using the right configuration whilst not pointing to where each server documents those performance settings. People should get a grip.

  21. Josh says:
    March 17, 2010 at 3:28 pm

    This is a completely off-topic question, but what software do you use to create the charts? Is it open source?

    • Nicholas Piël says:
      March 17, 2010 at 5:05 pm

      As explained in the article, the data gets collected by autobench (which commands httperf) and then gets converted to Highcharts javascript code by a simple Python script.

      Autobench and Highcharts are both open source.

      • Gabriel Gunderson says:
        July 9, 2010 at 7:39 am

        Quick correction… Highcharts is under the “Creative Commons Attribution-NonCommercial 3.0 License”. As such, it’s not under one of the ‘approved’ licenses by the OSI. In fact, it fails the first point of The Open Source Definition:

        “”"
        1. Free Redistribution
        The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.
        “”"

        However, it’s a pretty sharp charting tool and worth every penny (if using in a commercial context). It’s just not Open Source.

        Anyway, I very much enjoyed your post. Thanks for sharing your findings with the rest of us! You’ve saved me (and I’m sure many others) hours of testing :)

        Kind regards,
        Gabe

  22. Comparativa Python WSGI Web Server | Giovanni Raco says:
    March 17, 2010 at 9:40 pm

    [...] Piël ha realizzato un’interessante comparativa utilizzando una piccola funzione WSGI su una macchina Debian Lenny/AMD64 con Python 2.6.4. Per ogni [...]

  23. Sébastien Estienne says:
    March 18, 2010 at 7:54 am

    Hi Nicholas,

    Really interesting benchmark, but i think it could be really interesting to turn it in a challenge and make it more ‘real world usage’ by only providing apps:
    - a simple ping/pong app
    - a simple django app
    - a django app + template
    - django app + sqlite reads and /or memcache read/write
    and ask each server’s community to tune and provide the optium setup for its specific server to achieve best performance.

    Also using hardware that is more uptodate, because i don’t know any serious job getting done on a single core anymore :)
    I think that quad core and 2/4Go or ram is a better view of the server market.

    this challenge could be run on EC2 instances for example.

    And it should be noted that even the worse server in your benchmark with (1000 r/s on a single core) would provide enought performance for 99% of current website needs.
    So feature and ease of use should be taken into account to choose a wsgi server.

    Great article, can’t wait to read the next one!

  24. DavidG says:
    March 18, 2010 at 12:26 pm

    First of all: nice benchmarks!

    I must admit however that for choosing a Python WSGI server, I’d base my choice for at least 50% on benchmarks with basic POST data handling to see what it’s capable of… Maybe a follow up?

  25. FearAndLoath.Us » Benchmark of WSGI Servers says:
    March 18, 2010 at 3:19 pm

    [...] http://nichol.as/benchmark-of-python-web-servers [...]

  26. Bookmarks for March 18th from 07:42 to 09:12 | The Wahoffs.com says:
    March 18, 2010 at 4:20 pm

    [...] Nicholas Piël » Benchmark of Python Web Servers – An extremely techie article – comparing the performance characteristics of different wsgi server implementations for hosting python sites. [...]

  27. MyEyes! says:
    March 20, 2010 at 8:17 pm

    Is it just my ageing CRT-irradiated eyes, or are the colours in those graphs almost impossible to distinguish? I can’t match the legend to the lines in the graph without opening it in GIMP!

  28. MyEyes! says:
    March 20, 2010 at 8:20 pm

    Ignore me, I just found the interactive mouse-overs! Nice ;-)
    Thanks for the benchmarks!

  29. Python Web Server a confronto | Edit - Il blog di HTML.it says:
    March 23, 2010 at 7:46 am

    [...] Nicholas Piel ha scritto un paio di righe di codice allo scopo di testare e fare un rapporto soddisfacente sui vari wsgi server disponibili per Python. Il risultato testimonia come questi server possano offrire ottime soluzioni e buona efficienza. Trovate i dettagli in questo articolo. Io sottolineerei una frase dell’autore: la velocità dipende molto da come scrivete il codice. Una grande verità che spesso gli sviluppatori tendono a dimenticare puntando il focus solo sui framework e sui server. [...]

  30. Peters Linkschleuder – Der Schockwellenreiter says:
    March 23, 2010 at 9:23 am

    [...] Nicht nur für Hardcore Pythonistas, sondern auch für Webmaster, die das Optimum aus ihren Kisten herausholen wollen: Benchmark of Python Web Servers. [...]

  31. Workers of the world (wide web), unite! « NIL: .to write(1) ~ help:about says:
    March 28, 2010 at 10:40 am

    [...] Filed under: Uncategorized | A few days ago I ran into an interesting post by Ian Bicking about a benchmark that Nicholas Piël ran on WSGI servers. Go ahead and read the original posts, the the skinny is [...]

  32. serving up python with the quickness « lithostech.com says:
    April 5, 2010 at 9:13 pm

    [...] web applications with an eye to performance. Nicholas Piël has done some great work testing and documenting many of them. Gevent looks like a great option as does CherryPy, but uWSGI caught my eye because it [...]

  33. david fries says:
    April 5, 2010 at 11:21 pm

    Greate post, Nicholas! As luck would have it, I was just looking for a benchmark like this as part of a work project. You saved me a lot of work :)

    I am curious, though. How did you measure the memory footprints? Using free, pmap, good old top or something home-brewed? Memory usage is what I’m most concerned about.

    Reply
  34. UWSGI FAN says:
    April 9, 2010 at 6:08 pm

    Excellent article. Thanks for putting this benchmark online. I’m looking strongly at uWSGI and Gevent now. I haven’t deployed a Django app for some time now (about 9 months) because it has previously just been a hobby. I’ve used mod_wsgi and FCGI with Flup, but now I need something for an enterprise deployment and I’m so effing excited to see that Python is getting the sort of cool tools that RoR has enjoyed. Ahhhhh, this is the first time I’ve been so excited about some effing code :)

    Reply
  35. Deployment Notes for Pylons, Nginx, and uWSGI | Tony Landis says:
    April 10, 2010 at 1:46 am

    [...] I have one particularly critical Pylons app currently deployed with paster, and after reading this Benchmark of Python WSGI servers I decided on uWSGI for this [...]

    Reply
  36. Labour Updated « NIL: .to write(1) ~ help:about says:
    April 15, 2010 at 11:14 pm

    [...] web | I’ve had more time to work on Labour (originally posted here, inspired by this and that), the WSGI Server Durability Benchmark. I’m relatively happy with progress so far, and will [...]

    Reply
  37. Python WSGI服务器大乱斗 | In the Milky way says:
    April 23, 2010 at 3:09 pm

    [...] the app server determines how quickly the whole system responds. Using Nicholas Piel’s “Benchmark of Python WSGI Servers” as a reference, I shortlisted the following servers (modules): mod_wsgi for [...]

    Reply
  38. gmong says:
    May 7, 2010 at 8:03 am

    Excellent article! Very helpful.

    Reply
  39. Ludvig Ericson says:
    May 11, 2010 at 11:45 am

    Hi Nicholas,

    It seems you have failed to configure at least gunicorn properly. Gunicorn can actually use eventlet *or* gevent to serve its requests; see http://gunicorn.org/deployment.html

    Other than that, this only goes to show that libevent is very good at handling concurrency if you ask me. :-)
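
    For what it’s worth, switching the worker class is a one-flag change on the command line with a recent gunicorn. A minimal sketch (it assumes the benchmark app from this post is saved as pong.py and is importable):

    # serve pong.py with a single gevent-based worker
    gunicorn -w 1 -k gevent pong:application

    # or, with the eventlet worker instead
    gunicorn -w 1 -k eventlet pong:application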

    Regards

    Reply
    • Nicholas Piël says:
      May 11, 2010 at 12:40 pm

      Hi Ludvig,

      At the time I wrote this article, that functionality lived in a separate package (Grainbows, which is mentioned above). The merge into gunicorn itself is quite recent.

      Cheers,
      Nicholas

      Reply
      • Peter Portante says:
        September 10, 2010 at 5:55 am

        Have you any interest in running the gunicorn config with gevent to see if it performs any differently? I am curious. Thanks, -peter

        Reply
  40. Django Deployment says:
    May 25, 2010 at 3:40 pm

    [...] And from there, in particular, a very detailed and good overview of a large number of WSGI servers: http://nichol.as/benchmark-of-python-web-servers [...]

    Reply
  41. todd@tsmith.org says:
    July 9, 2010 at 6:26 pm

    I’ve read countless blog articles on benchmarks for web servers. This is the best! Thanks for the great job. I learned a lot reading this blog. It has definitely changed some of my ideas about the best way to put together a server.

    Todd

    Reply
  42. Alex Sergeyev says:
    July 26, 2010 at 4:03 am

    Nicholas, you did an awesome job! Thanks.
    My only question is why you decided on a Unix socket for uWSGI and gunicorn under nginx; I’ve read that localhost TCP/IP may actually outperform Unix sockets.

    Reply
  43. Yang Zhang says:
    July 28, 2010 at 6:48 pm

    Hi, thanks for the great benchmark. Would be very interested to see results for gevent with greenlet spawning enabled, and perhaps even results for evio. Thanks!

    Reply
    • Yang Zhang says:
      July 28, 2010 at 6:49 pm

      Meant to write “coev” not “evio.”

      Reply
  44. Andrew Stromnov says:
    August 4, 2010 at 1:36 pm

    Yet another WSGI server for Python: http://pypi.python.org/pypi/meinheld.

    Reply
  45. Anh K. Huynh says:
    September 7, 2010 at 5:38 am

    Really really cool article. Thanks so much!!!

    Reply
  46. Useful Python Resource for Setting Up Django Website « Chris Chou says:
    September 9, 2010 at 3:48 pm

    [...] Servers: Benchmark of Python Web Servers Cache Mechanisms: Evaluating Django Caching [...]

    Reply
  47. Useful Python Resources for Setting Up Django Website « Chris Chou says:
    September 9, 2010 at 3:50 pm

    [...] Servers: Benchmark of Python Web Servers Cache Mechanisms: Evaluating Django Caching [...]

    Reply
  48. Going Green — David Bennett / __init__ says:
    September 16, 2010 at 10:51 pm

    [...] threads, and greenlets and such. I also came across the Green Unicorn project that, though not very speedy with its default worker class, has recently integrated gevent to make it a very attractive [...]

    Reply
  49. Autobench Cloud — David Bennett / __init__ says:
    September 24, 2010 at 11:36 pm

    [...] by david, on Sep 24, 2010 4:36:19 PM. After seeing Nicholas Piël benchmark a bunch of Python web servers, I was just itching to try some different configurations. So, I [...]

    Reply
  50. 以WSGI方式安装MoinMoin « My Brownian Motion says:
    September 25, 2010 at 8:08 am

    [...] for nginx and so on; for details see 《Python WSGI服务器大乱斗(Rev.2)》, 《Benchmark of Python WSGI Servers》 and 《WSGI [...]

    Reply
  51. Performance and Releases « Timefields says:
    October 14, 2010 at 1:00 pm

    [...] that hits throughput, still OOTB the stack is doing OK. An extensive benchmarking experiment at http://nichol.as/benchmark-of-python-web-servers shows that Java and Jetty, even in threaded mode, are just as good as some of the event-mode [...]

    Reply
  52. openid.hive.pt/joamag says:
    October 22, 2010 at 12:00 pm

    Great article… thanks man

    Reply
  53. Michael Buckley says:
    October 24, 2010 at 6:46 am

    Impressive coverage of the status of all the servers. However, once you put Django or something like that into the mix, most of the stats don’t really mean much any more.

    Reply
  54. Seb says:
    November 2, 2010 at 12:11 pm

    Great article, well done!
    Good to see gevent among the best; it was my first choice and I’m happy with it.

    Reply
  55. Сергей Бейлин says:
    November 6, 2010 at 8:36 pm

    Great! Thanks a lot.

    Reply
  56. FILLY says:
    November 10, 2010 at 2:17 am

    Do you use special software to benchmark (like the Apache Benchmark tool)?
    If yes, which one?

    Reply
  57. Mayur says:
    December 1, 2010 at 10:05 pm

    Very very useful test. Thanks for sharing it.

    Tornado looks great if all your communications are short bursts, but I’m looking at gevent because some of our responses can be quite large. As a result, we would be unable to use the “normal” gevent.wsgi server (which is really the libevent server if I understand correctly) because we can’t afford to buffer the messages in RAM for all those connections.

    I was wondering whether you have timings for gevent using the pywsgi server, which supports SSL and chunked responses. I imagine that it would have very different characteristics.

    Thanks again.

    Reply
    • Dan Ellis says:
      March 7, 2011 at 8:08 pm

      Yes, I’d be very interested in seeing how gevent’s pywsgi server compares. I’m surprised Nicholas didn’t mention the lack of keep-alive support in the libevent-http based one.

      Reply
      • Nicholas Piël says:
        March 7, 2011 at 8:46 pm

        Dan,

        At the time I tested gevent, the libevent server still had keep-alives enabled by default. The change to disable it is more recent than this benchmark and you can still enable it if you want.

        But I agree, it will be very interesting to test the PyWSGI server!
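
        For anyone who wants to try it in the meantime, a minimal sketch would look something like this (it assumes the benchmark app is importable as pong; the port is arbitrary):

        from gevent import pywsgi
        from pong import application

        # pywsgi is the pure-Python WSGI server shipped with gevent; unlike the
        # libevent-http based server it also supports SSL and chunked responses
        server = pywsgi.WSGIServer(('0.0.0.0', 8000), application)
        server.serve_forever()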

        Reply
  58. fp says:
    December 3, 2010 at 12:23 am

    The response-time curves look incorrect – they look more like throughput curves. Something is wrong in the tests.

    Reply
  59. Massimo says:
    December 15, 2010 at 1:54 am

    Any reason for not including rocket? https://launchpad.net/rocket
    It is quite popular considering that web2py uses it.

    Reply
  60. fijal says:
    December 15, 2010 at 9:00 am

    It would be cool to see how pure-python web servers (without parts in C) benefit from PyPy. I know twisted web is sped up something like 2x.

    Reply
  61. Vladimir says:
    January 1, 2011 at 10:49 pm

    Excellent article.

    Reply
  62. לקחי טכנולוגיה – 30.12.10 – buildout | ליאור שיאון - קיים משמע אני חושב. says:
    January 2, 2011 at 9:05 am

    [...] an excellent [review] of Python web server performance that not only goes into the details of the tests and covers a lot of servers, but also [...]

    Reply
  63. 高性能python web服务器 - webguo在路上 says:
    January 24, 2011 at 1:59 am

    [...] the app server determines how quickly the whole system responds. Using Nicholas Piel’s “Benchmark of Python WSGI Servers” as a reference, I shortlisted the following servers (modules): mod_wsgi for [...]

    Reply
  64. HG says:
    January 25, 2011 at 1:00 am

    Thanks for this very useful piece!

    Reply
  65. jell says:
    January 26, 2011 at 9:10 pm

    Next time, try turning on epoll (on Linux) or kqueue (on BSD) in Twisted – it’s only two lines of code:
    from twisted.internet import epollreactor
    epollreactor.install()
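
    One caveat worth adding: the install() call has to run before twisted.internet.reactor is imported for the first time. A slightly fuller sketch of serving the benchmark app this way (assuming it is importable as pong; the port is arbitrary):

    from twisted.internet import epollreactor
    epollreactor.install()  # must happen before the reactor is imported

    from twisted.internet import reactor
    from twisted.web.server import Site
    from twisted.web.wsgi import WSGIResource

    from pong import application

    # wrap the WSGI callable in a Twisted resource and serve it
    resource = WSGIResource(reactor, reactor.getThreadPool(), application)
    reactor.listenTCP(8000, Site(resource))
    reactor.run()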

    Reply
  66. any tutorial to set up nginx+ uwsgi to serve pylons apps? - - Coding Answers says:
    January 29, 2011 at 3:57 pm

    [...] try to avoid fat Apache, which is the official Mediacore recommendation. In a famous “benchmark of python webservers” I’ve seen that uWSGI has amazing performance but it is rather a newcomer. So I could [...]

    Reply
  67. Michael says:
    February 2, 2011 at 3:26 am

    Great article. I’d be really interested to see the differences between the servers with much larger output than “pong!”. uWSGI implements its own protocol (in fact, uwsgi) which does something different with the output byte strings. Also, due to the internal parts sharing/passing the output around, I’d imagine there’d be a significant difference in the outcomes of the tests.

    Reply
  68. Tout sur Rien » Notes sur les frameworks Python says:
    February 4, 2011 at 5:23 am

    [...] Performances des serveurs Python: http://nichol.as/benchmark-of-python-web-servers [...]

    Reply
  69. Sean Esopenko says:
    February 28, 2011 at 11:55 pm

    I wouldn’t recommend Tornado for a high-load website that requires an asynchronous server and doesn’t require long polling. If you’re working with a lot of long-polling ajax then Tornado is a godsend. This works great for things like casual MMO gaming servers, real-time communication social networking sites, etc, etc.

    I also like Tornado for quick prototyping. You can install its requisite library in most Linux distros in literally seconds. Then, just work directly in Python code and start it up on the command line.

    Not much out there can compare to Tornado’s long-polling capabilities, which its ‘backwards’ design is perfectly suited for.

    Great article. I bookmarked it and will refer to it when needed.

    Reply
  70. elpres says:
    March 7, 2011 at 4:03 pm

    Another server that advertises itself as “screamingly fast, ultra-lightweight” is https://github.com/jonashaag/bjoern . It would be interesting to see how it performs.

    Reply
    • Markus says:
      June 28, 2011 at 4:22 pm

      It would be interesting for me too to see how fast “Bjoern” is.

      “A screamingly fast, ultra-lightweight asynchronous WSGI server for CPython, written in C using Marc Lehmann’s high performance libev event loop and Ryan Dahl’s http-parser_.”

      Another quote:

      “bjoern is the fastest, smallest and most lightweight WSGI server out there, featuring
      ~ 1000 lines of C code / Memory footprint ~ 600KB / Single-threaded and without coroutines or other crap / Full persistent connection (“keep-alive”) support in both HTTP/1.0 and 1.1, including support for HTTP/1.1 chunked responses”

      https://github.com/jonashaag/bjoern#readme

      It would be interesting to test whether he is right and, if not, to contact him to remove the claim :)

      Reply
  71. Tech Messages | 2011-03-07 | Slaptijack says:
    March 7, 2011 at 7:01 pm

    [...] Nicholas Piël » Benchmark of Python Web ServersNicholas has done an in-depth benchmarking of several WSGI servers in an effort to document their differences. He takes into account the type of server and which version of HTTP it supports. [...]

    Reply
  72. fapws3 + web.py « Nicolas314 says:
    March 7, 2011 at 10:39 pm

    [...] there claiming to be both easier to install (easy) and faster (not so easy) than Apache+mod_wsgi. A benchmark of Python web servers summarizes all good candidates today. I decided to give them all a quick try and see what they have [...]

    Reply
  73. ivoras says:
    March 7, 2011 at 10:46 pm

    Comparing pure Python web servers is kind of similar to the Special Olympics (“even the winners are handicapped”) because of Python’s GIL. Any kind of production Python web application would simply have to be put behind a multi-process front-end, whether with fastcgi, wsgi or any other protocol, otherwise you are just wasting hardware. Any pure Python server which relies only on Python threads to do anything is simply not going to scale.

    This is *very* easy to demonstrate: modify your application (pong.py) to do something CPU-intensive before returning the result (a dummy loop will probably be enough; do *NOT* call time.sleep() as it will put the process to sleep, not exercise the CPU, and do NOT call native C functions within Python, it must be only Python code in the application) and redo the tests. This kind of test would also be more realistic than the overly simplistic one simply returning “Pong” and I would be very interested in seeing the new results!
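
    Something along these lines would do it (a rough sketch of the CPU-bound variant being suggested; the loop count is arbitrary):

    def application(environ, start_response):
        # burn some CPU in pure Python before answering,
        # instead of replying immediately
        total = 0
        for i in xrange(200000):
            total += i * i
        output = 'Pong! %d' % total
        response_headers = [('Content-type', 'text/plain'),
                            ('Content-Length', str(len(output)))]
        start_response('200 OK', response_headers)
        return [output]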

    Reply
    • Michael says:
      July 17, 2011 at 11:04 pm

      The GIL is only an issue with threads and locking.

      Tornado is callback-based, not affected by it, and is a pure Python solution.

      Michael

      Reply
  74. Дайджест №8: Light of speed. | ByteFrames says:
    March 8, 2011 at 8:26 am

    [...] performance of web servers for Python applications. A very good benchmark. uWSGI showed itself [...]

    Reply
  75. Sysadmin Sunday #22 « Boxed Ice Blog says:
    March 14, 2011 at 4:49 pm

    [...] Benchmark of Python WSGI Servers [...]

    Reply
  76. auphonic: Django Deployment with Nginx, uWSGI, virtualenv and Fabric says:
    June 18, 2011 at 1:05 pm

    [...] solution is the combination of Nginx, the high-performance HTTP server, and uWSGI (see benchmark of python WSGI servers). Furthermore, virtualenv and Fabric are invaluable tools to handle Python dependencies and to [...]

    Reply
  77. Flash socket policy server in Python based on gevent » gehrcke.de says:
    June 18, 2011 at 6:25 pm

    [...] on gevent June 18th, 2011 — by Jan-Philip Gehrcke Currently, I am looking into gevent — a nicely performing networking library for Python, based on the brilliant idea of greenlets. I try to use this for [...]

    Reply
  78. The best and simplest tools to create a basic WebSocket application with Flash fallback and Python on the server side » gehrcke.de says:
    June 26, 2011 at 5:54 pm

    [...] upfront this time: gevent in combination with gevent-websocket. Gevent is a nicely performing networking library for Python, based on the brilliant idea of greenlets. It allows you to write [...]

    Reply
  79. Building a web environment with debian, supervisor, git, nginx, uWSGI, Django, MySQL, fabric, pip, and virtualenv | Pavan Gupta says:
    July 31, 2011 at 7:20 am

    [...] nginx and uWSGI a try!  Both of these web servers are lightning fast and it’s nice to be able to break static content off to a server designed specifically for [...]

    Reply
  81. Запуск django и других python проектов при помощи uwsgi+nginx | | Sugury says:
    August 7, 2011 at 9:42 pm

    [...] A comparison of WSGI server performance: http://nichol.as/benchmark-of-python-web-servers [...]

    Reply
  82. Running our Django site with mod_wsgi and virtualenv (part 2) « techno milk says:
    August 10, 2011 at 11:39 pm

    [...] The problem that will remain, I guess, is how to deal with different simultaneous Python versions. I’m not sure mod_wsgi will be capable of that, and it may be time to move to a more modern setup (like nginx as a frontend for uWSGI, which is stated to be sysadmin-friendly and seems to do very well in benchmarks). [...]

    Reply
  83. 高性能python web服务器 says:
    August 17, 2011 at 10:40 am

    [...] the app server determines how quickly the whole system responds. Using Nicholas Piel’s “Benchmark of Python WSGI Servers” as a reference, I shortlisted the following servers (modules): mod_wsgi for [...]

    Reply
  84. Quora says:
    August 23, 2011 at 6:44 am

    What are best-practices for deploying a web app with PyPy? (gunicorn, tornado, etc?)…

    After reading a very interesting thread on Python Stacks in Hacker News (http://news.ycombinator.com/item?id=2910953), I decided to give uwsgi a try and it seems to perform incredibly well, which I can vouch myself as far as my limited testing since ye…

    Reply
  85. Anonymous says:
    August 25, 2011 at 12:22 pm

    Do you want to run the same tests with the latest versions of the servers you used here? It’s been more than a year since your results, and an update would be great. It would also show how active the respective developers are.

    Reply
