Socket Sharding in NGINX Release 1.9.1
NGINX release 1.9.1 introduces a new feature that enables use of the SO_REUSEPORT socket option, which is available in newer versions of many operating systems, including DragonFly BSD and Linux (kernel version 3.9 and later). This option allows multiple sockets to listen on the same IP address and port combination. The kernel then load balances incoming connections across the sockets, effectively sharding the socket.
(For NGINX Plus customers, this feature will be available in Release 7, which is scheduled for later this year.)
The SO_REUSEPORT socket option has many real-world applications, such as the potential for easy rolling upgrades of services. For NGINX, it improves performance by more evenly distributing connections across workers.
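To make the socket option itself concrete outside of NGINX, here is a minimal Python sketch. It assumes a platform where Python exposes `socket.SO_REUSEPORT` (for example, Linux with kernel 3.9 or later). The helper names `reuseport_listener` and `plain_bind_is_refused` are hypothetical and exist only for this illustration; this is not how NGINX itself is implemented.

```python
import socket


def reuseport_listener(host: str, port: int) -> socket.socket:
    """Return a TCP listening socket with SO_REUSEPORT set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set on every socket that wants to share the port,
    # and it must be set before bind() is called.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


def plain_bind_is_refused(host: str, port: int) -> bool:
    """Return True if a socket WITHOUT SO_REUSEPORT cannot bind to the port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError:
        return True
    finally:
        sock.close()
    return False


if __name__ == "__main__":
    first = reuseport_listener("127.0.0.1", 0)   # port 0: kernel picks a free port
    port = first.getsockname()[1]
    second = reuseport_listener("127.0.0.1", port)  # same address and port
    print("two listeners share port", port)
    print("plain bind refused:", plain_bind_is_refused("127.0.0.1", port))
    first.close()
    second.close()
```

Both listeners end up bound to the same address and port, and the kernel is then free to hand each incoming connection to either one. A socket that does not set the option is refused the port, which is why every participating socket has to opt in.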
As depicted in the figure, when the SO_REUSEPORT option is not enabled, a single listening socket by default notifies workers about incoming connections as the workers become available. If you include the accept_mutex off directive in the events context, the single listener instead notifies all workers about a new connection at the same time, putting them in competition to grab it. This is known as the thundering herd problem.
With the SO_REUSEPORT option enabled, there is a separate listening socket for each worker. The kernel determines which available socket (and by implication, which worker) gets the connection. While this does decrease latency and improve performance as workers accept connections, it can also mean that workers are given new connections before they are ready to handle them.
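The one-listener-per-worker model can be sketched in Python as follows. This is an illustration under stated assumptions, not NGINX's implementation: threads stand in for NGINX worker processes, the names `worker` and `run_demo` are hypothetical, and the platform is assumed to support `socket.SO_REUSEPORT`. Each "worker" owns its own listening socket on the shared port, and the kernel decides which socket receives each connection.

```python
import socket
import threading


def reuseport_listener(host, port):
    """Return a TCP listening socket with SO_REUSEPORT set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


def worker(listener, counts, index, stop):
    """Accept connections on this worker's own socket until asked to stop."""
    listener.settimeout(0.1)  # wake up periodically to check the stop flag
    while not stop.is_set():
        try:
            conn, _ = listener.accept()
        except socket.timeout:
            continue
        counts[index] += 1  # only this thread writes to counts[index]
        with conn:
            conn.sendall(b"OK")


def run_demo(num_workers=4, num_clients=200):
    """Return how many connections each worker's socket received."""
    first = reuseport_listener("127.0.0.1", 0)  # kernel picks a free port
    port = first.getsockname()[1]
    listeners = [first] + [
        reuseport_listener("127.0.0.1", port) for _ in range(num_workers - 1)
    ]
    counts = [0] * num_workers
    stop = threading.Event()
    threads = [
        threading.Thread(target=worker, args=(lst, counts, i, stop))
        for i, lst in enumerate(listeners)
    ]
    for t in threads:
        t.start()
    # Each client connects from a new source port and waits for the reply,
    # so every connection has been accepted by the time the loop ends.
    for _ in range(num_clients):
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.recv(2)
    stop.set()
    for t in threads:
        t.join()
    for lst in listeners:
        lst.close()
    return counts


if __name__ == "__main__":
    print(run_demo())
```

The per-worker counts depend on how the kernel distributes connections, so they vary from run to run and from system to system; only the total is fixed. Note that no lock is shared between the workers here, which mirrors why a mutex around accepting becomes unnecessary once each worker has its own socket.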
Configuring Socket Sharding
To enable the SO_REUSEPORT socket option, add the new reuseport parameter to the listen directive, as in this example:
http {
    server {
        listen 80 reuseport;
        server_name localhost;
        …
    }
}
Including the reuseport parameter disables accept_mutex for the socket, because the lock is redundant with reuseport. It can still be worth setting accept_mutex if there are ports on which you don’t set reuseport.
Benchmarking the Performance Improvement
I ran a wrk benchmark with 4 NGINX workers on a 36-core AWS instance. To eliminate network effects, I ran both client and NGINX on localhost, and also had NGINX return the string OK instead of a file. I compared three NGINX configurations: the default (equivalent to accept_mutex on), with accept_mutex off, and with reuseport. As shown in the figure, reuseport increases requests per second by 2 to 3 times, and reduces both latency and the standard deviation for latency.
I also ran a more real-world benchmark with the client and NGINX on separate hosts and with NGINX returning an HTML file. As shown in the chart, with reuseport the decrease in latency was similar to the previous benchmark, and the standard deviation decreased even more dramatically (about eight-fold, from 26.59 ms to 3.15 ms). Other results (not shown in the chart) were also encouraging. With reuseport, the load was spread evenly across the worker processes. In the default condition (equivalent to accept_mutex on), some workers got a higher percentage of the load, and with accept_mutex off all workers experienced high load.
Configuration | Latency (ms) | Latency stdev (ms) | CPU Load |
Default | 15.65 | 26.59 | 0.3 |
accept_mutex off | 15.59 | 26.48 | 10 |
reuseport | 12.35 | 3.15 | 0.3 |
Acknowledgments
Thanks to Sepherosa Ziehau and Yingqi Lu, who each contributed a solution to the NGINX project that enables use of the SO_REUSEPORT socket option. The NGINX team combined ideas from both solutions to create what we feel is an ideal solution.