Google Cloud Platform Blog
Google shares software network load balancer design powering GCP networking
Wednesday, March 16, 2016
At NSDI '16, we're revealing the details of Maglev [1], our software network load balancer that enables Google Compute Engine load balancing to serve a million requests per second with no pre-warming.
Google has a long history of building our own networking gear, and perhaps unsurprisingly, we build our own network load balancers as well, which have been handling most of the traffic to Google services since 2008. Unlike the custom Jupiter fabrics that carry traffic around Google’s data centers, Maglev load balancers run on ordinary servers: the same hardware that the services themselves use.
Hardware load balancers are often deployed in an active-passive configuration to provide failover, wasting at least half of the load balancing capacity. Maglev load balancers don't run in an active-passive configuration. Instead, they use Equal-Cost Multi-Path routing (ECMP) to spread incoming packets across all Maglevs, which then use consistent hashing techniques to forward packets to the correct service backend servers, no matter which Maglev receives a particular packet. All Maglevs in a cluster are active, performing useful work. Should one Maglev become unavailable, the other Maglevs carry the extra traffic. This N+1 redundancy is more cost-effective than the active-passive configuration of traditional hardware load balancers, because fewer resources sit idle at any given time.
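To make the consistent-hashing step above concrete, here is a minimal sketch of a Maglev-style lookup table, loosely following the algorithm described in the paper cited below. The hash functions, the table size, and the backend names are illustrative assumptions for this sketch, not Google's production implementation; the point it demonstrates is that every Maglev that sees the same backend list computes the same table, so any of them forwards a given connection to the same backend.

```python
# Sketch of Maglev-style consistent hashing (illustrative, not Google's code).
import hashlib

M = 65537  # lookup table size; the paper uses a prime much larger than the
           # number of backends so entries stay evenly spread

def _hash(data: str, seed: str) -> int:
    # Illustrative hash; any well-mixed hash function would do here.
    return int.from_bytes(hashlib.md5((seed + data).encode()).digest()[:8], "big")

def build_lookup_table(backends):
    """Fill a table of size M; every Maglev derives the same mapping."""
    # Each backend gets a pseudo-random permutation of table positions.
    offsets = [_hash(b, "offset") % M for b in backends]
    skips   = [_hash(b, "skip") % (M - 1) + 1 for b in backends]
    next_idx = [0] * len(backends)   # next permutation step per backend
    table = [-1] * M                 # -1 marks an unfilled slot
    filled = 0
    while filled < M:
        for i in range(len(backends)):       # backends claim slots round-robin
            while True:
                pos = (offsets[i] + next_idx[i] * skips[i]) % M
                next_idx[i] += 1
                if table[pos] == -1:
                    table[pos] = i
                    filled += 1
                    break
            if filled == M:
                break
    return table

def pick_backend(table, backends, five_tuple: str):
    """Any Maglev hashing the same connection 5-tuple picks the same backend."""
    return backends[table[_hash(five_tuple, "flow") % M]]

backends = ["backend-1", "backend-2", "backend-3"]
table = build_lookup_table(backends)
print(pick_backend(table, backends, "10.0.0.7:34512->203.0.113.10:443/tcp"))
```

The same reasoning explains the cost comparison above: an active-passive pair serves traffic with at most half of its deployed capacity, while N active Maglevs plus one spare keep roughly N/(N+1) of the deployed capacity doing useful work.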
Google’s highly flexible cluster management technology, called Borg, makes it possible for Google engineers to move service workloads between clusters as needed, whether to take advantage of unused capacity or for other operational reasons. On Google Cloud Platform, our customers have similar flexibility to move their workloads between zones and regions. This means that the mix of services running in any particular cluster changes over time, which can also change the demand for load balancing capacity.
With Maglev, it's easy to add or remove load balancing capacity, since Maglev is simply another way to use the same servers that are already in the cluster. Recently, the industry has been moving toward Network Function Virtualization (NFV): providing network functionality using ordinary servers. Google has invested significant effort over a number of years to make NFV work well in our infrastructure. As Maglev shows, NFV not only makes it easier to add and remove networking capacity; it also makes it possible to add new networking services without deploying new, custom hardware.
How does this benefit you, as a user of GCP? You may recall we were able to scale from zero to one million requests per second with no pre-warming or other provisioning steps. This is possible because Google clusters, via Maglev, are already handling traffic at Google scale. There's enough headroom available to add another million requests per second without bringing up new Maglevs. It just increases the utilization of the existing Maglevs.
Of course, when utilization of the Maglevs exceeds a threshold, more Maglevs are needed. Since the Maglevs are deployed on the same server hardware that's already present in the cluster, it's easy for us to add that capacity. As a developer on Cloud Platform, you don’t need to worry about load balancing capacity. Google’s Maglevs, and our team of Site Reliability Engineers who manage them, have that covered for you. You can focus on building an awesome experience for your users, knowing that when your traffic ramps up, we’ve got your back.
- Posted by Daniel E. Eisenbud, Technical Lead, Maglev, and Paul Newson, Developer Advocate (Maglev fan)
[1] D. E. Eisenbud, C. Yi, C. Contavalli, C. Smith, R. Kononov, E. Mann-Hielscher, A. Cilingiroglu, B. Cheyney, W. Shang, and J. D. Hosein. "Maglev: A Fast and Reliable Software Network Load Balancer." 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16), 2016.