Updates on Netflix’s Container Management Platform

We continue to share lessons learned from scheduling and executing containers in production at scale. This blog post summarizes recent publications on our container management platform (Project Titus) and also describes future collaboration opportunities.

Titus

Publications

We were honored when the Association for Computing Machinery (ACM) asked us to write for their bimonthly ACM Queue publication, and they recently published our article, "Titus: Introducing Containers to the Netflix Cloud". In the article, we explain why we designed Titus the way we did, and why we chose to introduce containers into our existing cloud-native virtual machine architecture for service and batch workloads. We discuss unique aspects of Titus such as AWS and Netflix infrastructure integration, networking, capacity management, and our approaches to operational challenges. Finally, we share our future plans and expectations for container management at Netflix.

Conferences

Last year at re:Invent 2016, we described how Titus works under the covers. At MesosCon 2017, we covered how we schedule efficiently on an elastic cloud. During our talk at QCon NYC 2017, we discussed the challenges we have seen operating Titus for production workloads over the last two years. We believe operational issues and scheduling efficiency are key issues to understand, regardless of which container platform you use.

Open Source

While there is some benefit in sharing what we have built and learned from operating Titus for two years, we know we can do better. We have heard requests from the community to open source Titus, letting people learn from the exact code we run in production at Netflix.

Open sourcing a project requires a great amount of work and responsibility, especially for a project as complex as Titus. Also, a healthy open source project needs more than a single company to grow a lasting community. In that spirit, Netflix has started to collaborate with others who face challenges similar to ours when running containers. We have found three categories of collaborators, each looking for unique value from Titus. Specifically, they are looking for a battle-hardened:

  • container solution natively integrated with Amazon Web Services (AWS)
  • NetflixOSS-integrated container management platform, specifically one that works well with Spinnaker (our continuous delivery platform) or our Eureka-based cloud RPC frameworks
  • modern, unified batch and service container scheduler built on Apache Mesos that works well with Docker containers on an elastic cloud

We are currently working under a private collaboration model, which includes sharing our code privately, as we work towards a community-driven, fully open source project.

Upcoming Conferences

At re:Invent this year, we will speak in the "NET402: Elastic Load Balancing Deep Dive and Best Practices" session about how we are extending Titus networking support through Application Load Balancer (ALB) integration with IP target groups. We will also attend QCon SF 2017 and KubeCon/CloudNativeCon, and we hope to connect with collaborators there.

If you are interested in getting in touch with the Titus team, or if you are attending either QCon SF 2017 or KubeCon/CloudNativeCon, please reach out to Andrew Spyker (LinkedIn, Twitter).