Managing a dev team at a full stack development company can be hard. You always have to keep over-enthusiastic developers focused on the right task. More often than not, you have to remind them that finishing the change request a client submitted yesterday by EOD is more important than trying out the fancy new JavaScript library. To keep from hurting our business, we also have to make sure we don't jump onto the wrong slope of any technology hype cycle. That is probably the reason we were quite late in adopting Docker for any of our projects. We were actually asked to use Docker by one of our clients for his project. We liked it, thought it was useful, and went on to use it for all our other projects. We have recently reduced our Docker usage because the number of issues surpassed the perceived benefits for our purposes. Here are a few details about our experience.
Getting started is hard
When we started, we went through a large amount of documentation, videos, tutorials and other resources to get our heads around using Docker. After going through all that, it seemed like there is no one place where you'd find everything you need to know about Docker. There are bits and pieces everywhere, including obsolete documentation that no longer applies, and you are left with the task of piecing everything together. This is really strange given the popularity and buzz around Docker.
Orchestration
This is one of the things we had to figure out by ourselves: the concept of containers is separate from the concept of orchestration. If you try to use Docker for anything more than a hello-world, you'll most likely need some kind of orchestration. We used docker-compose and stuck to it. We later realised that other alternatives are available. docker-compose has a few limitations, but it worked fine for our purposes and we didn't find enough reason to figure out whether Kubernetes was a better option.
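For reference, the kind of docker-compose.yml we mean is roughly this (a minimal sketch; the service names, ports and image tag are illustrative, not our actual setup):

    version: '2'
    services:
      web:
        build: .
        command: python manage.py runserver 0.0.0.0:8000
        ports:
          - "8000:8000"
        depends_on:
          - db
      db:
        image: postgres:9.5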
Orchestration across multiple physical servers gets even messier: there you'd have to use something like Swarm. We've since realised that Swarm is one of the less preferred options for orchestrating clusters.
Running out of disk space
This is the most frustrating and ugliest part of using Docker. We discovered this problem when we were happily pushing code to deployment and one of the clients' machines suddenly went down due to low disk space. Fortunately, it was only a staging server. During development, a large number of Docker images tend to pile up on your machine pretty fast. Since image sizes can run to a few GB each, it's easy to run out of disk space. This is another problem with Docker that you have to figure out yourself. Despite the fact that everyone who's ever used Docker seriously comes across this issue sooner or later, no one tells you about it at the outset, which is pretty annoying. There is no built-in Docker command to deal with it, and while there are a lot of hacks available, there is no single standard solution. We ended up setting up an hourly cron job to run the docker-gc script on all our dev and production machines!
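The cron entry itself is nothing fancy. Something like the following (a sketch; the install path and log location are assumptions, adjust to your setup):

    # /etc/cron.d/docker-gc - run Spotify's docker-gc script every hour
    0 * * * * root /usr/local/bin/docker-gc >> /var/log/docker-gc.log 2>&1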
Docker registry
For our use case, it would have been a lot of overhead to host our own Docker registry. The Docker Hub registry provides only one private repo for free. Since the client wasn't keen on paying for more private repos, we managed with the single repo, using it to store our base image with most of the dependencies bundled into it.
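In practice that meant pushing everything into the one repo and distinguishing images only by tag. Roughly (the repo name here is made up):

    # build the base image and push it to the single private repo
    docker build -t ourcompany/project:base -f Dockerfile.base .
    docker push ourcompany/project:base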
Workflow
This is another aspect of any serious usage of Docker that the tutorials usually ignore. It took us quite some time to figure out our entire workflow: how to set up dev and production orchestration, databases, backups, dependency management within the team, and keeping the base image updated all the time. I'll probably write another blog post about our Docker workflow.
Dependency management
One of Docker's major selling points is managing dependencies across different dev machines. We are mainly a Python/Django shop, and before Docker we had been happily managing dependencies with virtualenv and a simple requirements.txt. Most of the time, that was all we needed to manage environments across all dev machines. Sometimes, we used Vagrant. So while Docker did bring some benefits by letting us specify everything from the OS up (environment variables, native libraries, etc.), it wasn't really a game-changer for us.
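For contrast, the whole pre-Docker setup on a fresh dev machine was essentially this (standard virtualenv usage, nothing project-specific):

    # create an isolated environment and install pinned dependencies
    virtualenv venv
    source venv/bin/activate
    pip install -r requirements.txt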
Longer build times
Initially we had a single Dockerfile for the project. This meant rebuilding the entire Docker image each time we added a single Python library dependency, and the rebuild would have to happen on every dev machine. With virtualenv it was as easy as a single pip install command. We eventually created a separate Dockerfile for our base image, which includes all the dependencies.
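The split looked roughly like this (a sketch; the file names, Python version and repo name are illustrative):

    # Dockerfile.base - rebuilt only when dependencies change
    FROM python:2.7
    COPY requirements.txt /tmp/requirements.txt
    RUN pip install -r /tmp/requirements.txt

    # Dockerfile - rebuilt on every code change; cheap, since deps come from the base
    FROM ourcompany/project:base
    COPY . /app
    WORKDIR /app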
Good practice dictates that you don't mount your source code directory into the Docker container in production, which means you also have to rebuild the image on the test/staging server every time you make a single-line code change. Added to the slightly slow docker-compose orchestration, this made deployments noticeably slower.
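So shipping a one-line fix to staging went roughly like this (assuming a docker-compose setup and a service named web, both illustrative):

    # full image rebuild and restart for a one-line change
    git pull
    docker-compose build web
    docker-compose up -d web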
DB and Persistence
We spent hours figuring out a good way to use databases in both dev and production with Docker. It was tricky, since Docker containers don't persist data unless you use a mount point. A few documented patterns either didn't work for us or we didn't really like them, so we had to figure it out by ourselves. This is also another area where you're expected to work out for yourself whether it is a good idea to run your production database in Docker at all. Hint: it's not.
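What we settled on was the boring, reliable pattern: mount the database's data directory from the host, so the container stays disposable. The relevant fragment of the compose file looked something like this (paths and image tag are illustrative):

    db:
      image: postgres:9.5
      volumes:
        # data lives on the host; the container can be destroyed and recreated freely
        - /srv/project/pgdata:/var/lib/postgresql/data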
Logging
This is a problem in both dev and production. In development, although you're still running Django's runserver and can get console logs, we frequently experienced issues with Django's autoreload and with delays in flushing logs to the console while using the Django debug server with Docker. In production, since your source code directory isn't mounted in the container on the server, you have to add a special mount point just to get at your server logs.
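Concretely, our production compose config grew an extra volume entry just for logs, along these lines (paths and image name are illustrative):

    web:
      image: ourcompany/project:latest
      volumes:
        # source code stays baked into the image; only the logs come back out
        - /var/log/project:/app/logs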
After using Docker for about 6 months, the number of broken things we were trying to fit together, and the time we were spending just to figure things out, didn't seem worth all the fuss. There could be more issues with using Docker, but these were enough for us to limit its usage to only a handful of projects. Here is another thorough article on extensive Docker usage in production.
Comments
Job 3 hours, 40 minutes ago
Thanks for the nice article on your pain points.
If you want an easier solution for Docker Registry, consider using the Registry built into GitLab. You can publish to the registry directly from GitLab CI, which should save you some time in the build process.
Swapnil Talekar 3 hours, 36 minutes ago
Thanks for the tip. Will try that.
Shani Elharrar 3 hours, 33 minutes ago
I've used Docker in production for about a year. Some comments/solutions to your problems:
1. Logging - You can install fluentd or any other supported "driver" (splunk, journald and more) and use the Docker logging driver to ship your logs somewhere else: https://docs.docker.com/engine/admin/logging/fluentd/ - This way you just need your programs to write their logs to STDOUT. In our Java programs we used a custom appender that shipped the logs directly to FluentD, and we also wrote to STDOUT so we could use "docker logs" to watch our logs live. Don't forget to limit the log size, since eventually you will run out of disk space (https://docs.docker.com/engine/admin/logging/overview/#/json-file-options).
2. Running out of disk space - We run our cluster on AWS and deploy it using terraform and cloud-init, and every time we need to deploy a new version, we just deploy new servers, so there is no way we're going to hit the disk limit. If you are running on the cloud (even a private cloud, like CloudStack), I recommend you read about Immutable Infrastructure.
3. DB and Persistence - We used Docker to deploy Elasticsearch. The containers were up for months, and when we needed to upgrade the services, we just stopped the old container and started a new one with the same mounts. It worked, but I'm not sure I would recommend deploying Oracle DB inside a Docker container :-). I think people should google whether their DB can sensibly be deployed using Docker or not.
4. Deployment Time - We created an AMI (Amazon Machine Image) with Docker bundled inside to reduce deployment time. Our Scala code took about 8 minutes to build, and then the "docker build & push" step took another 1:30 minutes (there were about 10 images in it). Our "Staging" build took 30 minutes to run (it created a whole environment from scratch, ran the tests and then destroyed the environment). I used to put the "ADD" / "COPY" commands last in the Dockerfile so everything else would be cached, and I tried to minimize the number of commands in the Dockerfile in order to minimize the number of layers (https://www.ctl.io/developers/blog/post/optimizing-docker-images/).
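For example, the ordering looks like this (an illustrative sketch, using the blog's Python stack rather than our Scala setup):

    FROM python:2.7
    # rarely-changing steps first, so their layers stay cached between builds
    COPY requirements.txt /tmp/requirements.txt
    RUN pip install -r /tmp/requirements.txt
    # the source code changes on every build, so COPY it last -
    # only the layers from here on get rebuilt
    COPY . /app
    WORKDIR /app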
I hope I helped you with the information above.
Swapnil Talekar 3 hours, 25 minutes ago
Your suggestions are pretty useful. Will definitely try them out. Thanks!
meme 3 hours, 32 minutes ago
Docker is pure evil; it will be production ready 1-2 years from now.
Matt Freeman 2 hours, 43 minutes ago
Wait, your client didn't want to spend $7/mo for 5 private repos (or whatever it is for 10+), so you worked out hacks to save him/her a latte a month?
Swapnil Talekar 2 hours, 40 minutes ago
And this was not even the strangest client we had!
Gianluca Arbezzano 1 hour, 23 minutes ago
Hello! Thanks for your story. In my experience this sentence is totally wrong: "This is really strange given the popularity and buzz around docker." Usually, when I look around a big ecosystem used by a lot of people, there are blog posts and videos everywhere, and it's really hard to stay in touch with the most up-to-date information. Docker is trying to do something about this with the Docker Captains group (I am one of them), with a weekly newsletter and so on. It's hard work, but the internet works this way, and not only in the tech environment. Really, thanks for all your points. Most of them are really good, like "Workflow", "Logging" and so on. I am writing a book about Docker in production (http://scaledocker.com) covering some of the topics that you mentioned. I need to think about workflow too!
Swapnil Talekar 1 hour, 19 minutes ago
Thanks Gianluca, I appreciate what you're doing with the Docker Captains group, which I hadn't come across yet. I'll be glad to provide you with more feedback to help with your work.
Sanyam Kapoor 44 minutes ago
1. I think Docker is a rapidly evolving technology, and learning as you go might be the best way to work things out. I'd also suggest relying only on the simpler features at first. Don't count on all of Docker being stable, and don't build anything on top of it unless you have the time and resources.
2. Orchestration is definitely something that is still in the works. I personally prefer Mesos due to the level of control, but that is just one of many choices. I'd definitely recommend against Swarm, because Docker doesn't necessarily have a good reputation regarding breaking changes.
3. You could use projects like "docker-gc" by Spotify to garbage-collect old containers and images. This is essential in highly ephemeral environments.
4. I personally used and loved GitLab CI, and all of my projects were tested and deployed via GitLab CI. By the time I started using the GitLab Registry, it had also finally added support for container building on shared runners. It works like a charm. I have a generic .gitlab-ci.yml if you'd like to have a look. Since GitLab CI is an all-Docker-based runner, you'll want to have Docker images for development and testing purposes ready, to improve build times. That is what I eventually had to do. It shouldn't be a big deal once set up.
5. I'd personally recommend against using the logging drivers in Docker, because the Docker daemon is highly unreliable. Logging should instead be strongly coupled with your application itself, with appropriate production drivers (and stdout during development). Depending on Docker's logging drivers makes things inconvenient (or who knows how bad) if they change tomorrow, once your systems have hardened. Coupling logging with the application keeps you in control all the way.
6. Since one of your aims is also to improve the development workflow, good base images are an absolute must. If you have a highly polyglot environment, it is better that your teams coordinate among themselves and decide on a baseline dependency requirement.
7. I'd also recommend that you create separate standalone files for dependency management rather than embedding dependencies directly in the Dockerfile. This will help you stay afloat and switch platforms as and when a project needs.
Sanyam Kapoor 42 minutes ago
As a side note, one of Mesos' advantages is that it supports Docker containers without needing Docker itself. And Mesos has a reputation for being much more reliable than Docker. You could bank on that.