In my previous article I described the Docker image cache and how Docker decides that a particular layer needs to be rebuilt when you do a docker build. Now, let’s build on that knowledge and discuss some strategies for making the Dockerfile code/build/test cycle as fast and reliable as possible.
Top-to-Bottom
This one should be pretty obvious by now, but as you’re iterating on your Dockerfile you should try to keep the stable parts toward the top and make your additions at the bottom.
If you know you need to install a bunch of OS packages in your image (which is typically one of the slower parts of building an image), put your package installation instructions toward the top of the Dockerfile. That way you only need to sit through the installation process for those packages once as you go through the code/build/test/repeat cycle for your image.
Similarly, if you have a core set of instructions that you use across all of your images (like a MAINTAINER value you always use), it’s best to keep those at the top of your Dockerfile and always in the same order. That way those cached layers can be shared between different images.
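As an illustrative sketch (the package names and paths here are hypothetical), a Dockerfile ordered along these lines might look like:

```dockerfile
# Stable, shared instructions first: these layers cache well
# and can be re-used across images
FROM debian:wheezy
MAINTAINER bdehamer

# Slow OS package installation near the top, so it only runs
# once per code/build/test cycle
RUN apt-get update && apt-get install -y \
    build-essential \
    curl

# Frequently-changing application code last, so edits here
# only invalidate the final layers
ADD . /opt/myapp
```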
The Build Context
When executing docker build, the first line of output typically reads “Sending build context to Docker daemon . . .” The build context constitutes everything in your build directory (the directory that you pass to the docker build command) and is used by Docker so that you can inject local files into your image using the ADD and COPY instructions.
This is the one place where the caching rules change slightly — in addition to looking at the instruction and the parent image, Docker will also check to see if the file(s) being copied have changed.
Let’s create a simple Dockerfile that uses ADD to copy a file into our image:
FROM debian:wheezy
ADD README.md /tmp/
Now let’s docker build the image:
$ docker build -q -t readme .
Sending build context to Docker daemon 3.584 kB
Sending build context to Docker daemon
Step 0 : FROM debian:wheezy
---> e8d37d9e3476
Step 1 : ADD README.md /tmp/
---> 09eabce38f39
Removing intermediate container 3e44a3b6eabe
Successfully built 09eabce38f39
If we were to execute the docker build again we’d see that no new images are created since we haven’t changed anything.
However, let’s update the README.md file and then build again:
$ touch README.md
$ docker build -q -t readme .
Sending build context to Docker daemon 3.584 kB
Sending build context to Docker daemon
Step 0 : FROM debian:wheezy
---> e8d37d9e3476
Step 1 : ADD README.md /tmp/
---> 03057a46a5c7
Removing intermediate container 989edbcf38ae
Successfully built 03057a46a5c7
Note that a new image was generated for the ADD instruction this time (compare the image ID here to the one from the previous run). We didn’t change anything inside the Dockerfile, but we did update the timestamp on the README.md file itself.
For the most part, this is exactly the behavior we want when building images. If the file changes in some way, you would expect that the next build of the image would incorporate the changes to that file. However, things get a bit trickier when you start adding lots of files at once.
A common pattern is to inject an application’s entire codebase into an image using an instruction like:
ADD . /opt/myapp
In this case we’re injecting the entire build context into the image. If any single file changes in the entire build context, it will invalidate the cache and a new image layer will be generated on the next build.
If your build directory happens to include things like log files or test reports that are updated frequently, you may find that you’re getting new image layers generated with every single docker build. You could work around this by specifically ADDing only those files which are necessary for your application, but if you have many files spread across a number of directories this can be pretty tedious.
Luckily, Docker has a better solution in the form of the .dockerignore file. In much the same way that the .gitignore file works, the .dockerignore file allows you to specify a list of exclusion patterns. Any files/directories matching those patterns will be excluded from the build context.
If you have files in your build directory that change often and are not required by your image, you should consider adding them to your .dockerignore file. A good rule of thumb is that anything in your .gitignore is a good candidate for inclusion in your .dockerignore.
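For example, a minimal .dockerignore for the scenario above (the specific patterns are hypothetical) might look like:

```
# Version control metadata, not needed inside the image
.git

# Frequently-updated files that would otherwise bust the ADD cache
*.log
test-reports/
tmp/
```

Each line is an exclusion pattern; anything that matches is simply never sent to the Docker daemon as part of the build context.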
One Catch-22 related to the use of ADD . is that the Dockerfile itself is also part of the build context — so any changes you make to the Dockerfile result in a change to the build context, and you can’t add the Dockerfile to the .dockerignore file because it needs to be part of the build context in order for Docker to read the build instructions. If you’re using ADD . and making changes to your Dockerfile, don’t be surprised to see new image layers generated every time you do a build.
Bust the Cache
For the most part, the image cache is incredibly helpful and can save you a lot of time while building your images. However, there are times when the caching can bite you if you aren’t paying attention, so it’s good to know how to selectively bust the cache.
Let’s say we have a Dockerfile which contains the following:
RUN git clone https://github.com/bdehamer/dot_files.git
WORKDIR /dot_files
RUN git checkout v1.0.0
When I build this the first time, I’m going to get exactly what I expect — it’ll clone my Git repo and checkout the v1.0.0 tag.
Now imagine I push some changes to my repo and tag it as v1.1.0. I’m going to update the Dockerfile to reference the new tag:
RUN git clone https://github.com/bdehamer/dot_files.git
WORKDIR /dot_files
RUN git checkout v1.1.0
When I go to build the image from the updated Dockerfile I get the following error:
. . .
Step 7 : RUN git clone https://github.com/bdehamer/dot_files.git
---> Using cache
---> 104e2ed02220
Step 8 : WORKDIR /dot_files
---> Using cache
---> 7d120a36b1a5
Step 9 : RUN git checkout v1.1.0
---> Running in 86dd626440ac
error: pathspec 'v1.1.0' did not match any file(s) known to git.
2014/08/05 20:26:11 The command [/bin/sh -c git checkout v1.1.0] returned a non-zero code: 1
I definitely pushed a v1.1.0 tag to my repo, yet Git is telling me that no such tag is found.
This is one of those times where the Docker image cache is being a little too helpful. In the output above, note how the git clone step had already been cached from our previous build and was re-used in this run. When we get to the git checkout instruction we’re still using a copy of the repo that doesn’t have a v1.1.0 tag.
This is quite different from the example with the build context above. In this case the contents of the git repo are not part of the build context — as far as Docker is concerned, our git clone is just another instruction that happens to match one that already exists in the cache.
The brute-force solution here is to simply run docker build with the --no-cache flag and force it to re-create all the layers. While that will work, it doesn’t allow us to take advantage of any earlier instructions in the Dockerfile that were just fine to be pulled from the cache.
A better approach is to refactor our Dockerfile a bit to ensure that any future changes to the tag will force a fresh git clone as well:
WORKDIR /dot_files
RUN git clone https://github.com/bdehamer/dot_files.git . && \
git checkout v1.1.0
Now we’ve combined the git clone and git checkout into a single instruction in the Dockerfile. If we later edit the file to change the tag reference, it will invalidate the cache for that layer and we’ll get a fresh clone when the new layer is generated.
Note also that I moved the WORKDIR instruction so that the directory would be created before cloning the repo. Then, by cloning into the current directory (note the . after the repo’s URL), I was able to execute my clone and checkout without needing to switch directories in between.
When building images based off of Debian/Ubuntu you’ll often see this same pattern applied to installing OS packages:
RUN apt-get update && apt-get install -y vim=2:7.3*
Here the apt-get update is like the git clone in the previous example — we want to ensure that we’ve got access to all the latest packages any time we add another package or update the version of vim.
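To see why the combined form matters, consider what happens if the two commands are split into separate instructions (a hypothetical anti-pattern for comparison):

```dockerfile
# Anti-pattern: apt-get update is cached as its own layer, so a
# later edit to the install line re-uses a stale package index
# and may fail to find (or pin) the requested version
RUN apt-get update
RUN apt-get install -y vim=2:7.3*

# Better: combining them means any change to the package list
# also invalidates the cached update, fetching a fresh index
RUN apt-get update && apt-get install -y vim=2:7.3*
```

With the combined instruction, editing the package list busts the cache for the whole layer, just as changing the tag did for the git clone example above.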