Since this post, Docker has released improved support for writing complex and still maintainable Dockerfiles. Check out our blog post on multi-stage Docker builds.
There’s been a welcome focus in the Docker community recently on image size. Smaller images are being championed by Docker and by the community, and when many images weigh in at several hundred MB and ship with a large Ubuntu base, that focus is badly needed. Here are the sizes of the top 10 images on Docker Hub today (latest tag):
IMAGE NAME   SIZE
busybox      1 MB
ubuntu       188 MB
swarm        17 MB
nginx        134 MB
registry     423 MB
redis        151 MB
mysql        360 MB
mongo        317 MB
node         643 MB
debian       125 MB
A lot of the benefit comes simply from using a small base image (Alpine Linux, BusyBox, etc.). Enough has been written about these base images, so I’ll assume you’ve already picked a good one. After that, it’s up to the maintainer of the Dockerfile to follow some best practices to keep the image small. Specifically, we’ll examine the image size implications of joining multiple RUN commands onto one line, and some practical best practices for apt-get (i.e., removing the apt-get cache and using --no-install-recommends).
Docker images are built from a layered filesystem. Each layer contains only the differences between it and the layer below it. At the top you see a unified view, but the history of how it was built is preserved. Each instruction in a Dockerfile creates a new layer on top of the existing stack.
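To make the layering idea concrete, here is a minimal toy model in Python. It is an illustration under simplifying assumptions, not Docker’s actual storage driver: each hypothetical Layer records the files it adds (path to bytes) and the paths it deletes. Stacking the layers bottom to top gives the unified view a container sees, while the stored size still counts every byte any layer ever added.

```python
# Toy model of a layered image filesystem (an illustration, not Docker's
# real storage driver). Each layer records the files it adds (path -> bytes)
# and the paths it deletes, which become "whiteout" entries.
from dataclasses import dataclass, field


@dataclass
class Layer:
    added: dict[str, int] = field(default_factory=dict)
    deleted: set[str] = field(default_factory=set)


def unified_view(layers: list[Layer]) -> dict[str, int]:
    """Stack the layers bottom to top and return the files a container sees."""
    view: dict[str, int] = {}
    for layer in layers:
        for path in layer.deleted:
            view.pop(path, None)  # a whiteout hides the path from the view
        view.update(layer.added)  # new or changed files shadow lower copies
    return view


def stored_bytes(layers: list[Layer]) -> int:
    """Bytes kept in the image: every layer's additions, even hidden ones."""
    return sum(sum(layer.added.values()) for layer in layers)


if __name__ == "__main__":
    layers = [
        Layer(added={"/bin/sh": 1_000, "/etc/motd": 10}),  # base layer
        Layer(added={"/etc/motd": 20}),                     # overwrite a file
        Layer(deleted={"/etc/motd"}),                       # delete it
    ]
    print(unified_view(layers))  # {'/bin/sh': 1000}
    print(stored_bytes(layers))  # 1030
```

The deleted file disappears from the view, but both versions of it are still counted in the stored size.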
For example, let’s start with a Dockerfile snippet that looks like this:
ADD https://storage.googleapis.com/golang/go1.5.3.src.tar.gz /tmp/
# do some things with that file
RUN rm /tmp/go1.5.3.src.tar.gz
You might think you are doing a good and responsible thing by deleting the .tar.gz file when you are done. But the layer containing that file is still part of the image. The rm command hides the file from the final image, but its contents are still in the earlier layer and will still be downloaded by everyone who pulls your image with docker pull.
It’s better to download, use, and delete the file in a single RUN instruction, so the file is never committed to a layer. For example, a small rewrite of the snippet above would be:
RUN curl -o /tmp/go1.5.3.src.tar.gz \
        https://storage.googleapis.com/golang/go1.5.3.src.tar.gz && \
    <do some things with the file> && \
    rm /tmp/go1.5.3.src.tar.gz
It’s not as pretty to look at, but it results in a much smaller image. If that line really annoys you, put the steps in a script and RUN that script in the Dockerfile.
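The difference between the two snippets can be sketched with the same kind of toy model, again an illustration rather than Docker’s real storage format. The tarball size and the /usr/local/go output below are hypothetical stand-ins for “do some things with the file”; only the relative sizes matter.

```python
# Toy model (an illustration, not Docker's storage format): each layer is a
# pair (files added: path -> MB, paths deleted). A deletion in a later layer
# hides a file but leaves the earlier layer, and its bytes, untouched.

def visible_files(layers):
    view = {}
    for added, deleted in layers:
        for path in deleted:
            view.pop(path, None)
        view.update(added)
    return view


def image_size_mb(layers):
    return sum(sum(added.values()) for added, _ in layers)


TARBALL = "/tmp/go1.5.3.src.tar.gz"
TARBALL_MB = 12                 # hypothetical size
OUTPUT = {"/usr/local/go": 40}  # hypothetical result of "do some things"

# ADD commits the tarball to one layer; a later RUN uses it and rm's it.
separate = [
    ({TARBALL: TARBALL_MB}, set()),  # the tarball's bytes live here for good
    (dict(OUTPUT), {TARBALL}),       # rm only records a whiteout
]

# curl && <do some things> && rm in one RUN: the tarball is gone before
# the layer is committed, so no layer ever contains it.
single = [
    (dict(OUTPUT), set()),
]

print(visible_files(separate) == visible_files(single))  # True
print(image_size_mb(separate), image_size_mb(single))    # 52 40
```

Both versions leave the same files visible in the final image; only the single-RUN version avoids shipping the tarball.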
Most Dockerfile authors know that you should apt-get remove any unnecessary packages. A common example is an image built with curl and/or wget to download files. You can apt-get remove curl afterwards, but the layer that installed it remains part of the final image. Remove such packages (and all of their automatically installed dependencies) in the same RUN instruction that installed them.
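If you generate Dockerfiles, the rule can be captured in a small helper. The sketch below is hypothetical (single_run_instruction is not part of any Docker tooling) and assumes a Debian/Ubuntu base. It emits one RUN instruction that installs packages, runs the commands that need them, and then removes the packages, the automatically installed dependencies nothing needs anymore, and the apt lists. It also assumes nothing the image needs at runtime was pulled in only as a dependency of those packages, because apt-get autoremove would remove that too. The example.com URL is a placeholder.

```python
# Hypothetical helper: emit one RUN instruction that installs build-time
# packages, runs the commands that need them, and removes everything again,
# so the packages and the apt lists never land in a committed layer.
def single_run_instruction(packages: list[str], commands: list[str]) -> str:
    pkgs = " ".join(packages)
    steps = [
        "apt-get update",
        f"apt-get install -y {pkgs}",
        *commands,
        f"apt-get remove -y {pkgs}",
        "apt-get autoremove -y",        # remove auto-installed packages nothing needs anymore
        "rm -rf /var/lib/apt/lists/*",  # remove the downloaded apt package lists
    ]
    # Join with "&& \" plus a newline so everything stays a single instruction.
    return "RUN " + " && \\\n    ".join(steps)


print(single_run_instruction(
    ["curl"],
    [
        # example.com is a placeholder URL
        "curl -fsSL -o /tmp/file.tar.gz https://example.com/file.tar.gz",
        "tar -xzf /tmp/file.tar.gz -C /opt",
        "rm /tmp/file.tar.gz",
    ],
))
```

Because every step is chained with &&, a failing step also fails the build instead of silently leaving packages behind.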
This is especially tricky for complex Dockerfiles, so let’s walk through an example.
Here’s a simplified version of a typical Dockerfile that might run a Python service. Don’t worry, we will optimize this.
FROM ubuntu:14.04
RUN apt-get update
RUN apt-get install -y curl python-pip
RUN pip install requests
ADD ./my_service.py /my_service.py
ENTRYPOINT ["python", "/my_service.py"]
my_service.py is a Python script that simply contains:
#!/usr/bin/python
print 'Hello, world!'
Time to build and check the image size:
$ sudo docker build -t size .
$ sudo docker images
REPOSITORY   TAG      IMAGE ID       CREATED          VIRTUAL SIZE
size         latest   da8a9be731ac   4 seconds ago    360.5 MB
ubuntu       14.04    6cc0fc2a5ee3   2 weeks ago      187.9 MB
Yikes. The 188 MB base image matches the table above, but we’ve nearly doubled the image size just to run a hello-world Python script. What exactly does the 360.5 MB figure report? It’s the total of the “visible” layer (the top one, da8… in my example) and all the layers that were used to create this top layer.
We should probably clean up after ourselves. Let’s try a Dockerfile that looks like this:
FROM ubuntu:14.04
RUN apt-get update
RUN apt-get install -y curl python-pip
RUN pip install requests
## Clean up
RUN apt-get remove -y python-pip curl
RUN rm -rf /var/lib/apt/lists/*
ADD ./my_service.py /my_service.py
ENTRYPOINT ["python", "/my_service.py"]
Building and checking on that yields:
$ sudo docker build -t size .
$ sudo docker images
REPOSITORY   TAG      IMAGE ID       CREATED          VIRTUAL SIZE
size         latest   c6dacdd00660   2 seconds ago    361.3 MB
ubuntu       14.04    6cc0fc2a5ee3   2 weeks ago      187.9 MB
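It grew larger (slightly)! Cleaning up after ourselves has backfired! Because each cleanup step ran in its own RUN instruction, it could only hide the files installed by earlier layers, not remove them from those layers, and the changes the cleanup itself made (such as updates to apt’s package database) were committed as new layers on top.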
Let’s try collapsing the apt operations into a single line:
FROM ubuntu:14.04
RUN apt-get update && \
    apt-get install -y curl python-pip && \
    pip install requests && \
    apt-get remove -y python-pip curl && \
    rm -rf /var/lib/apt/lists/*
ADD ./my_service.py /my_service.py
ENTRYPOINT ["python", "/my_service.py"]
Building and checking this version yields:
$ sudo docker build -t size .
$ sudo docker images
REPOSITORY   TAG      IMAGE ID       CREATED          VIRTUAL SIZE
size         latest   e531f8674f33   9 seconds ago    338 MB
ubuntu       14.04    6cc0fc2a5ee3   2 weeks ago      187.9 MB
Ok, that made it smaller. But why is it still so huge? I was expecting a lot less.
It turns out that apt-get install brings along a handful of other “recommended” packages. Recommended packages are not strict dependencies: apt installs them by default because most installations want them, but the package can work without them. Some users will need them because of their environment or how they use the package, but not everyone does.
For pip on Ubuntu 14.04, it’s easy to confirm that leaving out the recommended packages has no side effects for this installation. Still, this is something you should definitely test before you ship to production. A quick scan of the official images on Docker Hub shows that redis, mysql, mongo, postgres, elasticsearch, and more use this technique to make their images smaller.
Let’s try it again with --no-install-recommends in the apt-get install command.
FROM ubuntu:14.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl python-pip && \
    pip install requests && \
    apt-get remove -y python-pip curl && \
    rm -rf /var/lib/apt/lists/*
ADD ./my_service.py /my_service.py
ENTRYPOINT ["python", "/my_service.py"]
Building and checking this version yields:
REPOSITORY   TAG      IMAGE ID       CREATED          VIRTUAL SIZE
size         latest   fddc30aee4dc   6 seconds ago    229.2 MB
ubuntu       14.04    6cc0fc2a5ee3   2 weeks ago      187.9 MB
OK, that dropped another 108.8 MB from the image (131.3 MB less than the first build), leaving only about 41 MB on top of the Ubuntu base. This looks good.
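For reference, the arithmetic behind comparing these builds, using only the virtual sizes reported by docker images in this post. Since every build shares the same ubuntu:14.04 layers, subtracting the base size gives what each Dockerfile’s own layers contribute. The build labels are descriptive names chosen for this sketch.

```python
# Virtual sizes (MB) reported by `docker images` for each build in this post.
BASE_MB = 187.9  # ubuntu:14.04, shared by every build
SIZES_MB = {
    "separate RUN lines": 360.5,
    "separate RUN lines + cleanup": 361.3,
    "single RUN line + cleanup": 338.0,
    "single RUN line + --no-install-recommends": 229.2,
}


def added_on_top_of_base(size_mb: float) -> float:
    """MB contributed by the Dockerfile's own layers (virtual size minus base)."""
    return round(size_mb - BASE_MB, 1)


for name, size in SIZES_MB.items():
    print(f"{name:42} {size:6.1f} MB total, {added_on_top_of_base(size):6.1f} MB added")

final = SIZES_MB["single RUN line + --no-install-recommends"]
print(f"saved vs. previous build: {SIZES_MB['single RUN line + cleanup'] - final:.1f} MB")
print(f"saved vs. first build:    {SIZES_MB['separate RUN lines'] - final:.1f} MB")
```

The layers added on top of the base shrink from 172.6 MB in the first build to 41.3 MB in the last.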
Create a Dockerfile strategy in your organization to control this. The Dockerfile syntax is easy to learn, but very nuanced when it comes to optimization.