Even smaller Docker image sizes
In one of our recent posts, we looked at some of the core strategies for reducing Docker image size. Since then, new ways to reduce image size have emerged, enabling even faster image distribution and startup times.
In this post, I’m going to set up a common scenario: a Dockerfile that builds a Go binary, which then runs in a Docker container in a production environment. I’ll start by showing how you would approach this with multiple Dockerfiles, and then explore using multi-stage builds, introduced today in Docker 17.06, as a way to make a more maintainable build environment.
Separate Build and Runtime Images
Some builds have prerequisites that take up a lot of space and need to be cleaned up later, leading to large commands that are brittle and difficult to maintain. Often, this is the case when installing native Node.js modules, Ruby gems, Python packages, or packages in any other language that relies heavily on native bindings for performance. This is even more apparent in compiled languages, where the final output is a binary and any of its dynamically linked libraries.
Instead of trying to ensure that build-essential, libpq, and others are installed and deleted when no longer needed, consider using separate build and runtime images. In this model, we have a "build" Dockerfile that is concerned with all of the steps necessary to build the artifact.
FROM golang:1.8-alpine
RUN go get github.com/kardianos/govendor
RUN go get github.com/nicksnyder/go-i18n/goi18n
RUN go get github.com/jteeuwen/go-bindata/go-bindata
# Exports binaries to /bin/$BINARY
RUN make build
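To build this in a way that’s useful, we need to get the binary out of the image. The docker build command cannot mount host volumes (it has no -v flag), so we first build the image, then run a short-lived container from it with a host directory mounted and copy the binary into that directory. Here myapp-build is just a placeholder name for the build image, and /out is an arbitrary mount point:
docker build -t myapp-build .
docker run --rm -v "$(pwd)/bin:/out" myapp-build cp /bin/replicated /out/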
Once our artifacts are built, we export everything required for our application to run from the container and discard the build container. From there, we copy that artifact into our runtime image, which contains only the dependencies we need to actually run our application. What that looks like depends on your runtime stack: it may be as large as a JVM-based image with system-level native dependencies, or as small as the empty scratch image.
One of the challenges with this method is that the platform you use to build any native dependencies has to match the platform of the runtime container. If you build on amd64, it must run on amd64. One critical place where this often gets missed is the Alpine image. Alpine's libc is musl, while most common Linux distributions use glibc, so a binary that is dynamically linked against glibc can fail at runtime on Alpine.
If you want to use Alpine, make sure to build your application with Alpine.
Finally, we copy everything over to our runtime container:
FROM alpine
ADD ./bin/replicated /bin/replicated
EXPOSE 8080
ENTRYPOINT ["/bin/replicated"]
Now we are ready to run! This container only contains the bare minimum we need to run our application, without any of the additional build-time requirements we had in the prior Dockerfile. This produces a small and compact runtime image that doesn’t contain any unnecessary data.
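For the "as small as the empty scratch image" case mentioned earlier, the runtime Dockerfile might look like the sketch below. It rests on an assumption the post does not state: that ./bin/replicated is a statically linked binary (for Go, typically one built with CGO_ENABLED=0), since scratch ships no libc, no shell, and no CA certificates.

```dockerfile
# Sketch only: assumes ./bin/replicated is statically linked,
# because the scratch base image contains no libc or other libraries.
FROM scratch

# COPY is sufficient here; nothing needs to be unpacked or downloaded.
COPY ./bin/replicated /bin/replicated

EXPOSE 8080

# Exec form is required: scratch has no shell to run a shell-form command.
ENTRYPOINT ["/bin/replicated"]
```

If the application makes outbound TLS connections or needs timezone data, those files would also have to be copied in, which is one reason alpine is often the more practical base.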
There are some challenges associated with this approach. The introduction of multiple Dockerfiles means that more tooling is needed to juggle the images, containers, and files needed to produce a final runtime image. This can make working in certain CI tools more difficult as well, especially distributed CI tools that use a fresh environment per step. Fortunately, this pattern is so common that it has been codified by Docker with a new Dockerfile feature called Multi-Stage Builds.
Multi-stage builds are available as of Docker 17.06. With this feature, we can use a single Dockerfile with multiple FROM statements, where each FROM starts a new stage, and copy artifacts from earlier stages into later ones. By using the COPY directive with the --from flag, we can move files from one stage to another. Using this, our prior build and run Dockerfiles now become:
FROM golang:1.8-alpine
RUN go get github.com/kardianos/govendor
RUN go get github.com/nicksnyder/go-i18n/goi18n
RUN go get github.com/jteeuwen/go-bindata/go-bindata
# Exports binaries to ./bin/$BINARY
RUN make build

FROM alpine
COPY --from=0 ./bin/replicated /bin/replicated
ENTRYPOINT ["/bin/replicated"]
As before, we build with docker build, but this time we don’t need to mount a volume to get the binary out!
docker build -t myapp .
This command will run both stages, outputting one image based on our second stage, containing only the base Alpine components and our replicated binary. No source, and no build-time binaries like govendor, goi18n, and go-bindata.
The only disadvantage here is that the order of the stages in our Dockerfile now matters, because --from=0 refers to a stage by its index. We can go a little further by naming our stages instead of using an index:
FROM golang:1.8-alpine AS builder
RUN go get github.com/kardianos/govendor
RUN go get github.com/nicksnyder/go-i18n/goi18n
RUN go get github.com/jteeuwen/go-bindata/go-bindata
# Exports binaries to ./bin/$BINARY
RUN make build

FROM alpine AS runner
COPY --from=builder ./bin/replicated /bin/replicated
ENTRYPOINT ["/bin/replicated"]
Now our Dockerfiles are smaller and more readable than ever!
Building the above Dockerfile using the command docker build -t myapp . will result in a single image named myapp that does not contain the Go build environment. We discarded that when we started the second FROM statement. Unlike Dockerfiles without multiple FROM commands, the resulting image doesn’t need to be squashed, and in fact, wouldn’t benefit from squashing.
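The Dockerfiles in this post leave out how the source code gets into the build stage. A fuller version might look roughly like the sketch below. Several things in it are assumptions rather than details from the post: the import path github.com/example/replicated is hypothetical, the Makefile's build target is assumed to write ./bin/replicated relative to the working directory, and git and make are installed explicitly on the assumption that the Alpine-based golang image does not ship them.

```dockerfile
# --- Build stage ---
FROM golang:1.8-alpine AS builder

# Assumption: the base image lacks git (used by `go get`) and make.
RUN apk add --no-cache git make

# Hypothetical import path; Go 1.8 expects sources under $GOPATH/src.
WORKDIR /go/src/github.com/example/replicated

# Bring the project source from the build context into the stage.
COPY . .

# Build-time tools, as in the Dockerfiles above.
RUN go get github.com/kardianos/govendor
RUN go get github.com/nicksnyder/go-i18n/goi18n
RUN go get github.com/jteeuwen/go-bindata/go-bindata

# Assumed to write the binary to ./bin/replicated under WORKDIR.
RUN make build

# --- Runtime stage ---
FROM alpine AS runner

# An absolute source path avoids depending on how relative paths
# are resolved inside the builder stage.
COPY --from=builder /go/src/github.com/example/replicated/bin/replicated /bin/replicated

ENTRYPOINT ["/bin/replicated"]
```

Adding a .dockerignore file alongside this would keep the COPY . . step from pulling unneeded files into the build stage.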
This provides us with concise, readable Dockerfiles that take advantage of Docker’s built-in tooling. We no longer have to worry about juggling and chaining together multiple Dockerfiles, nor do we have to remember to add, remove, or keep the right dependencies for our applications.
By exporting the exact things we need for runtime, we are shipping images that are smaller, faster to pull (and run!), and have a smaller surface area for attacks.