Delicious Docker: Build minimal containers
It’s a common best practice in the Docker community to keep containers as small as possible. There are a few reasons for this:
- small images are faster to copy around,
- they are faster to start,
- by removing unnecessary components from your image, you reduce the attack surface.
While working on Kroki, one of my goals was to build minimal containers. To give you a bit of context, Kroki provides a unified API on top of more than a dozen diagram libraries. And as you might have guessed, each diagram library has its own requirements and runtime! As a matter of fact, they are written in various programming languages: Haskell, Go, Java, Node.js, Python…
Action plan
I took the following approach:
- create an image from scratch (literally FROM scratch).
- build a statically linked binary (for each diagram library) that can run on top of scratch!
In this article, I will describe a few tools and techniques I’ve discovered along the way. Let’s start with the easiest first!
Haskell
For reference, I’ve never written a single line of Haskell and I’m not familiar with the Haskell ecosystem. But surprisingly, it was relatively easy to create a statically linked binary thanks to this great article by Vaibhav Sagar.
One thing led to another, and Niklas Hambüchen and Akos Marton helped me set up an even easier build based on Stack.
My goal here was to build a statically linked binary of Erd. If you don’t know this utility, it can translate a plain text description of a relational database schema to a graphical entity-relationship diagram. The visualization is produced by using dot from GraphViz.
Starting with Docker 17.05, we can use multi-stage builds. Basically, a single Dockerfile can use several base images, and each FROM instruction begins a new stage of the build.
In our case, we will define two steps:
- The first step will use an Ubuntu image to build the project and create a static binary using stack.
- The second step will copy the static binary built in the first step into a scratch image.
# step 1: build a static binary
FROM ubuntu:18.04 AS builder
RUN apt-get update && apt-get install -y graphviz curl git
RUN curl -sSL https://get.haskellstack.org/ | sh
RUN git clone https://github.com/BurntSushi/erd.git
WORKDIR erd
RUN git checkout v0.2.0.0
RUN /usr/local/bin/stack install --ghc-options="-fPIC" \
--ghc-options="-static" \
--ghc-options="-optl=-static" \
--ghc-options="-optc=-static"
# step 2: build a small image
FROM scratch
COPY --from=builder /root/.local/bin/erd /haskell/bin/erd
ENTRYPOINT ["/haskell/bin/erd"]
And here’s the result, a 21 MB image:
REPOSITORY TAG SIZE
scratch-erd latest 21MB
For reference, here’s the size of the haskell:8 image:
REPOSITORY TAG SIZE
haskell 8.8.3 1.47GB
Please note that, in this specific example, GraphViz is still required at runtime.
To work around this issue, we could instead use an Alpine base image and install GraphViz using RUN apk add --update --no-cache graphviz.
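As a rough sketch of that workaround, the second stage could be swapped for an Alpine one. The builder stage is copied unchanged from the Dockerfile above. The alpine:3.11 tag is an assumption (borrowed from the Node.js example later in this article), and the size of the resulting image has not been measured.

```dockerfile
# step 1: build a static binary (same as above)
FROM ubuntu:18.04 AS builder
RUN apt-get update && apt-get install -y graphviz curl git
RUN curl -sSL https://get.haskellstack.org/ | sh
RUN git clone https://github.com/BurntSushi/erd.git
WORKDIR erd
RUN git checkout v0.2.0.0
RUN /usr/local/bin/stack install --ghc-options="-fPIC" \
  --ghc-options="-static" \
  --ghc-options="-optl=-static" \
  --ghc-options="-optc=-static"

# step 2: Alpine instead of scratch, so that GraphViz is available at runtime
# (alpine:3.11 is an assumed tag, not the author's published setup)
FROM alpine:3.11
RUN apk add --update --no-cache graphviz
COPY --from=builder /root/.local/bin/erd /haskell/bin/erd
ENTRYPOINT ["/haskell/bin/erd"]
```

Because the erd binary is statically linked, it should not matter that it was built against glibc on Ubuntu and then runs on a musl-based Alpine image.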
Rust
Stefan Seemayer describes in a blog post how to statically compile a 'Hello World' server in Rust. In this article, Stefan is not using a multi-stage build. Instead, the current working directory is mapped into the container:
$ alias rust-musl-builder='docker run --rm -it -v "$(pwd)":/home/rust/src ekidd/rust-musl-builder'
$ rust-musl-builder cargo build --release
To be consistent, we will use the ekidd/rust-musl-builder image in a multi-stage build:
# step 1: build a static binary
FROM ekidd/rust-musl-builder:stable AS builder
RUN cargo install --version 0.4.2 svgbob_cli
# step 2: build a small image
FROM scratch
COPY --from=builder /home/rust/.cargo/bin/svgbob /rust/bin/svgbob
ENTRYPOINT ["/rust/bin/svgbob"]
Nothing fancy here, we are using the cargo command line to download the svgbob_cli crate from crates.io, compile it, and install the resulting svgbob binary.
And here’s the result, a 4 MB image:
REPOSITORY TAG SIZE
scratch-svgbob latest 4.01MB
Go
It comes as no surprise that someone already wrote a great blog post on how to create the smallest Golang Docker image. In this blog post, Cyrille Hemidy points out that the official Docker image for Go is 779 MB!
Here, we’re using the same approach described above with a multi-stage build:
# step 1: build a static binary
FROM golang:alpine AS builder
RUN apk add --update --no-cache git
RUN git clone https://github.com/yuzutech/kroki-cli.git
WORKDIR kroki-cli
RUN git checkout v0.3.0
RUN go get -d -v
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o /go/bin/kroki
# step 2: build a small image
FROM scratch
COPY --from=builder /go/bin/kroki /go/bin/kroki
ENTRYPOINT ["/go/bin/kroki"]
If you are using Go < 1.10, don’t forget to add CGO_ENABLED=0, otherwise you will get the following error when trying to run the binary:
standard_init_linux.go:211: exec user process caused "no such file or directory"
This obscure message means that the binary is dynamically linked and that the dynamic loader or a shared library it depends on is missing from the image.
And here’s the result, from 779 MB to less than 10 MB:
REPOSITORY TAG SIZE
scratch-kroki latest 8.91MB
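One way to catch the "no such file or directory" error described above at build time rather than at run time is to check the binary in the builder stage. The sketch below repeats the build steps from this section and adds a single RUN line. Installing the file utility with apk and grepping its output for "statically linked" is my own addition, not part of the original build.

```dockerfile
# step 1: build a static binary
FROM golang:alpine AS builder
# "file" is an extra package, only needed for the sanity check below
RUN apk add --update --no-cache git file
RUN git clone https://github.com/yuzutech/kroki-cli.git
WORKDIR kroki-cli
RUN git checkout v0.3.0
RUN go get -d -v
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o /go/bin/kroki
# Assumed sanity check: fail the build unless "file" reports a statically linked binary
RUN file /go/bin/kroki | grep -q "statically linked"

# step 2: build a small image
FROM scratch
COPY --from=builder /go/bin/kroki /go/bin/kroki
ENTRYPOINT ["/go/bin/kroki"]
```

If grep does not find the expected text it exits with a non-zero status, so the build stops before the scratch image is produced.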
Node.js
As we’ve seen above, building a statically linked binary is relatively easy in the Haskell, Rust or Go ecosystems. But what about Node.js?
Thanks to pkg, it’s possible to package a Node.js application into an executable that can be run even on containers without Node.js installed.
# step 1: build a single binary
FROM node:12.16.2-alpine3.11 AS builder
# Workaround: https://github.com/nodejs/docker-node/issues/813#issuecomment-407339011
# Error: could not get uid/gid
# [ 'nobody', 0 ]
RUN npm config set unsafe-perm true
RUN npm install -g pkg pkg-fetch
ENV NODE node10
ENV PLATFORM alpine
ENV ARCH x64
RUN /usr/local/bin/pkg-fetch ${NODE} ${PLATFORM} ${ARCH}
RUN apk add --update --no-cache git
RUN git clone https://github.com/skanaar/nomnoml.git
WORKDIR nomnoml
RUN git checkout v0.6.2
RUN npm i
RUN /usr/local/bin/pkg --targets ${NODE}-${PLATFORM}-${ARCH} dist/nomnoml-cli.js -o nomnoml.bin
# step 2: build a small image
FROM alpine:3.11
COPY --from=builder /nomnoml/nomnoml.bin /node/bin/nomnoml
RUN apk add --no-cache libstdc++
ENTRYPOINT ["/node/bin/nomnoml"]
In the example above, we are using pkg to create a binary from the Nomnoml diagram library, but the logic can be applied to almost any Node.js project.
As you can see, we get the sources from GitHub, then we install the dependencies using npm i, and finally we execute the pkg command line to produce a binary named nomnoml.bin from the dist/nomnoml-cli.js file.
Please note that the binary is not statically linked. If you run ldd on the binary you will get:
$ ldd /nomnoml/nomnoml.bin
/lib/ld-musl-x86_64.so.1 (0x7fc3d3bc4000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7fc3d168a000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7fc3d1676000)
libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7fc3d3bc4000)
That’s why we are not building the image FROM scratch but FROM alpine:3.11, to make sure that the system shared libraries will be available and compatible.
And here’s the result:
REPOSITORY TAG SIZE
alpine-nomnoml latest 50.9MB
That’s still not bad for an Alpine-based image.
For reference, the builder base image node:12.16.2-alpine3.11 (also based on Alpine) is about 90MB.
Java
With GraalVM and Quarkus, it’s now possible to compile a Java application into a native binary, although this approach has a few limitations. If you can, you should definitely build a native binary, but if you cannot, then one solution is to use Alpine as a base image.
Alpine Linux
The main benefit is that Alpine Linux is much smaller than most distribution base images (~5MB), and thus leads to much slimmer images in general.
But the main caveat is that it does use musl libc instead of glibc and friends, so certain software might run into issues depending on the depth of their libc requirements.
However, most software doesn’t have an issue with this, so this variant is usually a very safe choice.
Keep in mind that, since Alpine is not officially supported, the OpenJDK organization on Docker only publishes Early Access versions of the JDK on Alpine. So, if you want to use a stable version of the JDK, you should instead use the Docker images produced by the AdoptOpenJDK organization.
They come in two flavors, a "slim" JDK image and a JRE image, both based on Alpine:
REPOSITORY TAG SIZE
adoptopenjdk/openjdk11 jre-11.0.6_10-alpine 147MB
adoptopenjdk/openjdk11 jdk-11.0.6_10-alpine-slim 253MB
For comparison, here’s an image based on Ubuntu Bionic and another one based on Debian Buster "slim":
REPOSITORY TAG SIZE
adoptopenjdk 13.0.2_8-jre-hotspot-bionic 225MB
openjdk 11.0.6-jre-slim-buster 204MB
As you can see, they are relatively small, and the Alpine-based JRE image (147MB) is "only" about 28% smaller than the Debian-based one (204MB). At the same time, if you are using a JRE instead of a JDK on Alpine Linux (147MB instead of 253MB), you will get an image that is roughly 42% smaller.
jlink
I’m not using jlink (yet) but this tool, introduced in JDK 9, can be used to create a custom runtime image where only the required Java modules are included.
Combined with a multi-stage build, it’s possible to build a smaller runtime for your Java application.
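Here is a minimal sketch of that combination. It is not the setup used by Kroki. The image tags, the module list (java.base,java.logging) and the /app path are all assumptions for illustration. The modules an application really needs can be listed with jdeps. Since the JDK in the assumed builder image is linked against glibc, the final stage uses a Debian slim image rather than Alpine.

```dockerfile
# step 1: assemble a trimmed runtime with jlink
FROM eclipse-temurin:11-jdk AS jre-build
# java.base and java.logging are placeholder modules; replace them with your own list
RUN $JAVA_HOME/bin/jlink \
  --add-modules java.base,java.logging \
  --strip-debug \
  --no-man-pages \
  --no-header-files \
  --compress=2 \
  --output /javaruntime

# step 2: copy the custom runtime and the application into a small glibc-based image
FROM debian:bookworm-slim
ENV JAVA_HOME=/opt/java/openjdk
ENV PATH="${JAVA_HOME}/bin:${PATH}"
COPY --from=jre-build /javaruntime $JAVA_HOME
COPY target/kroki-server.jar /app/kroki-server.jar
ENTRYPOINT ["java", "-jar", "/app/kroki-server.jar"]
```

If the application fails at startup with a missing class from the JDK, the module list is most likely incomplete.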
fat-jar
Since my application is relatively lightweight, I’ve decided to use a single executable jar that contains all my dependencies (also known as "fat-jar"). As a result, the Dockerfile is really straightforward:
# based on alpine 3.11
FROM adoptopenjdk/openjdk11:jre-11.0.6_10-alpine
EXPOSE 8000
COPY target/kroki-server.jar .
ENTRYPOINT exec java -jar kroki-server.jar
This approach is also useful because the application can be executed outside of Docker using the same command line: java -jar kroki-server.jar.
Using a single executable jar is convenient but it’s not optimal, because the entire application is built as a single image layer. So even when you change a single line of your code, the entire application layer will be rebuilt.
If you are building an application with a lot of dependencies (and your build is slow), you should try jib.
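The layering idea can also be sketched with a plain Dockerfile: copy the rarely changing dependency jars first, then the application jar, so that a code change only invalidates the last layer. This assumes a build that produces a thin jar plus a target/lib directory of dependencies, and a main class named com.example.Main. Both are hypothetical and not how kroki-server is packaged.

```dockerfile
FROM adoptopenjdk/openjdk11:jre-11.0.6_10-alpine
EXPOSE 8000
WORKDIR /app
# dependencies change rarely: this layer stays cached across most builds
COPY target/lib/ lib/
# application code changes often: only this layer is rebuilt
# (target/app.jar and com.example.Main are hypothetical names)
COPY target/app.jar app.jar
ENTRYPOINT ["java", "-cp", "app.jar:lib/*", "com.example.Main"]
```

The lib/* wildcard is expanded by the java launcher itself, so it also works with the exec form of ENTRYPOINT, where no shell is involved.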
Summary
The size of your Docker images matters but it should not be the only deciding factor. For instance, you don’t want to spend days or weeks trying to make your application work on Alpine. If your application is working fine on Debian, then use a minimal base image and carefully add additional packages.
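For instance, a Debian-based image can stay reasonably small by starting from a slim tag and installing only what is needed. The sketch below assumes the debian:bookworm-slim tag and uses graphviz as a stand-in for whatever package your application requires.

```dockerfile
FROM debian:bookworm-slim
# --no-install-recommends skips optional packages;
# removing the apt lists in the same RUN keeps them out of the layer
RUN apt-get update \
  && apt-get install -y --no-install-recommends graphviz \
  && rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["dot"]
```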
You should also think about security. While it’s true that by removing unnecessary components from your image, you reduce the attack surface, it is also true that it’s safer to use an active and well-maintained base image with security updates rather than a less reliable but smaller base image.
And finally, you should think about troubleshooting. If you cannot execute an interactive shell on the container (because you don’t even have one installed!) then it will be really hard to troubleshoot a problem.
So the smallest image possible is not always the best solution!
If you want to learn more, I recommend this great series by Jérôme Petazzoni: