Building images in a CI/CD pipeline can be quite different from building on a local machine. One major difference is the availability of cache. In the local environment you most likely have all the resources, dependencies and image layers cached from previous builds, so your builds might take just a few seconds. In a CI pipeline, on the other hand, there's no local cache, which can cause builds to take several minutes. There's a solution to this though, and in this article we will look at how to solve it both with and without Docker, for any CI/CD platform you might be using.
The Generic Solution
The idea behind the generic solution that works in any environment is pretty simple - we need to somehow create or bring the cache to the pipeline. We have two options here: either we point the builder tool (e.g. Docker) at the repository of our image, from which it can retrieve image layers and use them as cache, or we store the layers on a filesystem which we make available to the pipeline and grab the layers from there. Either way, we first need to create the cache by pushing the image to a repository or a filesystem; then, in subsequent builds, we try to use it, and if a layer can't be served from cache (a cache miss), we update the cache with the new layers.
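In rough pseudo-shell, the pattern every variant below follows looks something like this (the tool and names are purely illustrative):

```shell
# 1. point the builder at a cache source: a registry repo or a mounted directory
# 2. build, reusing whatever layers the cache already holds
# 3. publish the result, which also refreshes the cache for the next pipeline run
build-tool build --cache-from <image-repo-or-cache-dir> -t <image> .
build-tool push <image>
```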
Now let's see how we can do that in practice with various tools...
The simplest solution to this problem is to use Docker with BuildKit. BuildKit is a set of enhancements for `docker build` which improves performance and storage management, and adds a couple of extra features, including better caching functionality. To build a container image with BuildKit, all we need to do is prepend `DOCKER_BUILDKIT=1` to each command:
```shell
# Warm up cache
~ $ DOCKER_BUILDKIT=1 docker build -t martinheinz/docker-cached --build-arg BUILDKIT_INLINE_CACHE=1 .
...
 => => writing image sha256:09f473587beb1a1f240a776760655637ca00894a2a31b730019ecfee48d43848  0.0s
 => => naming to docker.io/martinheinz/docker-cached                                          0.0s
 => exporting cache                                                                           0.0s
 => => preparing build cache for export                                                       0.0s

~ $ docker push martinheinz/docker-cached

# Build using cache repo
~ $ DOCKER_BUILDKIT=1 docker build --cache-from martinheinz/docker-cached .
 => [internal] load metadata for docker.io/library/ubuntu:latest                              0.5s
 => importing cache manifest from martinheinz/docker-cached                                   0.0s
 => CACHED [1/1] FROM docker.io/library/ubuntu@sha256:44ab2c3b26363823dcb965498ab06abf...50743df0d4172d  0.0s
 => exporting to image                                                                        0.0s
 => => exporting layers                                                                       0.0s
 => => writing image sha256:09f473587beb1a1f240a776760655637ca00894a2a31b730019ecfee48d43848  0.0s
```
This example should be self-explanatory to anyone who has ever built an image with Docker. The only real difference from basic Docker usage is the addition of `BUILDKIT_INLINE_CACHE=1`, which tells BuildKit to enable the inline cache exporter. This makes sure that Docker writes the metadata needed for caching into the image. That metadata will then be used in subsequent builds to find out which layers can be reused from cache. The only other difference in the above snippet is the command output - during the first build we can see that Docker exports the cache to the repository, while during the second one it imports the cache manifest and also uses one cached layer.
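Putting the two commands together, a minimal sketch of a CI job script (assuming the same image name as above) might look like this:

```shell
# Build, importing cache from the previously pushed image and
# embedding inline cache metadata for the next pipeline run
DOCKER_BUILDKIT=1 docker build \
    --cache-from martinheinz/docker-cached \
    --build-arg BUILDKIT_INLINE_CACHE=1 \
    -t martinheinz/docker-cached .

# Pushing the image also publishes the inline cache metadata
docker push martinheinz/docker-cached
```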
Using BuildKit through Docker is convenient, but it hides some features and options. So, in case you want more control over the build and caching, you can use the upstream BuildKit project directly. To do so, you will need to download the binaries from the GitHub releases page, unpack them and move them into your path (e.g. `/usr/local/bin/`). Finally, you need to start the BuildKit daemon, and then you're ready to build:
```shell
sudo cp ~/Downloads/buildkit-v0.9.0.linux-amd64/bin/buildctl /usr/local/bin/
sudo cp ~/Downloads/buildkit-v0.9.0.linux-amd64/bin/buildkitd /usr/local/bin/
sudo cp ~/Downloads/buildkit-v0.9.0.linux-amd64/bin/buildkit-runc /usr/local/bin/
sudo buildkitd
```
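Before building, it's worth checking that `buildctl` can actually reach the daemon:

```shell
# Lists registered BuildKit workers; fails if the daemon isn't reachable
sudo buildctl debug workers
```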
If we want to perform the same cached build with upstream BuildKit as we did with the Docker integration, we will need to craft a slightly more complicated command:
```shell
sudo buildctl build \
    --output type=image,name=docker.io/martinheinz/docker-cached,push=true \
    --export-cache type=inline \
    --import-cache type=registry,ref=docker.io/martinheinz/docker-cached \
    --frontend=dockerfile.v0 --local context=. --local dockerfile=.
...
 => => pushing layers                                                                         0.6s
 => => pushing manifest for docker.io/martinheinz/docker-cached:latest@sha256:d5e200aa86c...18e234cc92  0.4s
...
 => exporting cache                                                                           0.0s
 => => preparing build cache for export                                                       0.0s
```
As you can see, there are a lot of flags and arguments that we had to specify, which can be annoying, but allows for great customizability. One advantage of this approach is that we don't need to run `docker push`; instead, we include `push=true` in the `--output` argument and `buildctl` takes care of pushing the image.
Another advantage of using BuildKit in this way is the ability to push the image and the cached layers into separate repositories or tags. In this example we will store the image itself in `docker-cached:latest`, while the cache will live in `docker-cached:buildcache`:
```shell
sudo buildctl build \
    --output type=image,name=docker.io/martinheinz/docker-cached,push=true \
    --export-cache type=registry,ref=docker.io/martinheinz/docker-cached:buildcache \
    --import-cache type=registry,ref=docker.io/martinheinz/docker-cached:buildcache \
    --frontend=dockerfile.v0 --local context=. --local dockerfile=.

# During first build  - `=> ERROR importing cache manifest from docker.io/martinheinz/docker-cached:buildcache`
# During second build - `=> importing cache manifest from docker.io/martinheinz/docker-cached:buildcache`
```
For completeness, I will also mention that it's possible to leverage the above-mentioned advanced features of BuildKit without installing it separately. For that you will need `buildx`, which is a Docker CLI plugin for extended build capabilities. `buildx`, however, has different arguments than `buildctl`, so you will need to adjust your build commands based on the docs here.
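For illustration, a rough `buildx` equivalent of the command above might look like this (a sketch; depending on your Docker version you may first need to create a `docker-container` builder with `docker buildx create --use` for the registry cache exporter to work):

```shell
docker buildx build --push -t martinheinz/docker-cached \
    --cache-to type=registry,ref=docker.io/martinheinz/docker-cached:buildcache \
    --cache-from type=registry,ref=docker.io/martinheinz/docker-cached:buildcache .
```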
With that said, we're doing all these shenanigans to improve CI/CD build performance, so running these commands locally is nice for testing, but eventually we need to perform this in the environment of some CI/CD platform, and my environment of choice is Kubernetes.
To make this work in Kubernetes, we will need to bring in a couple of additional things - namely credentials for pushing the image and a volume used as a workspace:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: buildkit
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: prepare
          image: alpine:3.10
          command:
            - sh
            - -c
            - 'echo -e "FROM ubuntu\nENTRYPOINT [\"/bin/bash\", \"-c\", \"echo hello\"]" > /workspace/Dockerfile'
          volumeMounts:
            - name: workspace
              mountPath: /workspace
      containers:
        - name: buildkit
          image: moby/buildkit:master
          command:
            - buildctl-daemonless.sh
          args: ["build",
                 "--frontend", "dockerfile.v0",
                 "--local", "context=/workspace",
                 "--local", "dockerfile=/workspace",
                 "--output", "type=image,name=docker.io/martinheinz/docker-cached,push=true",
                 "--import-cache", "type=registry,ref=docker.io/martinheinz/docker-cached",
                 "--export-cache", "type=inline"]
          securityContext:
            privileged: true
          env:
            - name: DOCKER_CONFIG
              value: /docker/.docker
          volumeMounts:
            - name: docker-config
              mountPath: /docker/.docker
            - name: workspace
              readOnly: true
              mountPath: /workspace
      volumes:
        - name: docker-config
          secret:
            secretName: buildkit-docker-config
            items:
              - key: config.json
                path: config.json
        - name: workspace
          persistentVolumeClaim:
            claimName: buildkit-workspace
```
The above is a single Job which first uses an init container to create a `Dockerfile` inside the workspace provided by a PersistentVolumeClaim. The actual job then performs the build as shown earlier. It also mounts repository credentials from a Secret named `buildkit-docker-config`, which is needed so that BuildKit can push both the cached layers and the image itself to the repository.
For clarity, I omitted the manifests of the PersistentVolumeClaim and Secret used above, but if you want to test it out yourself, you can find those here.
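If you'd rather not click through, the two objects could be created along these lines (the storage size and access mode here are assumptions):

```shell
# Registry credentials for BuildKit to push with
kubectl create secret generic buildkit-docker-config \
    --from-file=config.json=$HOME/.docker/config.json

# Workspace volume for the Dockerfile
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: buildkit-workspace
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF
```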
Docker is not, however, the only tool for building images that can help us leverage cache during CI/CD builds. One of the alternatives to Docker is Google's Kaniko. Its advantage is that it's meant to be run as a container image, which makes it suitable for environments like Kubernetes.
Considering that this tool is meant for CI/CD pipelines, we need to simulate the same conditions locally to be able to test it. To do so, we will need a couple of directories and files that will be used as volumes:
```shell
mkdir volume && cd volume
echo 'FROM ubuntu' >> Dockerfile
echo 'ENTRYPOINT ["/bin/bash", "-c", "echo hello"]' >> Dockerfile
mkdir cache
mkdir config
cp ~/.docker/config.json config/config.json  # or podman login --authfile config/config.json

tree .
|____Dockerfile   -> Sample Dockerfile (will be mounted as workspace)
|____cache        -> Cache directory/volume
|____config       -> Config directory/volume
| |____config.json
```
Above we created three things: a sample `Dockerfile` consisting of a single layer, which we will use for testing; a `cache` directory, which will be mounted into the container and used for storing cached image layers; and a `config` directory containing registry credentials, which will be mounted read-only.
In the previous section we only looked at caching image layers using an image registry/repository. With Kaniko, though, we can also use a local directory/volume as a cache source. To do that, we first need to "warm up" the cache, i.e. populate it with image layers:
```shell
# Warm up (populate) the cache with base image(s)
~ $ docker run --rm \
      -v $(pwd):/workspace \
      gcr.io/kaniko-project/warmer:latest \
      --cache-dir=/workspace/cache \
      --image=ubuntu  # --image=more-images

~ $ ls cache/
sha256:3555f4996aea6be945ae1532fa377c88f4b3b9e6d93531f47af5d78a7d5e3761
sha256:3555f4996aea6be945ae1532fa377c88f4b3b9e6d93531f47af5d78a7d5e3761.json
```
Note: This section is about building and caching images without Docker; during testing outside of Kubernetes, however, we still need to run the Kaniko image somehow, and here we're using Docker for that.
The Kaniko project provides two images - `warmer` and `executor`. Above we used the former, which takes a variable number of images and uses them to populate the specified cache directory.
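If you want to avoid Docker even for this local test, the same invocation should work with `podman` as a drop-in replacement (an assumption worth verifying on your setup):

```shell
podman run --rm \
    -v $(pwd):/workspace \
    gcr.io/kaniko-project/warmer:latest \
    --cache-dir=/workspace/cache \
    --image=ubuntu
```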
With the cache ready, we can move on to building the image. This time we use the `executor` image, passing in two volumes - one for registry credentials (mounted read-only) and one for the workspace, which we pre-populated with the sample `Dockerfile`. Additionally, we specify flags to enable caching, as well as the destination to which the final image will be pushed:
```shell
# Use the cache
~ $ docker run --rm \
      -v $(pwd)/config/config.json:/kaniko/.docker/config.json:ro \
      -v $(pwd):/workspace \
      gcr.io/kaniko-project/executor:latest \
      --dockerfile=/workspace/Dockerfile \
      --cache \
      --cache-dir=/workspace/cache \
      --destination martinheinz/kaniko-cached \
      --context dir:///workspace/
...
INFO Returning cached image manifest
INFO Found sha256:3555f4996aea6be945ae1532fa377c88f4b3b9e6d93531f47af5d78a7d5e3761 in local cache
INFO Found manifest at /workspace/cache/sha256:3555f4996aea6be945ae1532fa377c88f4b3b9e6d93531f47af5d78a7d5e3761.json
```
These examples show us how it works in theory, but in practice we will want to run this on Kubernetes. For that we will need a similar set of objects as in the example with BuildKit - a volume claim for the cache directory, a volume claim for the workspace (Dockerfile), a Secret with registry credentials, and a Job or Pod that will run the Kaniko executor:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kaniko
spec:
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args: ["--dockerfile=/workspace/Dockerfile",
             "--context=dir://workspace",
             "--destination=martinheinz/kaniko-cached",
             "--cache",
             "--cache-dir=/cache"]
      volumeMounts:
        - name: kaniko-docker-config
          mountPath: /kaniko/.docker/
        - name: kaniko-cache
          mountPath: /cache
        - name: kaniko-workspace
          mountPath: /workspace
  volumes:
    - name: kaniko-docker-config
      secret:
        secretName: kaniko-docker-config
        items:
          - key: config.json
            path: config.json
    - name: kaniko-cache
      persistentVolumeClaim:
        claimName: kaniko-cache
    - name: kaniko-workspace
      persistentVolumeClaim:
        claimName: kaniko-workspace
```
Here, assuming that we already have the cache populated using the `warmer` image, we run the Kaniko `executor`, which reads the build context from the `/workspace` directory, cached layers from `/cache` and credentials from `/kaniko/.docker/config.json`. If everything goes well, we should see in the logs that the cached layers were found by Kaniko.
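For example, by checking the Pod logs for the same messages we saw in the local run:

```shell
~ $ kubectl logs kaniko
...
INFO Returning cached image manifest
INFO Found sha256:3555f4996aea6be945ae1532fa377c88f4b3b9e6d93531f47af5d78a7d5e3761 in local cache
```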
Caching layers on a local volume can be useful, but most of the time you'll probably want to use a remote registry. Kaniko can do that too, and all we need to do is change a couple of arguments:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kaniko
spec:
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args: ["--dockerfile=/workspace/Dockerfile",
             "--context=dir://workspace",
             "--destination=martinheinz/kaniko-cached",
             "--cache",
             "--cache-copy-layers",
             "--cache-repo=martinheinz/kaniko-cached"]
      volumeMounts:
        - name: kaniko-docker-config
          mountPath: /kaniko/.docker
        - name: kaniko-workspace
          mountPath: /workspace
  volumes:
    - name: kaniko-docker-config
      secret:
        secretName: kaniko-docker-config
        items:
          - key: config.json
            path: config.json
    - name: kaniko-workspace
      persistentVolumeClaim:
        claimName: kaniko-workspace
```
The important change here is that we replaced the `--cache-dir` flag with `--cache-repo`. Additionally, we were able to omit the volume claim used for the cache directory.
Besides Kaniko, there are quite a few other tools that can build a container image. The most notable one is `podman`, which leverages `buildah` to build images. Using these two for caching, however, is not an option right now. The `--cache-from` option is available in `buildah`, but it is a NOOP, so even if you specify it, nothing will happen. So, if you want to migrate your CI from Docker to Buildah and caching is a requirement, you will need to wait for this issue to be implemented/resolved.
This article described how we can leverage layer caching to improve build performance. If you're experiencing bad performance in image builds, chances are, though, that the problem doesn't lie in missing caching, but rather in the commands in your `Dockerfile`. Therefore, before you jump into implementing layer caching, I'd suggest you first try to optimize the structure of your `Dockerfile`s. Additionally, caching will only work if your `Dockerfile`s are well structured, because after the first cache miss, no further cached layers can be used.
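To illustrate, a common pattern (shown here with a hypothetical Python application) is to install dependencies before copying the frequently changing source code, so a source change doesn't invalidate the expensive layers:

```dockerfile
FROM python:3.9
# Dependency manifest changes rarely - keep it (and the install) in early layers
COPY requirements.txt .
RUN pip install -r requirements.txt
# Source code changes often - copy it last so earlier layers stay cached
COPY . .
CMD ["python", "app.py"]
```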
Besides caching layers, you might also want to cache dependencies; that way you can save the time needed to download libraries from NPM, PyPI, Maven or other artifact repositories. One way to do this is using BuildKit and its `--mount=type=cache` flag, described here.
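As a quick sketch (again with a hypothetical Python project), the cache mount keeps pip's download cache around between builds even when the layer itself is rebuilt:

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.9
COPY requirements.txt .
# /root/.cache/pip is persisted by BuildKit across builds,
# so packages don't need to be re-downloaded every time
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
```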