Backup-and-Restore of Containers with Kubernetes Checkpointing API

Kubernetes v1.25 introduced the Container Checkpointing API as an alpha feature. It provides a way to back up and restore containers running in Pods, without ever stopping them.

This feature is primarily aimed at forensic analysis, but general backup-and-restore is something any Kubernetes user can take advantage of.

So, let's take a look at this brand-new feature and see how we can enable it in our clusters and leverage it for backup-and-restore or forensic analysis.

Setup

Before we start checkpointing any containers, we need a playground where we can mess with the kubelet and its workloads. For that we will need a v1.25+ Kubernetes cluster and a container runtime that supports container checkpointing.

We will create such a cluster using kubeadm inside VM(s) built with Vagrant. I've created a repository with everything necessary to spin up such a cluster with just vagrant up, so if you want to follow along, do check it out.
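
If you want to follow along with the repository, spinning it up might look like this (a sketch - the branch name is taken from the full-config link later in this article, and Vagrant plus a VM provider are assumed to be installed):


git clone -b container-checkpoint-api \
  https://github.com/MartinHeinz/kubeadm-vagrant-playground.git
cd kubeadm-vagrant-playground
vagrant up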

If you want to build your own cluster, then make sure it satisfies the following:

The cluster must have the ContainerCheckpoint feature gate enabled. For kubeadm, use the following configuration:


# kubeadm-config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  ContainerCheckpoint: true
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.25.0
apiServer:
  extraArgs:
    feature-gates: "ContainerCheckpoint=true"
controllerManager:
  extraArgs:
    feature-gates: "ContainerCheckpoint=true"
scheduler:
  extraArgs:
    feature-gates: "ContainerCheckpoint=true"
networking:
  podSubnet: 10.244.0.0/16

This will pass the --feature-gates flag to each of the cluster components. For a full list of available feature gates, see the docs.
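
Once the cluster is up, you can verify that the gate actually landed in the kubelet's configuration. A quick check (assuming a kubeadm-provisioned node, where the kubelet configuration is rendered to /var/lib/kubelet/config.yaml):


# On the node - kubeadm writes the KubeletConfiguration here:
grep -A1 featureGates /var/lib/kubelet/config.yaml

# Expected output:
# featureGates:
#   ContainerCheckpoint: true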

Additionally, we need a container runtime that supports checkpointing. At the time of writing, only CRI-O supports it, with containerd support probably coming soon-ish.

To configure your cluster with CRI-O, install it using the instructions in the docs, or use the convenience script in the above-mentioned repository (you should run this in the VM, not on your local machine).

We also need to enable CRIU in CRI-O; CRIU is the tool that does the actual checkpointing under the hood.

To enable it, we need to set the --enable-criu-support=true flag. The above convenience script does that for you.

Also, if you plan to restore the checkpoint back into a Pod, you will need --drop-infra-ctr set to false; otherwise you will get a CreateContainerError with a message like:


kubelet  Error: pod level PID namespace requested for the container, ...
... but pod sandbox was not similarly configured, and does not have an infra container
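
For reference, here is a minimal sketch of what these two CRI-O settings might look like as a configuration drop-in (assuming CRI-O reads overrides from /etc/crio/crio.conf.d/; the file name is made up):


# /etc/crio/crio.conf.d/05-checkpoint.conf (hypothetical file name)
[crio.runtime]
# TOML equivalent of the --enable-criu-support=true flag
enable_criu_support = true
# Keep the infra (pause) container so restored Pods get their sandbox back
drop_infra_ctr = false

After changing the configuration, restart CRI-O (systemctl restart crio) for the settings to take effect.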

With CRI-O installed, we also have to tell kubeadm to use its socket; the following configuration will take care of that:


# kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.56.2
  bindPort: 6443
nodeRegistration:
  criSocket: "unix:///var/run/crio/crio.sock"
---
# ... the previous snippet goes here
# Full config at:
# https://github.com/MartinHeinz/kubeadm-vagrant-playground/blob/container-checkpoint-api/kubernetes/kubeadm-config.yaml

With that in place, we can spin up the cluster with:


kubeadm init --config=.../kubeadm-config.yaml --upload-certs | tee kubeadm-init.out
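
Once kubeadm init finishes, kubectl on the node still needs admin credentials; the standard post-init steps (printed by kubeadm itself) are:


mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config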

This should give us a single-node cluster like the following (note the container runtime version):


kubectl get nodes -o wide
NAME         STATUS   ROLES           AGE   VERSION  ...  OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
kubemaster   Ready    control-plane   82s   v1.25.4  ...  Ubuntu 20.04.5 LTS   5.4.0-125-generic   cri-o://1.25.0

Note: Generally, the best and simplest way to play with Kubernetes is using KinD. As of the time of writing, however, KinD doesn't support container checkpointing. Alternatively, you can try the local-up-cluster.sh script in the Kubernetes repository.

Checkpointing

With that out of the way, we can try creating a checkpoint. Usual Kubernetes operations can be done with kubectl or by running curl commands against the cluster API server. That, however, won't work here, as the checkpointing API is only exposed by the kubelet on each cluster node.

Therefore, we have to jump onto the node and talk to kubelet directly:


vagrant ssh kubemaster
sudo su -

# Check if it's running...
systemctl status kubelet

kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sat 2022-11-12 10:25:29 UTC; 30s ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 29501 (kubelet)
    Tasks: 14 (limit: 2339)
   Memory: 34.7M
   CGroup: /system.slice/kubelet.service
           └─29501 /usr/bin/kubelet --bootstrap-kubeconfig=... --kubeconfig=...

To create a checkpoint, we also need a running Pod. Rather than using the system Pods in kube-system, let's create a dummy Nginx webserver in the default namespace:


kubectl taint nodes --all node-role.kubernetes.io/control-plane-
kubectl run webserver --image=nginx -n default
kubectl get pods -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP          NODE
webserver   1/1     Running   0          27s   10.85.0.4   kubemaster

Above you can see that we also removed the control-plane taint from the node - this allows us to schedule workloads on it even though it's part of the control plane.

Next, let's make a sample API request to kubelet to see if we can get any valid response:


curl -skv -X GET  "https://localhost:10250/pods" \
  --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
  --cacert /etc/kubernetes/pki/ca.crt \
  --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt

{
  "kind": "PodList",
  "apiVersion": "v1",
  "metadata": {},
  "items": [
    {
      "metadata": {
        "name": "webserver",
        "namespace": "default",
        ...
        }
    }
    ...
}

The kubelet listens on port 10250 by default, so we curl it and ask for all its Pods. We also had to specify the CA certificate, client certificate, and key for authentication.
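
As an aside, the same read-only endpoint is also reachable through the API server's node proxy, which spares you the certificate juggling for GET requests (we still have to talk to the kubelet directly for checkpointing below):


# Authenticates with your kubeconfig instead of raw client certificates:
kubectl get --raw "/api/v1/nodes/kubemaster/proxy/pods"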

Now it's time to finally create a checkpoint:


curl -sk -X POST  "https://localhost:10250/checkpoint/default/webserver/webserver" \
  --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
  --cacert /etc/kubernetes/pki/ca.crt \
  --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt

# Response:
# {"items":["/var/lib/kubelet/checkpoints/checkpoint-webserver_default-webserver-2022-11-12T10:28:13Z.tar"]}

# Check the directory:
ls -l /var/lib/kubelet/checkpoints/

total 3840
-rw------- 1 root root 3931136 Nov 12 10:28 checkpoint-webserver_default-webserver-2022-11-12T10:28:13Z.tar

# Verify that original container is still running:
crictl ps --name webserver
CONTAINER      IMAGE                               CREATED         STATE    NAME       ATTEMPT  POD ID         POD
880ee7ddff7f3  docker.io/library/nginx@sha256:...  48 seconds ago  Running  webserver  0        d584446dd8d5e  webserver

The checkpointing API is available at .../checkpoint/${NAMESPACE}/${POD}/${CONTAINER}; here we used the webserver Pod created earlier. The request created an archive at /var/lib/kubelet/checkpoints/checkpoint-<pod>_<namespace>-<container>-<timestamp>.tar.
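
Since the URL is fully parameterized, wrapping the call in a small helper makes repeated checkpointing less error-prone. A sketch (the checkpoint function name is made up for illustration):


# Hypothetical helper - usage: checkpoint <namespace> <pod> <container>
checkpoint() {
  curl -sk -X POST "https://localhost:10250/checkpoint/$1/$2/$3" \
    --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
    --cacert /etc/kubernetes/pki/ca.crt \
    --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt
}

checkpoint default webserver webserver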

Depending on the setup you're using, after running the above curl you might receive an error along the lines of:


checkpointing of default/webserver/webserver failed (CheckpointContainer is only supported in the CRI v1 runtime API)
# or
checkpointing of default/webserver/webserver failed (rpc error: code = Unknown desc = checkpoint/restore support not available)

That means your container runtime doesn't (yet) support checkpointing, or that it isn't enabled correctly.
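
In that case, it's worth checking both halves of the stack - that the runtime exposes the v1 CRI API and that CRIU itself works on the node (a quick check, assuming crictl and criu are installed):


# The runtime must report a v1 RuntimeApiVersion:
crictl version

# CRIU ships a self-test that validates kernel support;
# on a working setup it ends with "Looks good."
criu check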

Analyzing

We now have a checkpointed container archive, so let's take a look at what's inside:


cd /var/lib/kubelet/checkpoints/
# Rename because "tar" doesn't like ":" in names
mv "checkpoint-webserver_default-webserver-2022-11-12T10:28:13Z.tar" webserver.tar
# View contents:
tar --exclude="*/*" -tf webserver.tar

dump.log
checkpoint/
config.dump
spec.dump
rootfs-diff.tar
io.kubernetes.cri-o.LogPath

# Extract:
tar -xf webserver.tar
ls checkpoint/
cgroup.img        fdinfo-4.img  ids-31.img        mountpoints-13.img       pages-2.img               tmpfs-dev-139.tar.gz.img
core-1.img        files.img     inventory.img     netns-10.img             pages-3.img               tmpfs-dev-140.tar.gz.img
core-30.img       fs-1.img      ipcns-var-11.img  pagemap-1.img            pages-4.img               tmpfs-dev-141.tar.gz.img
core-31.img       fs-30.img     memfd.img         pagemap-30.img           pstree.img                tmpfs-dev-142.tar.gz.img
descriptors.json  fs-31.img     mm-1.img          pagemap-31.img           seccomp.img               utsns-12.img
fdinfo-2.img      ids-1.img     mm-30.img         pagemap-shmem-94060.img  timens-0.img
fdinfo-3.img      ids-30.img    mm-31.img         pages-1.img              tmpfs-dev-136.tar.gz.img


cat config.dump
{
  "id": "880ee7ddff7f3ce11ee891bd89f8a7356c97b23eb44e0f4fbb51cb7b94ead540",
  "name": "k8s_webserver_webserver_default_91ad1757-424e-4195-9f73-349b332cbb7a_0",
  "rootfsImageName": "docker.io/library/nginx:latest",
  "runtime": "runc",
  "createdTime": "2022-11-12T10:27:56.460946241Z"
}

tar -tf rootfs-diff.tar
var/cache/nginx/proxy_temp/
var/cache/nginx/scgi_temp/
var/cache/nginx/uwsgi_temp/
var/cache/nginx/client_temp/
var/cache/nginx/fastcgi_temp/
etc/mtab
run/nginx.pid
run/secrets/kubernetes.io/
run/secrets/kubernetes.io/serviceaccount/

If you don't need a running Pod/container for analysis, then extracting and reading through some of the files shown above might give you the necessary information.

With that said, I'm no security expert, so I'm not going to feed you questionable information here about how to analyze these files.

As a starting point though, you might want to check out tools like docker-explorer or this talk on container forensics in Kubernetes.
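
For instance, CRIU's image tool crit can decode the binary .img files from the checkpoint/ directory into readable JSON. A sketch, assuming crit (distributed alongside CRIU) is installed:


# Decode the process tree captured in the checkpoint:
crit show checkpoint/pstree.img

# Explore the checkpoint in a ps-like view:
crit x checkpoint/ ps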

Restoring

While the Checkpointing API is currently aimed more at forensic analysis, it can still be used to restore a Pod/container from the archive.

The simplest way is to create an image from the checkpoint archive:


FROM scratch
# Need to use ADD because it extracts archives
ADD webserver.tar .

Here we use an empty (scratch) image to which we add the archive. We need to use ADD because it automatically extracts archives. Next, we build it with docker or buildah:


cd /var/lib/kubelet/checkpoints/
# Or docker build ...
buildah bud \
  --annotation=io.kubernetes.cri-o.annotations.checkpoint.name=webserver \
  -t restore-webserver:latest \
  -f Dockerfile .

buildah push localhost/restore-webserver:latest docker.io/martinheinz/restore-webserver:latest

Above we also specify an annotation describing the original, human-readable name of the container, and then we push the image to a registry so that Kubernetes can pull it.

Finally, we create a Pod, specifying the previously pushed image:


# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: restore-webserver
  labels:
    app: nginx
spec:
  containers:
  - name: webserver
    image: docker.io/martinheinz/restore-webserver:latest
  nodeName: kubemaster
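
Applying it and waiting for the container to come up:


kubectl apply -f pod.yaml
kubectl get pods restore-webserver
# Wait until STATUS shows "Running"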

To test if it worked, we can expose the Pod via a Service and curl its IP:


kubectl expose pod restore-webserver --port=80 --target-port=80
kubectl get svc

NAME                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes          ClusterIP   10.96.0.1      <none>        443/TCP   14m
restore-webserver   ClusterIP   10.104.30.90   <none>        80/TCP    17s

curl http://10.104.30.90

<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...
</html>

And it worked! We successfully backed up a running Pod without stopping it and recreated it in its original state.

Closing Thoughts

General checkpoint-and-restore for containers has been possible for a while thanks to CRIU, but this is still a big step for Kubernetes, and hopefully, we will see this feature/API graduate to Beta and eventually GA at some point.

The previous sections demonstrated the usage of the checkpointing API - it's very much usable, but it still lacks some basic features, such as native restore functionality or support from all major container runtimes. So, be aware of its limitations if you decide to enable it in production (or even development) environments/clusters.

With that said, this feature is a very cool addition, and not just for forensic analysis - in the future, when a better/native restore process is available, this could become a proper backup-and-restore mechanism for container workloads, which might be very useful for certain types of long-running Kubernetes workloads.
