etcd is the brain of every Kubernetes cluster - the key-value store that keeps track of all the objects in a cluster. It's intertwined and tightly coupled with Kubernetes, and it might seem like an inseparable part of a cluster. Or is it?
In this article we will explore how we can replace etcd with a PostgreSQL database, as well as why and when it might make sense to do so.
Why?
If you're running your own Kubernetes cluster, then you know the pains of managing etcd. Apart from the user perspective, etcd has little usage outside of Kubernetes and has been in a state of decline because no one wants to maintain it. Due to this, critical bugs take a long time to fix.
Besides, why not? Why shouldn't we use other storage backends with Kubernetes? Having more options is a good thing, and there really are no downsides to running Kubernetes with an RDBMS, whether it's PostgreSQL, MySQL or anything else you might be comfortable with.
Also, running Kubernetes with an RDBMS isn't such a novel idea either - k3s, a production-grade Kubernetes distribution, can run with a relational database instead of etcd. If it works for k3s, why wouldn't it work for any other cluster?
How?
As was mentioned in the beginning, Kubernetes and etcd are tightly coupled. Cluster components (namely the API server) expect an etcd-like interface to which they can write and from which they can read. Therefore, to use an SQL database as storage, we need to provide an etcd-to-SQL translation layer, which is called Kine. Kine is the component of k3s that allows it to use various RDBMSs as an etcd replacement. It provides an implementation of the gRPC functions that Kubernetes relies upon, so as far as Kubernetes is concerned, it is talking to an etcd server.
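To make the idea more concrete, here is a minimal sketch of that translation in action, assuming you have the kine and etcdctl binaries available locally and a PostgreSQL instance to point them at (by default Kine should listen on the standard etcd client port, 2379):
# Start Kine, pointing it at the database (replace USER/PASSWORD with your credentials)
kine --endpoint="postgres://USER:PASSWORD@localhost:5432/postgres"
# In another shell, talk to Kine with a regular etcd client, as if it were etcd
etcdctl --endpoints=http://127.0.0.1:2379 get /registry --prefix --keys-only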
First things first though, before we get to running Kine, if we want to run Kubernetes with PostgreSQL, we will obviously need an instance of PostgreSQL:
Note: If you want to follow along or quickly spin up a cluster in a VM backed by PostgreSQL, then you can check out my repository (k8s-without-etcd branch).
apt -y install postgresql postgresql-contrib
systemctl start postgresql.service
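Kine will later connect to the database over TCP as the postgres user, so that user needs a password and PostgreSQL must accept password-authenticated host connections. A minimal sketch (the password somepass is the one we will reuse later; only touch pg_hba.conf if host connections aren't already allowed by your distribution's defaults):
# Set a password for the postgres user (reused later in Kine's connection string)
sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'somepass';"
# If needed, allow password auth for local TCP connections by adding a line like
# this to /etc/postgresql/14/main/pg_hba.conf:
#   host  all  all  127.0.0.1/32  scram-sha-256
systemctl reload postgresql.service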
In this tutorial we will use PostgreSQL running as a systemd service, and we will also set up SSL for the database:
# Generate self signed root CA cert
openssl req -addext "subjectAltName = DNS:localhost" -nodes \
-x509 -newkey rsa:2048 -keyout ca.key -out ca.crt -subj "/CN=localhost"
# Generate server cert to be signed
openssl req -addext "subjectAltName = DNS:localhost" -nodes \
-newkey rsa:2048 -keyout server.key -out server.csr -subj "/CN=localhost"
# Sign the server cert
openssl x509 -extfile <(printf "subjectAltName=DNS:localhost") -req \
-in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt
chmod og-rwx ca.key
chmod og-rwx server.key
cp {server.crt,server.key,ca.crt} /var/lib/postgresql/
chown postgres.postgres /var/lib/postgresql/server.key
sed -i -e "s|ssl_cert_file.*|ssl_cert_file = '/var/lib/postgresql/server.crt'|g" /etc/postgresql/14/main/postgresql.conf
sed -i -e "s|ssl_key_file.*|ssl_key_file = '/var/lib/postgresql/server.key'|g" /etc/postgresql/14/main/postgresql.conf
sed -i -e "s|#ssl_ca_file.*|ssl_ca_file = '/var/lib/postgresql/ca.crt'|g" /etc/postgresql/14/main/postgresql.conf
systemctl restart postgresql.service
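To double-check that PostgreSQL picked up the certificates, we can connect with certificate verification against our CA and inspect the SSL status of our own session via the standard pg_stat_ssl view (a quick sanity check, run from the directory containing ca.crt):
psql "host=localhost port=5432 user=postgres dbname=postgres sslmode=verify-ca sslrootcert=ca.crt" \
    -c "SELECT ssl, version, cipher FROM pg_stat_ssl WHERE pid = pg_backend_pid();"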
We use openssl to generate the root CA certificate, the signing request, and the server certificate and key. We then restrict permissions on the keys so that only their owner can read them. We copy both the certificates and the key to /var/lib/postgresql/ and make the postgres user the owner of the server key. Finally, we modify postgresql.conf to tell PostgreSQL to use our new SSL certs and key.
With the database running, we can move on to creating a cluster. We will do so using kubeadm and the configuration below:
# kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.56.2
  bindPort: 6443
nodeRegistration:
  criSocket: "unix:///var/run/crio/crio.sock"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.25.0
networking:
  podSubnet: 10.244.0.0/16
etcd:
  external:
    endpoints:
    - http://127.0.0.1:2379
apiServer:
  timeoutForControlPlane: 1m0s
You can customize this to your needs. The only important part here is the etcd.external.endpoints array, which tells Kubernetes where etcd (or the etcd-compatible interface) is - in our case, this is where Kine will listen.
To build a cluster using this configuration, run:
kubeadm init --config=/.../kubeadm-config.yaml --upload-certs --ignore-preflight-errors ExternalEtcdVersion 2>&1 || true
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
In addition to passing in the configuration file, we also specify that we want to ignore the etcd-related preflight error during startup, and considering that this playground cluster has only one node, we also un-taint the control-plane node so that we can run workloads on it.
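Under the hood, kubeadm turns the external etcd endpoints into the --etcd-servers flag of the generated API server manifest, which we can quickly confirm (the exact output may differ slightly between kubeadm versions):
grep etcd /etc/kubernetes/manifests/kube-apiserver.yaml
#    - --etcd-servers=http://127.0.0.1:2379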
Now comes the fun part - setting up Kine. There are a couple of ways we could deploy it - as a basic process, as a systemd service, as a Pod in the kube-system namespace, or my preferred option - as an API server sidecar.
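For completeness, the systemd option could look roughly like the unit below - this is just a sketch, assuming the kine binary was downloaded to /usr/local/bin and reusing the same connection string and certificates as in the sidecar setup later on:
# /etc/systemd/system/kine.service
[Unit]
Description=Kine - etcd-to-SQL translation layer
After=postgresql.service

[Service]
ExecStart=/usr/local/bin/kine \
    --endpoint="postgres://postgres:somepass@localhost:5432/postgres" \
    --ca-file=/var/lib/postgresql/ca.crt \
    --cert-file=/var/lib/postgresql/server.crt \
    --key-file=/var/lib/postgresql/server.key
Restart=always

[Install]
WantedBy=multi-user.target
Here, however, we will go with the sidecar option.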
When we deployed the cluster with kubeadm, it already generated the static Pod manifests for us, including one for the API server (kube-apiserver.yaml), so we will need to patch that one to include a container running Kine:
# vim /etc/kubernetes/manifests/kube-apiserver.yaml
# ...
  containers:
  - name: kube-apiserver
    # ... Existing API server container
  # -------------------------------------
  - image: rancher/kine:v0.10.1-amd64
    name: kine
    securityContext: # Don't do this in real deployment...
      runAsUser: 0
      runAsGroup: 0
    command: [ "/bin/sh", "-c", "--" ]
    args: [ 'kine --endpoint="postgres://$(POSTGRES_USERNAME):$(POSTGRES_PASSWORD)@localhost:5432/postgres"
      --ca-file=/var/lib/postgresql/ca.crt
      --cert-file=/var/lib/postgresql/server.crt
      --key-file=/var/lib/postgresql/server.key' ]
    env:
    - name: POSTGRES_USERNAME # This should be a secret
      value: "postgres"
    - name: POSTGRES_PASSWORD # This should be a secret
      value: "somepass"
    volumeMounts:
    - mountPath: /var/lib/postgresql/
      name: kine-ssl
      readOnly: true
  volumes:
  # -------------------------------------
  # ... Existing volumes used by API Server container
  # -------------------------------------
  - hostPath:
      path: /var/lib/postgresql
      type: DirectoryOrCreate
    name: kine-ssl
Above is the container that we need to add to the API server static Pod. It uses the Kine image from Docker Hub at rancher/kine:... and specifies an entrypoint that points Kine to the PostgreSQL database, as well as to the SSL certificates and key we generated earlier.
After applying these changes to the API server manifest, we will see that Kubelet successfully starts both the kube-apiserver and kine containers:
crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
09172381b42f1 4d2edfd10d3e3f... 29 seconds ago Running kube-apiserver 0 3187286722eba kube-apiserver-kubemaster
55ac5108ae677 ccdd8a15f4ca3e... 29 seconds ago Running kine 0 3187286722eba kube-apiserver-kubemaster
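If something goes wrong - for example a typo in the connection string - the Kine container logs are the first place to look (the --name filter assumes the container name kine from the manifest above):
# Inspect the logs of the kine sidecar container
crictl logs $(crictl ps -q --name kine)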
And just to prove that we are up and running without etcd, we can check the kube-system namespace and see that there are no etcd pods:
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
kube-controller-manager-kubemaster 1/1 Running 0 5d19h
kube-scheduler-kubemaster 1/1 Running 0 5d19h
kube-apiserver-kubemaster 1/1 Running 0 5d19h
kube-proxy-mrfs5 1/1 Running 0 5d19h
coredns-565d847f94-wrn82 1/1 Running 0 5d19h
coredns-565d847f94-7zvwr 1/1 Running 0 5d19h
Or as an extra test, we can deploy some resources to the cluster, for example these ones.
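If you don't want to use the linked manifests, something along these lines creates a matching Deployment and Service (the nginx image is just an arbitrary choice for illustration):
kubectl create deployment example-deployment --image=nginx --replicas=3
kubectl expose deployment example-deployment --name=example-service --port=80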
kubectl get pods -n default
NAME READY STATUS RESTARTS AGE
example-deployment-78d75878cc-b56kl 1/1 Running 0 22h
example-deployment-78d75878cc-4ftj5 1/1 Running 0 22h
example-deployment-78d75878cc-bvpx6 1/1 Running 0 22h
kubectl get all -n default
NAME READY STATUS RESTARTS AGE
pod/example-deployment-78d75878cc-b56kl 1/1 Running 0 22h
pod/example-deployment-78d75878cc-4ftj5 1/1 Running 0 22h
pod/example-deployment-78d75878cc-bvpx6 1/1 Running 0 22h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6d17h
service/example-service ClusterIP 10.98.111.70 <none> 80/TCP 22h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/example-deployment 3/3 3 3 22h
NAME DESIRED CURRENT READY AGE
replicaset.apps/example-deployment-78d75878cc 3 3 3 22h
And hooray, we can see everything running, which wouldn't be possible without a working storage backend.
Exploring/Poking Around
With the complete cluster running, we can poke around and explore the PostgreSQL database. First, we log in:
psql -U postgres -p 5432 -h 127.0.0.1 # somepass
And then we can view the schema and tables:
-- List tables:
\dt public.*
List of relations
Schema | Name | Type | Owner
--------+------+-------+----------
public | kine | table | postgres
(1 row)
-- Describe Kine table
\d public.kine;
Table "public.kine"
Column | Type | Collation | Nullable | Default
-----------------+------------------------+-----------+----------+----------------------------------
id | integer | | not null | nextval('kine_id_seq'::regclass)
name | character varying(630) | | |
created | integer | | |
deleted | integer | | |
create_revision | integer | | |
prev_revision | integer | | |
lease | integer | | |
value | bytea | | |
old_value | bytea | | |
Indexes:
"kine_pkey" PRIMARY KEY, btree (id)
"kine_id_deleted_index" btree (id, deleted)
"kine_name_id_index" btree (name, id)
"kine_name_index" btree (name)
"kine_name_prev_revision_uindex" UNIQUE, btree (name, prev_revision)
"kine_prev_revision_index" btree (prev_revision)
As you can see above, there's a single table named kine which holds all the data. Kine uses the database as log-structured storage, so every write from the API server creates a new row that stores the created or updated Kubernetes object.
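We can observe this log-structured layout directly - keys that are updated frequently accumulate multiple rows (until Kine compacts them), and the highest id per key is the current revision. A simple illustrative query:
-- Keys with the most stored revisions; the highest id per key is the current one
SELECT name, count(*) AS revisions, max(id) AS latest_id
FROM public.kine
GROUP BY name
ORDER BY revisions DESC
LIMIT 5;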
Let's take a look at the data:
-- Some 1000+ rows
select count(*) from public.kine;
count
-------
1319
(1 row)
select name, encode(public.kine.value, 'escape') as value_plain
from public.kine where name like '/registry/pods/default/example%' limit 1;
select name, encode(public.kine.value, 'escape') as value_plain
from public.kine where name like '/registry/configmaps/default/example%' limit 1;
The name column uses the same structure as etcd - it specifies the path to the object in the cluster - /registry/RESOURCE_TYPE/NAMESPACE/NAME. The value column holds the actual manifest as a byte array. I omit the actual result of the query here because the decoded data is quite ugly due to whitespace and the presence of "managed fields" data, but if you try it yourself, you will be able to decipher it.
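Since all keys follow that path structure, we can also get a quick overview of which resource types occupy the most rows - again just an illustrative query built on the schema shown above:
-- Split the /registry/RESOURCE_TYPE/... paths and count rows per resource type
SELECT split_part(name, '/', 3) AS resource_type, count(*) AS row_count
FROM public.kine
GROUP BY resource_type
ORDER BY row_count DESC
LIMIT 10;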
Scaling/Performance
We figured out how to run Kubernetes without etcd, but should you? How is the performance of such a cluster and does it scale?
As already mentioned, Kine and therefore also RDBMS backends are used by k3s, so we can check k3s resource profiling docs to compare how SQL databases perform in comparison to etcd.
k3s also has a test suite with customizable database engine/backend, so you could run the tests and compare if you really wanted to.
For scaling and PostgreSQL in particular, you could also follow the advice in this GitHub issue and store different data types in different tables using PostgreSQL's partitioned tables.
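Purely as an illustration of that idea on the PostgreSQL side - not something Kine supports out of the box, and the table and column names below are hypothetical - a list-partitioned layout could look like this:
-- Hypothetical sketch: partition key-value rows by resource type using list partitioning
CREATE TABLE kine_partitioned (
    id            bigserial,
    name          varchar(630),
    resource_type text,
    value         bytea,
    PRIMARY KEY (resource_type, id)
) PARTITION BY LIST (resource_type);

CREATE TABLE kine_pods   PARTITION OF kine_partitioned FOR VALUES IN ('pods');
CREATE TABLE kine_events PARTITION OF kine_partitioned FOR VALUES IN ('events');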
Closing Thoughts
I believe that there are no downsides to using an RDBMS instead of etcd for a Kubernetes cluster, but getting rid of etcd won't solve all your issues. Every tool brings its own set of problems and challenges, and the same applies to PostgreSQL or any other SQL database.
You'd have to weigh the pros and cons of running your cluster with an SQL database for your particular use case - maybe, for example, the familiar SQL interface and your expertise in managing an RDBMS outweigh the overhead, hassle, or any possible issues that might come with replacing etcd.
...or, maybe just use k3s 😉.