Most people who use Kubernetes know that you can scale applications using the Horizontal Pod Autoscaler (HPA) based on their CPU or memory usage. There are, however, many more features of HPA that you can use to customize the scaling behaviour of your application, such as scaling on custom application metrics or external metrics, as well as alpha/beta features like "scaling to zero" or container metrics scaling.
So, in this article we will explore all of these options so that we can take full advantage of every available HPA feature and get a head start on the features coming in future Kubernetes releases.
Setup
Before we get started with scaling, we first need a testing environment. For that we will use a KinD (Kubernetes in Docker) cluster defined by the following YAML:
# cluster.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  HPAScaleToZero: true
  HPAContainerMetrics: true
  LogarithmicScaleDown: true
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker
This manifest configures the KinD cluster with 1 control plane node and 3 workers. Additionally, it enables a couple of feature gates related to autoscaling, which will later allow us to use some alpha/beta features of HPA. To create a cluster with the above configuration, you can run:
kind create cluster --config ./cluster.yaml --name autoscaling --image=kindest/node:v1.23.6
Apart from the cluster, we will also need an application to scale. For that we will use the resource consumer tool and its image, which are used in Kubernetes end-to-end testing. To deploy it, you can run:
kubectl create deployment resource-consumer --image=gcr.io/k8s-staging-e2e-test-images/resource-consumer:1.11
kubectl set resources deployment resource-consumer --requests=cpu=500m,memory=256Mi
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    app: resource-consumer
  name: resource-consumer
  namespace: default
spec:
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: resource-consumer
EOF
This application is very handy in this situation, as it allows us to simulate CPU and memory consumption of a Pod. It can also expose custom metrics which are needed for scaling based on custom/external metrics. To test this out we can run:
# Consume CPU (300m for 10min):
kubectl run curl --image=curlimages/curl:7.83.1 \
--rm -it --restart=Never -- \
curl --data "millicores=300&durationSec=600" http://resource-consumer:8080/ConsumeCPU
# Expose metric "custom_metric" with value 100 for 10min at endpoint /metrics
kubectl run curl --image=curlimages/curl:7.83.1 --rm \
-it --restart=Never -- \
curl --data "metric=custom_metric&delta=100&durationSec=600" http://resource-consumer:8080/BumpMetric
Next, we also need to deploy the services that collect the metrics on which we will later scale our test application. The first of these is the Kubernetes metrics-server, which is usually available in a cluster by default, but that's not the case in KinD, so to deploy it we need to run:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.5.0/components.yaml
kubectl patch -n kube-system deployment metrics-server --type=json \
-p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
metrics-server allows us to monitor basic metrics such as CPU and memory usage, but we also want to implement scaling based on custom metrics, such as the ones exposed by an application on its /metrics endpoint, or even external ones like the depth of a queue running outside of the cluster. For these we will need:
- Prometheus Operator to gather the custom/external metrics.
- ServiceMonitor object(s) to tell Prometheus how to scrape our application's metrics.
- Prometheus adapter to get custom/external metrics from Prometheus instance into Kubernetes API.
You can refer to the end-to-end walkthrough for more details of the setup.
The above requires a lot of setup, so for the purpose of this article and for your convenience, I've made a script and a set of manifests that you can use to spin up a KinD cluster along with all the required components. All you need to do is run the setup.sh script from this repository.
After running the script, we can verify that everything is ready using the following commands:
# To verify availability of metrics run:
kubectl top nodes
# NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
# autoscaling-control-plane 113m 0% 1024Mi 1%
# autoscaling-worker 49m 0% 385Mi 0%
# autoscaling-worker2 42m 0% 381Mi 0%
# autoscaling-worker3 37m 0% 276Mi 0%
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq . # also works with "pods"
# ...
# {
# "metadata": {
# "name": "autoscaling-worker3",
# "labels": { ... }
# },
# "window": "20s",
# "usage": {
# "cpu": "43077193n",
# "memory": "283212Ki"
# }
# To query/verify custom metrics:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq . # also works for "external" instead of "custom"
# ...
# "name": "pods/custom_metric",
# "singularName": "",
# "namespaced": true,
# "kind": "MetricValueList",
# "verbs": [ "get" ]
# ...
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/custom_metric" | jq .
# ...
# {
# "kind": "MetricValueList",
# "apiVersion": "custom.metrics.k8s.io/v1beta1",
# "metadata": {"selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/custom_metric"},
# "items": [{
# "describedObject": {
# "kind": "Pod",
# "namespace": "default",
# "name": "resource-consumer-6bf5898d6f-gzzgm",
# "apiVersion": "/v1"
# },
# "metricName": "custom_metric", "value": "100",
# }]}
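The API responses above report values as Kubernetes quantities, such as 43077193n for CPU or 283212Ki for memory, and HPA output can also use milli-units (for example 212226867200m). As a minimal sketch, assuming only the most common suffixes matter, the following Python helper converts such strings into plain numbers. The name parse_quantity is hypothetical, and the helper deliberately ignores less common forms such as exponent notation.

```python
from decimal import Decimal

# Multipliers for the common quantity suffixes (simplified subset, an assumption).
SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": Decimal("1e3"), "M": Decimal("1e6"),
    "G": Decimal("1e9"), "T": Decimal("1e12"),
}


def parse_quantity(text):
    """Convert a quantity string such as '283212Ki' into a Decimal base value."""
    # Check two-character suffixes first so that 'Mi' is not mistaken for 'M'.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * SUFFIXES[suffix]
    return Decimal(text)  # no suffix: already a plain number


if __name__ == "__main__":
    for raw in ("43077193n", "283212Ki", "212226867200m", "100"):
        print(raw, "->", parse_quantity(raw))
```

CPU values come out in cores and memory values in bytes, which makes them easier to compare against the targets used in the HPA manifests later on.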
More helpful commands can be found in the output of the above-mentioned script or in the repository README.
Basic Autoscaling
Now that we have our infrastructure up and running, we can start scaling the test application. The simplest way to do so is to create an HPA using a command like kubectl autoscale deploy resource-consumer --min=1 --max=5 --cpu-percent=75. This, however, creates an HPA with an apiVersion of autoscaling/v1, which lacks most of the features.
So, instead, we will create the HPA with YAML, specifying autoscaling/v2 as the apiVersion:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: resource-consumer-v2
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-consumer
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 200Mi
The above HPA will use basic metrics gathered from the application Pod(s) by metrics-server. To test out the scaling we can simulate heavy memory usage:
kubectl run curl --image=curlimages/curl:7.83.1 \
--rm -it --restart=Never -- \
curl --data "megabytes=500&durationSec=600" http://resource-consumer:8080/ConsumeMem
kubectl get hpa -w
# NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
# resource-consumer-v2 Deployment/resource-consumer 4689920/200Mi, 0%/75% 1 5 1 81s
# resource-consumer-v2 Deployment/resource-consumer 530415616/200Mi, 0%/75% 1 5 1 2m23s
# resource-consumer-v2 Deployment/resource-consumer 265820160/200Mi, 0%/75% 1 5 3 2m31s
# resource-consumer-v2 Deployment/resource-consumer 212226867200m/200Mi, 0%/75% 1 5 5 5m50s
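To make sense of the replica counts in this output, it helps to reproduce the arithmetic by hand. The Kubernetes documentation describes the core rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default), with the largest proposal winning when several metrics are configured. The Python below is only a simplified sketch of that rule: the function names are hypothetical, and it ignores Pod readiness, missing metrics and stabilization windows.

```python
import math


def proposed_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Replica count proposed by a single metric (simplified model)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target, no change
    return math.ceil(current_replicas * ratio)


def clamp(replicas, min_replicas, max_replicas):
    """Keep the final result within minReplicas..maxReplicas."""
    return max(min_replicas, min(max_replicas, replicas))


TARGET_MEMORY = 200 * 1024 * 1024  # 200Mi expressed in bytes

if __name__ == "__main__":
    # Second row of the output above: 1 replica, 530415616 bytes, 0% CPU.
    memory = proposed_replicas(1, 530415616, TARGET_MEMORY)
    cpu = proposed_replicas(1, 0, 75)
    print(clamp(max(memory, cpu), 1, 5))

    # Last row: 5 replicas averaging 212226867200m (212226867.2 bytes).
    print(clamp(proposed_replicas(5, 212226867.2, TARGET_MEMORY), 1, 5))
```

With a single replica using roughly 2.5 times the target, the memory metric proposes 3 replicas, which matches the jump from 1 to 3 seen above. In the last row the average is within 10% of the target, so this model would leave the replica count at 5.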
Custom Metrics
Scaling based on CPU and memory usage is often enough, but we're after the advanced scaling options. The first of them is scaling using custom metrics exposed by an application:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: resource-consumer-v2-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-consumer
  minReplicas: 1
  maxReplicas: 5
  metrics:
  # kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/custom_metric" | jq .
  - type: Pods
    pods:
      metric:
        name: custom_metric
      target:
        type: AverageValue
        averageValue: 100
This HPA is configured to scale the application based on the value of custom_metric that was scraped by Prometheus from the application's /metrics endpoint. It will scale the application up if the average value of the specified metric across all Pods (.target.type: AverageValue) goes over 100.
The above uses a Pod metric to scale, but it's possible to specify any other object that has a metric attached to it:
# ...
- type: Object
  object:
    metric:
      name: custom_metric
    describedObject:
      apiVersion: v1
      kind: Service
      name: resource-consumer
    target:
      type: Value
      value: 100
This snippet achieves the same as the previous one, this time, however, using a Service instead of Pods as the source of the metric. It also shows that you can compare the metric directly against the scaling threshold by setting .target.type to Value instead of AverageValue.
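The difference between Value and AverageValue for an Object metric is easier to see with numbers. According to the Kubernetes documentation, with Value the metric is compared to the target directly, while with AverageValue it is first divided by the number of Pods. The following Python is a rough sketch of that difference under simplifying assumptions (all Pods are ready, default tolerance of 0.1); the function names are hypothetical and this is not the controller's actual code.

```python
import math


def replicas_for_value(metric_value, target_value, current_replicas, tolerance=0.1):
    """Object metric with target.type Value: compare the metric to the target directly."""
    ratio = metric_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(ratio * current_replicas)


def replicas_for_average_value(metric_value, target_average, current_replicas, tolerance=0.1):
    """Object metric with target.type AverageValue: divide the metric by the Pod count first."""
    ratio = metric_value / (target_average * current_replicas)
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(metric_value / target_average)


if __name__ == "__main__":
    # The Service reports custom_metric = 150 against a target of 100.
    for replicas in (1, 2):
        print(replicas,
              replicas_for_value(150, 100, replicas),
              replicas_for_average_value(150, 100, replicas))
```

In this model, a Value target keeps asking for more replicas for as long as the Service-level metric stays above 100, whereas an AverageValue target settles once the metric divided by the Pod count drops to the target or below.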
To figure out which objects expose metrics that you can use in scaling, you can traverse the API using kubectl get --raw. For example, to look up the custom_metric for either Pod or Service you can use:
# Pod
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/custom_metric" | jq .
# Service
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/resource-consumer/custom_metric" | jq .
# Everything
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/" | jq .
Also, to help you troubleshoot, the HPA object provides a status stanza that shows whether the applied metric was recognized:
kubectl get hpa resource-consumer-v2-custom -o json | jq .status.conditions
[
...
{
"lastTransitionTime": "2022-05-17T12:36:03Z",
"message": "the HPA was able to successfully calculate a replica count from pods metric custom_metric",
"reason": "ValidMetricFound",
"status": "True",
"type": "ScalingActive"
},
...
]
Finally, to test out the behavior of the above HPA, we can bump the metric exposed by the application and see how the application scales up:
# Raise custom_metric to 150
kubectl run curl --image=curlimages/curl:7.83.1 \
--rm -it --restart=Never -- curl \
--data "metric=custom_metric&delta=150&durationSec=600" http://resource-consumer:8080/BumpMetric
kubectl get hpa -w
# NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
# resource-consumer-v2-custom Deployment/resource-consumer 0/100 1 5 1 10s
# resource-consumer-v2-custom Deployment/resource-consumer 150/100 1 5 1 24s
# resource-consumer-v2-custom Deployment/resource-consumer 150/100 1 5 2 40s
# resource-consumer-v2-custom Deployment/resource-consumer 150/100 1 5 5 75s
External Metrics
To show the full potential of HPA, we will also try scaling an application based on an external metric. This would require us to scrape metrics from an external system running outside of the cluster, such as Kafka or PostgreSQL. We don't have that available, so instead we've configured Prometheus Adapter to treat certain metrics as external. The configuration that does this can be found [here](https://github.com/MartinHeinz/metrics-on-kind/blob/master/custom-metrics-config-map.yaml). All you need to know though is that with this test cluster, any application metric prefixed with external will go to the external metrics API. To test this out, we bump up such a metric and check whether the API gets populated:
# Set external_queue_messages_ready to 150 for 10min
kubectl run curl --image=curlimages/curl:7.83.1 \
--rm -it --restart=Never -- \
curl --data "metric=external_queue_messages_ready&delta=150&durationSec=600" \
http://resource-consumer:8080/BumpMetric
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/default/external_queue_messages_ready | jq .
{
"kind": "ExternalMetricValueList",
"apiVersion": "external.metrics.k8s.io/v1beta1",
"items": [
{
"metricName": "external_queue_messages_ready",
"value": "150"
}
]
}
To then scale our deployment based on this metric we can use the following HPA:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: resource-consumer-v2-external
spec:
  # ...
  metrics:
  - type: External
    external:
      metric:
        name: external_queue_messages_ready
      target:
        type: Value
        value: 100
HPAScaleToZero
Now that we've gone through all the well-known features of HPA, let's also take a look at the alpha/beta ones that we enabled using feature gates. The first one is HPAScaleToZero.
As the name suggests, this allows you to set minReplicas in the HPA to zero, effectively turning the service off if there's no traffic. This can be useful for "bursty" workloads, for example when your application receives data from an external queue. In this use case the application can be safely scaled to zero when there are no messages waiting to be processed.
With the feature gate enabled we can simply run:
kubectl patch hpa resource-consumer-v2-external -p '{"spec":{"minReplicas": 0}}'
Which sets the minimum replicas of previously shown HPA to zero.
Be aware though, that this will only work for metrics of type External or Object.
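One way to see why only External and Object metrics qualify is that with zero replicas there are no Pods left to report CPU, memory or Pod-level custom metrics, so the HPA needs a signal that exists independently of the Pods. The Python below is a simplified model of how a Value target on an external metric could drive scaling to and from zero. It reflects my understanding of the controller's behaviour rather than its actual code, the function name is hypothetical, and it assumes all Pods are ready.

```python
import math


def external_value_replicas(current_replicas, metric_value, target_value,
                            min_replicas, max_replicas, tolerance=0.1):
    """Replica count for an External metric with target.type Value (simplified model)."""
    ratio = metric_value / target_value
    if current_replicas == 0:
        # No Pods to multiply by, so the ratio alone decides how many to start.
        desired = math.ceil(ratio)
    elif abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(ratio * current_replicas)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Queue drained: external_queue_messages_ready is 0, target is 100.
    print(external_value_replicas(2, 0, 100, min_replicas=0, max_replicas=5))
    # Messages arrive while the deployment is scaled to zero.
    print(external_value_replicas(0, 150, 100, min_replicas=0, max_replicas=5))
```

With minReplicas left at 1, the same empty queue would only bring the deployment down to a single replica, which is the behaviour without the feature gate.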
HPAContainerMetrics
Another feature gate that we can make use of is HPAContainerMetrics, which allows us to use metrics of type: ContainerResource:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: resource-consumer-v2-container
spec:
  # ...
  metrics:
  - type: ContainerResource
    containerResource:
      name: cpu
      container: resource-consumer
      target:
        type: Utilization
        averageUtilization: 75
This makes it possible to scale based on the resource utilization of individual containers rather than the whole Pod. This can be useful if you have a multi-container Pod with an application container and a sidecar, and you want to ignore the sidecar and scale the deployment based only on the application container.
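To illustrate how a sidecar can hide load on the application container, here is a small Python sketch that compares Pod-level and container-level CPU utilization. The sidecar container, its name and all the usage and request numbers are made up for illustration; only the idea that utilization is usage as a percentage of the request comes from how HPA defines the Utilization target.

```python
def utilization(usage_millicores, request_millicores):
    """CPU utilization as a percentage of the requested CPU."""
    return 100 * usage_millicores / request_millicores


# Hypothetical Pod: a busy application container next to a mostly idle sidecar.
containers = {
    "resource-consumer": {"usage": 450, "request": 500},
    "sidecar": {"usage": 50, "request": 500},  # assumed, not part of the test deployment
}

# type: Resource looks at the Pod as a whole (sum of all containers).
pod_utilization = utilization(
    sum(c["usage"] for c in containers.values()),
    sum(c["request"] for c in containers.values()),
)

# type: ContainerResource looks only at the named container.
app_utilization = utilization(
    containers["resource-consumer"]["usage"],
    containers["resource-consumer"]["request"],
)

if __name__ == "__main__":
    print(f"Pod: {pod_utilization}%, container: {app_utilization}%")
```

Against a target of 75%, the Pod-level figure in this example would not trigger a scale-up even though the application container is well above the threshold, which is the situation ContainerResource is meant to address.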
You can also view the breakdown of Pod/container metrics by running the following command:
POD=$(kubectl get pod -l app=resource-consumer -o jsonpath="{.items[0].metadata.name}")
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/$POD" | jq .
{
"kind": "PodMetrics",
"apiVersion": "metrics.k8s.io/v1beta1",
"metadata": {
"name": "resource-consumer-6bf5898d6f-gzzgm",
"namespace": "default",
},
"window": "16s",
"containers": [{
"name": "resource-consumer",
"usage": {
"cpu": "0",
"memory": "11028Ki"
}}]}
LogarithmicScaleDown
Last but not least is the LogarithmicScaleDown feature gate.
Without this feature, the Pod that's been running for the least amount of time gets deleted first during downscaling. That's not always ideal though, as it can create an imbalance in replica distribution, because newer Pods tend to serve less traffic than the older ones.
With this feature gate enabled, a semi-random selection of Pods is used instead when choosing the Pod to be deleted.
For a full rationale and algorithm details see KEP-2185.
Closing Thoughts
In this article, I tried to cover most of the things you can do with Kubernetes HPA to scale your application. There are, however, many more tools and options for scaling applications running in Kubernetes, such as the Vertical Pod Autoscaler, which can help keep Pod resource requests and limits up to date.
Another option would be the predictive HPA by Digital Ocean, which tries to predict how many replicas a resource should have.
Finally, autoscaling doesn't end with Pods - the next step after setting up Pod autoscaling is to also set up cluster autoscaling to avoid running out of available resources in your whole cluster.