Setting Up Horizontal Pod Autoscaler with Custom Metrics
Scale your Kubernetes workloads on business metrics — RPS, queue depth, or latency — instead of just CPU. This tutorial wires up KEDA to drive HPA from real application signals.
Before you begin
- kubectl configured against a running cluster
- Helm 3 installed
- Basic understanding of Kubernetes Deployments and Services
CPU-based autoscaling is a blunt instrument. Your application might be saturating a database connection pool at 20% CPU, or burning 80% CPU while the real backlog builds up in a downstream queue. Custom metrics let HPA respond to signals that actually matter for your workload.
This tutorial uses KEDA — the Kubernetes Event-Driven Autoscaler — which is the cleanest way to get custom metrics into HPA without wrestling with the Prometheus Adapter's configuration format.
What You'll Build
An HPA that scales a worker deployment based on the number of pending jobs in a Redis list. When the queue is empty, the deployment scales to zero. When jobs arrive, it scales up proportionally.
The same pattern applies to: HTTP request rate, Kafka consumer lag, RabbitMQ queue depth, or any Prometheus metric.
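The replica math is the standard HPA average-value calculation; roughly:

# HPA replica math for an average-value external metric (what KEDA emits):
#   desiredReplicas = ceil(totalMetricValue / targetValuePerReplica)
# e.g. 50 pending jobs with a target of 5 jobs per replica -> 10 replicas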
Step 1: Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  --version 2.13.0

Verify the KEDA operator is running:
kubectl get pods -n keda
# NAME                                    READY   STATUS    RESTARTS
# keda-operator-xxxx                      1/1     Running   0
# keda-operator-metrics-apiserver-xxxx    1/1     Running   0

KEDA installs two components: the operator (watches ScaledObjects) and the metrics API server (exposes the metrics to the HPA controller through the external metrics API).
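The chart also registers KEDA's CRDs; a quick sanity check that they landed:

kubectl get crd | grep keda.sh
# scaledjobs.keda.sh
# scaledobjects.keda.sh
# triggerauthentications.keda.sh
# ...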
Step 2: Deploy the Sample Worker
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: job-worker
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: job-worker
  template:
    metadata:
      labels:
        app: job-worker
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "while true; do sleep 5; done"]
        resources:
          requests:
            cpu: "100m"
            memory: "64Mi"
          limits:
            cpu: "200m"
            memory: "128Mi"
EOF
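This busybox worker only sleeps, so the queue never drains on its own (handy for observing scale-up, but not a real consumer). In a real deployment the container would pop jobs itself; a minimal sketch, assuming the redis:7 image and the redis-master Service deployed in the next step:

containers:
- name: worker
  image: redis:7  # ships redis-cli
  # Block on the list for up to 5s per iteration, processing one job at a time
  command: ["sh", "-c", "while true; do redis-cli -h redis-master.default.svc.cluster.local BLPOP job-queue 5; done"]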
Step 3: Deploy Redis

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis \
  --namespace default \
  --set auth.enabled=false \
  --set replica.replicaCount=0

Get the Redis connection string:
kubectl get svc redis-master -n default
# redis-master.default.svc.cluster.local:6379
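Optionally, sanity-check connectivity with a throwaway pod before wiring up KEDA:

# One-off pod; PONG confirms the service is reachable without auth
kubectl run redis-ping --rm -it --image=redis:7 -- \
  redis-cli -h redis-master.default.svc.cluster.local PING
# PONG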
Step 4: Create the ScaledObject

A ScaledObject tells KEDA what metric to watch and how to translate it into a replica count:
kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: job-worker-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: job-worker
  minReplicaCount: 0   # Scale to zero when queue is empty
  maxReplicaCount: 20
  pollingInterval: 15  # Check every 15 seconds
  cooldownPeriod: 60   # Wait 60s before scaling down
  triggers:
  - type: redis
    metadata:
      address: redis-master.default.svc.cluster.local:6379
      listName: job-queue
      listLength: "5"  # Target: 5 jobs per replica
EOF

listLength: "5" means KEDA targets 5 pending jobs per replica. With 50 jobs in the queue, it scales to 10 replicas. With 0 jobs, it scales to 0.
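If replica counts flap as the queue oscillates, the ScaledObject's advanced block passes scaling behavior straight through to the HPA it generates. A sketch with illustrative, untuned values, added under spec: above:

advanced:
  horizontalPodAutoscalerConfig:
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 300  # consider 5 min of history before dropping replicas
        policies:
        - type: Percent
          value: 50          # remove at most half the replicas...
          periodSeconds: 60  # ...per minute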
Step 5: Verify the ScaledObject
kubectl get scaledobject job-worker-scaler
# NAME                SCALETARGETKIND   SCALETARGETNAME   MIN   MAX   READY
# job-worker-scaler   Deployment        job-worker        0     20    True

kubectl get hpa
# NAME                         REFERENCE               TARGETS     MINPODS   MAXPODS   REPLICAS
# keda-hpa-job-worker-scaler   Deployment/job-worker   0/5 (avg)   1         20        1

KEDA creates and manages the HPA object automatically. You never touch the HPA directly.
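Under the hood, the HPA reads KEDA's metric through the external metrics API. To see what KEDA is exposing (metric names are generated, so yours will differ):

# Raw query against the external metrics API; jq is optional, for readability
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .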
Step 6: Test the Autoscaling
Push jobs into the Redis queue:
# Exec into a temp pod with redis-cli
kubectl run redis-cli --rm -it --image=redis:7 -- redis-cli \
  -h redis-master.default.svc.cluster.local

# Inside the pod:
RPUSH job-queue job1 job2 job3 job4 job5 job6 job7 job8 job9 job10
LLEN job-queue
# (integer) 10

Watch the deployment scale up:
kubectl get hpa -w
# TARGETS      REPLICAS
# 0/5 (avg)    0
# 10/5 (avg)   2   ← scaling up
# 10/5 (avg)   2

Clear the queue and watch it scale back to zero (after the cooldown period):
kubectl run redis-cli --rm -it --image=redis:7 -- redis-cli \
  -h redis-master.default.svc.cluster.local DEL job-queue
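To follow the scale-down from the deployment side (expect READY to reach 0/0 once cooldownPeriod expires):

kubectl get deployment job-worker -w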
Step 7: Scale on a Prometheus Metric Instead

For HTTP request rate or any Prometheus metric, swap the trigger:
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
    metricName: http_requests_total
    query: |
      sum(rate(http_requests_total{job="api-server"}[2m]))
    threshold: "100"  # Target: 100 RPS per replica

This scales based on the Prometheus query result. At 500 RPS with threshold 100, KEDA targets 5 replicas.
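If your Prometheus endpoint requires authentication, the prometheus scaler supports authModes such as bearer via a TriggerAuthentication. A minimal sketch, assuming a hypothetical Secret named prom-token that holds the token:

kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: prom-auth
  namespace: default
spec:
  secretTargetRef:
  - parameter: bearerToken  # consumed by the prometheus scaler
    name: prom-token        # hypothetical Secret
    key: token
EOF

Then add authModes: "bearer" to the trigger metadata and reference prom-auth from the trigger's authenticationRef.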
Common Issues
ScaledObject stuck in READY: False — check the KEDA operator logs:

kubectl logs -n keda -l app=keda-operator --tail=50

HPA not updating replicas — verify the external metrics API is registered:
kubectl get apiservice v1beta1.external.metrics.k8s.io
# Should show AVAILABLE: True

Scale-to-zero not working — minReplicaCount: 0 requires the workload to tolerate cold starts. If your app takes 30+ seconds to boot, increase cooldownPeriod and consider keeping minReplicaCount: 1 for latency-sensitive paths (a sketch of the cooldown tweak follows).
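For example, to stay scale-to-zero but hold the last replica for five minutes (illustrative value; cooldownPeriod only governs that final step down to zero):

spec:
  minReplicaCount: 0
  cooldownPeriod: 300  # wait 5 min after the queue empties before scaling to zero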
Cleanup
kubectl delete scaledobject job-worker-scaler
kubectl delete deployment job-worker
helm uninstall redis
helm uninstall keda -n keda

Official References
- Horizontal Pod Autoscaling — Official HPA docs covering CPU/memory scaling, custom metrics, and the scaling algorithm
- HPA Walkthrough with Custom Metrics — Step-by-step tutorial from the Kubernetes docs team
- Kubernetes Metrics Server — The in-cluster metrics aggregator required for CPU/memory-based HPA
- KEDA Documentation — Event-driven autoscaling that extends HPA with 50+ scalers including Prometheus, Kafka, and SQS
- Prometheus Adapter — Exposes Prometheus metrics to the Kubernetes custom metrics API for HPA