Setting Up Horizontal Pod Autoscaler with Custom Metrics
Scale your Kubernetes workloads on business metrics — RPS, queue depth, or latency — instead of just CPU. This tutorial wires up KEDA to drive HPA from real application signals.
Before you begin
- kubectl configured against a running cluster
- Helm 3 installed
- Basic understanding of Kubernetes Deployments and Services
CPU-based autoscaling is a blunt instrument. Your application might be saturating a database connection pool at 20% CPU, or burning 80% CPU while the real backlog builds up in a downstream queue. Custom metrics let HPA respond to signals that actually matter for your workload.
This tutorial uses KEDA — the Kubernetes Event-Driven Autoscaler — which is the cleanest way to get custom metrics into HPA without wrestling with the Prometheus Adapter's configuration format.
What You'll Build
An HPA that scales a worker deployment based on the number of pending jobs in a Redis list. When the queue is empty, the deployment scales to zero. When jobs arrive, it scales up proportionally.
The same pattern applies to: HTTP request rate, Kafka consumer lag, RabbitMQ queue depth, or any Prometheus metric.
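The replica math is the standard HPA average-value calculation; roughly:

# HPA replica math for an average-value external metric (what KEDA emits):
#   desiredReplicas = ceil(totalMetricValue / targetValuePerReplica)
# e.g. 50 pending jobs with a target of 5 jobs per replica -> 10 replicas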
Step 1: Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  --version 2.13.0

Verify the KEDA operator is running:
kubectl get pods -n keda
# NAME                                    READY   STATUS    RESTARTS
# keda-operator-xxxx                      1/1     Running   0
# keda-operator-metrics-apiserver-xxxx    1/1     Running   0

KEDA installs two components: the operator (watches ScaledObjects) and the metrics API server (exposes the metrics to the HPA controller through the external metrics API).
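The chart also registers KEDA's CRDs; a quick sanity check that they landed:

kubectl get crd | grep keda.sh
# scaledjobs.keda.sh
# scaledobjects.keda.sh
# triggerauthentications.keda.sh
# ...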
Step 2: Deploy the Sample Worker
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: job-worker
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: job-worker
  template:
    metadata:
      labels:
        app: job-worker
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "while true; do sleep 5; done"]
        resources:
          requests:
            cpu: "100m"
            memory: "64Mi"
          limits:
            cpu: "200m"
            memory: "128Mi"
EOF
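This busybox worker only sleeps, so the queue never drains on its own (handy for observing scale-up, but not a real consumer). In a real deployment the container would pop jobs itself; a minimal sketch, assuming the redis:7 image and the redis-master Service deployed in the next step:

containers:
- name: worker
  image: redis:7  # ships redis-cli
  # Block on the list for up to 5s per iteration, processing one job at a time
  command: ["sh", "-c", "while true; do redis-cli -h redis-master.default.svc.cluster.local BLPOP job-queue 5; done"]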
Step 3: Deploy Redis

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis \
  --namespace default \
  --set auth.enabled=false \
  --set replica.replicaCount=0

Get the Redis connection string:
kubectl get svc redis-master -n default
# redis-master.default.svc.cluster.local:6379
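Optionally, sanity-check connectivity with a throwaway pod before wiring up KEDA:

# One-off pod; PONG confirms the service is reachable without auth
kubectl run redis-ping --rm -it --image=redis:7 -- \
  redis-cli -h redis-master.default.svc.cluster.local PING
# PONG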
Step 4: Create the ScaledObject

A ScaledObject tells KEDA what metric to watch and how to translate it into a replica count:
kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: job-worker-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: job-worker
  minReplicaCount: 0   # Scale to zero when queue is empty
  maxReplicaCount: 20
  pollingInterval: 15  # Check every 15 seconds
  cooldownPeriod: 60   # Wait 60s before scaling down
  triggers:
  - type: redis
    metadata:
      address: redis-master.default.svc.cluster.local:6379
      listName: job-queue
      listLength: "5"  # Target: 5 jobs per replica
EOF

listLength: "5" means KEDA targets 5 pending jobs per replica. With 50 jobs in the queue, it scales to 10 replicas. With 0 jobs, it scales to 0.
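If replica counts flap as the queue oscillates, the ScaledObject's advanced block passes scaling behavior straight through to the HPA it generates. A sketch with illustrative, untuned values, added under spec: above:

advanced:
  horizontalPodAutoscalerConfig:
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 300  # consider 5 min of history before dropping replicas
        policies:
        - type: Percent
          value: 50          # remove at most half the replicas...
          periodSeconds: 60  # ...per minute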
Step 5: Verify the ScaledObject
kubectl get scaledobject job-worker-scaler
# NAME                SCALETARGETKIND   SCALETARGETNAME   MIN   MAX   READY
# job-worker-scaler   Deployment        job-worker        0     20    True

kubectl get hpa
# NAME                         REFERENCE               TARGETS     MINPODS   MAXPODS   REPLICAS
# keda-hpa-job-worker-scaler   Deployment/job-worker   0/5 (avg)   1         20        1

KEDA creates and manages the HPA object automatically. You never touch the HPA directly.
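Under the hood, the HPA reads KEDA's metric through the external metrics API. To see what KEDA is exposing (metric names are generated, so yours will differ):

# Raw query against the external metrics API; jq is optional, for readability
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .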
Step 6: Test the Autoscaling
Push jobs into the Redis queue:
# Exec into a temp pod with redis-cli
kubectl run redis-cli --rm -it --image=redis:7 -- redis-cli \
  -h redis-master.default.svc.cluster.local

# Inside the pod:
RPUSH job-queue job1 job2 job3 job4 job5 job6 job7 job8 job9 job10
LLEN job-queue
# (integer) 10

Watch the deployment scale up:
kubectl get hpa -w
# TARGETS      REPLICAS
# 0/5 (avg)    0
# 10/5 (avg)   2   ← scaling up
# 10/5 (avg)   2

Clear the queue and watch it scale back to zero (after the cooldown period):
kubectl run redis-cli --rm -it --image=redis:7 -- redis-cli \
  -h redis-master.default.svc.cluster.local DEL job-queue
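To follow the scale-down from the deployment side (expect READY to reach 0/0 once cooldownPeriod expires):

kubectl get deployment job-worker -w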
Step 7: Scale on a Prometheus Metric Instead

For HTTP request rate or any Prometheus metric, swap the trigger:
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
    metricName: http_requests_total
    query: |
      sum(rate(http_requests_total{job="api-server"}[2m]))
    threshold: "100"  # Target: 100 RPS per replica

This scales based on the Prometheus query result. At 500 RPS with threshold 100, KEDA targets 5 replicas.
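If your Prometheus endpoint requires authentication, the prometheus scaler supports authModes such as bearer via a TriggerAuthentication. A minimal sketch, assuming a hypothetical Secret named prom-token that holds the token:

kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: prom-auth
  namespace: default
spec:
  secretTargetRef:
  - parameter: bearerToken  # consumed by the prometheus scaler
    name: prom-token        # hypothetical Secret
    key: token
EOF

Then add authModes: "bearer" to the trigger metadata and reference prom-auth from the trigger's authenticationRef.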
Common Issues
ScaledObject stuck in READY: False — check the KEDA operator logs:

kubectl logs -n keda -l app=keda-operator --tail=50

HPA not updating replicas — verify the external metrics API is registered:
kubectl get apiservice v1beta1.external.metrics.k8s.io
# Should show AVAILABLE: True

Scale-to-zero not working — minReplicaCount: 0 requires the workload to tolerate cold starts. If your app takes 30+ seconds to boot, increase cooldownPeriod and consider keeping minReplicaCount: 1 for latency-sensitive paths (a sketch of the cooldown tweak follows).
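For example, to stay scale-to-zero but hold the last replica for five minutes (illustrative value; cooldownPeriod only governs that final step down to zero):

spec:
  minReplicaCount: 0
  cooldownPeriod: 300  # wait 5 min after the queue empties before scaling to zero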
Cleanup
kubectl delete scaledobject job-worker-scaler
kubectl delete deployment job-worker
helm uninstall redis
helm uninstall keda -n keda

Official References
- Horizontal Pod Autoscaling — Official HPA docs covering CPU/memory scaling, custom metrics, and the scaling algorithm
- HPA Walkthrough with Custom Metrics — Step-by-step tutorial from the Kubernetes docs team
- Kubernetes Metrics Server — The in-cluster metrics aggregator required for CPU/memory-based HPA
- KEDA Documentation — Event-driven autoscaling that extends HPA with 50+ scalers including Prometheus, Kafka, and SQS
- Prometheus Adapter — Exposes Prometheus metrics to the Kubernetes custom metrics API for HPA