Kubernetes
11 min read · March 19, 2026

KEDA: Event-Driven Autoscaling for Kubernetes Beyond CPU and Memory

Out of the box, HPA only scales on CPU and memory — KEDA adds SQS queue depth, Kafka lag, Redis list length, and 60+ other triggers, including scale-to-zero. Here's how to use it without tripping over the gotchas.

Ajeet Yadav
Platform & Cloud Engineer

HPA — the Horizontal Pod Autoscaler — does one thing well out of the box: scale on CPU and memory. For web servers where CPU correlates with request rate, that's often enough. But for async processing systems — workers draining SQS queues, Kafka consumers, batch jobs processing Redis streams — CPU is a terrible proxy for load. A worker sitting idle waiting for messages uses almost no CPU. HPA won't scale it up, even if there are 50,000 messages in the queue.

KEDA (Kubernetes Event-Driven Autoscaler) solves this. It extends HPA to support 60+ event sources as scaling triggers. More importantly, it supports true scale-to-zero — your worker fleet can drop to zero replicas when there's nothing to process.

Why HPA Alone Is Insufficient for Async Workloads

Consider an SQS worker. It polls for messages, processes them, deletes them. When the queue is empty, it loops and polls again — nearly zero CPU. When the queue has 10,000 messages, it might be at 40% CPU. HPA with a target of 50% CPU would not scale up until CPU exceeds 50%. By that time, message lag is already significant.

The fundamental problem: CPU doesn't correlate with queue depth for polling-based workers. You need to scale on queue depth directly.

The same applies to:

  • Kafka consumers: scale on consumer group lag
  • Redis workers: scale on list/stream length
  • Prometheus metrics: scale on any queryable metric
  • HTTP requests: scale on request rate or pending connections
  • Cron: scale to a specific replica count on schedule

KEDA handles all of these through a unified ScaledObject/ScaledJob API.


How KEDA Works

KEDA deploys a metrics adapter that acts as a bridge between external event sources and the Kubernetes external metrics API. When you create a ScaledObject, KEDA:

  1. Creates or takes ownership of an HPA for the target workload
  2. Polls the external source through the matching scaler at each pollingInterval
  3. Feeds the result into the HPA as an external metric so it can make scaling decisions

The architecture means KEDA is additive — it works with the existing HPA machinery, not against it. The HPA it creates handles scaling between 1 and maxReplicaCount; the 0-to-1 transition is handled by KEDA itself, since HPA cannot scale to zero. And because KEDA ships built-in cpu and memory triggers, you can combine an event trigger with a CPU-based one in the same ScaledObject instead of running a second HPA against the same workload (the two would fight over the replica count), as sketched below.
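A minimal sketch of that combination; the names, queue URL, and thresholds here are illustrative placeholders, not from a real deployment:

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 1        # The cpu trigger cannot drive scale-to-zero
  maxReplicaCount: 20
  triggers:
  - type: aws-sqs-queue     # Event signal: scale on queue depth
    authenticationRef:
      name: keda-aws-credentials
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789/my-queue
      queueLength: "10"
      awsRegion: us-east-1
  - type: cpu               # Resource signal: keep average utilization near 70%
    metricType: Utilization
    metadata:
      value: "70"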

Installation

bash
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  --set podIdentity.aws.irsa.enabled=true \
  --set podIdentity.aws.irsa.stsRegionalEndpoints=true

The irsa flags are important for EKS — they configure KEDA to use IAM Roles for Service Accounts rather than static credentials when accessing AWS services.
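Two quick sanity checks after install (standard kubectl commands; the namespace matches the chart invocation above):

bash
# Operator and metrics adapter pods should be Running
kubectl get pods -n keda

# KEDA's adapter should own the external metrics API group
kubectl get apiservice v1beta1.external.metrics.k8s.io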


ScaledObject: Scaling Deployments and StatefulSets

ScaledObject is the primary resource for scaling long-running workloads. It wraps a Deployment, StatefulSet, or any resource that implements the scale subresource.

SQS Queue Depth Scaler

The most common use case I've deployed: scale an SQS worker based on queue depth.

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-worker-scaler
  namespace: processing
spec:
  scaleTargetRef:
    name: sqs-worker
  pollingInterval: 15      # Check queue depth every 15 seconds
  cooldownPeriod: 60       # Wait 60s after last trigger before scaling to zero
  minReplicaCount: 0       # Allow scale to zero when idle
  maxReplicaCount: 50
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: keda-aws-credentials
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789/my-queue
      queueLength: "10"     # Target: 10 messages per replica
      awsRegion: us-east-1
      identityOwner: operator    # KEDA's own identity queries the queue

With queueLength: "10", KEDA will aim for 10 messages per replica. If the queue has 100 messages, it scales to 10 replicas. 500 messages → 50 replicas (capped at maxReplicaCount). 0 messages → scale to 0 after cooldownPeriod seconds.

The IAM permissions needed for the KEDA service account:

json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl"
      ],
      "Resource": "arn:aws:sqs:us-east-1:123456789:my-queue"
    }
  ]
}
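Creating and attaching that policy is standard IAM plumbing; the policy name and file path below are placeholders:

bash
# Create the policy from the JSON above (saved locally as keda-sqs-policy.json)
aws iam create-policy \
  --policy-name keda-sqs-read \
  --policy-document file://keda-sqs-policy.json

# Attach it to the role the KEDA operator's service account assumes
aws iam attach-role-policy \
  --role-name keda-operator-role \
  --policy-arn arn:aws:iam::123456789:policy/keda-sqs-read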

TriggerAuthentication for IRSA

Rather than embedding credentials in the ScaledObject, use TriggerAuthentication with IRSA:

yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-aws-credentials
  namespace: processing
spec:
  podIdentity:
    provider: aws-eks  # Use IRSA

Then annotate the KEDA service account with the IAM role:

bash
kubectl annotate serviceaccount keda-operator \
  -n keda \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789:role/keda-operator-role
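
The annotation only works if the role trusts the cluster's OIDC provider for this exact service account. A sketch of the trust policy, with the OIDC provider ID as a placeholder:

json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE_OIDC_ID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE_OIDC_ID:sub": "system:serviceaccount:keda:keda-operator"
        }
      }
    }
  ]
}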

ScaledJob: Scaling Batch Workloads

ScaledJob is different from ScaledObject. Instead of scaling replicas of a Deployment, it creates new Kubernetes Jobs — one Job per trigger event (or batch of events). This is the right pattern for work that needs isolation, has a finite duration, or can't be parallelized within a single pod.

Per-Message Job Processing

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: image-processor
  namespace: processing
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    activeDeadlineSeconds: 600
    backoffLimit: 3
    template:
      spec:
        containers:
        - name: processor
          image: my-registry/image-processor:latest
          env:
          - name: QUEUE_URL
            value: https://sqs.us-east-1.amazonaws.com/123456789/images
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 2
              memory: 2Gi
        restartPolicy: Never
  pollingInterval: 10
  maxReplicaCount: 20
  scalingStrategy:
    strategy: "accurate"  # One job per message
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: keda-aws-credentials
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789/images
      queueLength: "1"   # One job per message
      awsRegion: us-east-1

The scalingStrategy: accurate tells KEDA to create exactly one Job per message, taking into account already-running Jobs. Without this, you can end up creating more Jobs than messages.


Kafka Consumer Lag Scaler

For Kafka consumers, you scale on consumer group lag rather than partition count:

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: streaming
spec:
  scaleTargetRef:
    name: event-consumer
  minReplicaCount: 1        # Don't scale to zero for Kafka (commit offset implications)
  maxReplicaCount: 30
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka-broker-1:9092,kafka-broker-2:9092
      consumerGroup: event-processors
      topic: user-events
      lagThreshold: "100"   # Scale up when lag exceeds 100 per replica
      offsetResetPolicy: latest

A note on minReplicaCount: 1 for Kafka: I don't recommend scale-to-zero for Kafka consumers. When a consumer group has zero active members, lag keeps accumulating, and committed offsets can eventually expire under the broker's offset retention settings. When a new consumer starts, whether it picks up the backlog depends on offsetResetPolicy: latest risks skipping messages, earliest risks reprocessing them. Keep at least one replica running and use the topic's partition count as a natural ceiling on parallelism.


Prometheus Scaler

The Prometheus scaler is the escape hatch when no purpose-built scaler exists. Any metric you can query in Prometheus can drive autoscaling, which is particularly powerful for scaling on SLOs and error budgets:

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-latency-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 2
  maxReplicaCount: 40
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090
      metricName: http_requests_pending
      query: |
        sum(rate(http_requests_total{job="api-server",status="pending"}[2m]))
      threshold: "50"    # Scale up when pending requests exceed 50 per replica

This gives you extreme flexibility — scale on request queue depth, database connection pool exhaustion, custom business metrics. I've used this to scale on the number of pending webhook deliveries for a notification service.
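
That webhook case could look like the trigger below, assuming the service exports a hypothetical webhook_deliveries_pending gauge (the metric name and threshold are illustrative):

yaml
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring.svc:9090
    query: sum(webhook_deliveries_pending{service="notifications"})
    threshold: "100"    # Aim for one replica per 100 pending deliveries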


Scale-to-Zero: The Cold Start Problem

Scale-to-zero is compelling for cost optimization, especially for dev/staging environments or sporadic batch workloads. The problem is cold start latency: when a new event arrives and zero pods are running, KEDA needs to:

  1. Detect the event (up to pollingInterval seconds)
  2. Trigger HPA to scale up
  3. Kubernetes schedules a new pod
  4. Container image pulls (if not cached)
  5. Pod initializes and starts processing

On EKS with a warm image cache, this is 15-45 seconds. With a cold image pull, it can be 2-3 minutes. For SQS workers processing background jobs where a 1-minute delay is acceptable, scale-to-zero is great. For anything latency-sensitive, keep minReplicaCount: 1.

You can mitigate cold start with a "pause container" approach — a minimal container that stays alive when scaled to zero and is replaced by the real container when work arrives. But this defeats much of the cost-saving purpose.

My rule: scale-to-zero for batch/async workloads in non-production, keep at least one replica in production unless the cost savings are significant enough to justify the latency.
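
There's a middle ground worth knowing: idleReplicaCount must be strictly less than minReplicaCount (so it can't be paired with minReplicaCount: 0), and the combination below scales to zero when every trigger is inactive while guaranteeing a floor whenever there's work. A sketch with illustrative counts:

yaml
spec:
  idleReplicaCount: 0    # Drop to zero only when all triggers report inactive
  minReplicaCount: 2     # Never fewer than 2 replicas while any trigger is active
  maxReplicaCount: 50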


KEDA vs Karpenter: A Common Confusion

KEDA and Karpenter are often mentioned together but they solve different problems:

                    KEDA                        Karpenter
What it scales      Pods (replicas)             Nodes (EC2 instances)
Trigger basis       External events, metrics    Pending pod resource requests
Scale to zero       Yes (pods)                  Yes (nodes, when no pods need them)
Primary concern     Workload replicas           Cluster node capacity

They work well together: KEDA scales up pods based on SQS depth, Karpenter notices the pending pods and provisions new nodes to run them. When the queue drains, KEDA scales pods to zero, and Karpenter terminates the now-empty nodes.

Using KEDA for node scaling or Karpenter for pod replication is a misuse of both tools. They're complementary, not competing.


Scaler Comparison

Scaler             Use Case                     Minimum Replicas Recommendation
aws-sqs-queue      SQS queue processing         0 (scale-to-zero friendly)
kafka              Kafka consumer groups        1+ (offset management)
redis-lists        Redis list/stream workers    0
prometheus         Any Prometheus metric        Depends on use case
cron               Scheduled scaling            0 (scale up before cron job)
azure-servicebus   Azure Service Bus            0
rabbitmq           RabbitMQ queue               0
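
The cron scaler behaves differently from the rest: it holds a fixed replica count during a time window instead of reacting to a metric. A sketch with an illustrative schedule:

yaml
triggers:
- type: cron
  metadata:
    timezone: America/New_York
    start: 0 8 * * *          # Scale up at 08:00 local time
    end: 0 18 * * *           # Scale back down at 18:00
    desiredReplicas: "10"     # Replica count to hold during the window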

Gotchas I've Hit in Production

Metrics adapter conflicts: KEDA installs its own metrics adapter. It doesn't clash with the standard metrics-server, because the two serve different API groups (external.metrics.k8s.io vs metrics.k8s.io). But Kubernetes allows only one registered adapter per API group, so KEDA will conflict with any other adapter already serving external metrics, such as a Prometheus Adapter configured for external.metrics.k8s.io. Watch for this in clusters with custom metrics adapters.

cooldownPeriod too short: If you set cooldownPeriod: 0, pods scale to zero the moment the queue is empty. If your queue has bursty traffic, you'll thrash — constant scale-up/scale-down cycles. Start with 300 seconds and tune down.
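
Note that cooldownPeriod only governs the drop to zero. For thrash between 1 and N replicas, use the HPA behavior block that KEDA exposes under spec.advanced; a sketch:

yaml
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300   # Consider the last 5 min before scaling down
          policies:
          - type: Percent
            value: 50                       # Remove at most half the replicas
            periodSeconds: 60               # per minute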

SQS visibility timeout vs processing time: SQS has a visibility timeout that hides messages from other consumers while one is processing. If your worker takes longer than the visibility timeout, the message becomes visible again and KEDA thinks there's more work — it scales up more workers, which start processing the same message. Make sure your visibility timeout is longer than your maximum processing time.
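
Checking and raising the timeout is quick with the AWS CLI (the 15-minute value here is an assumption; size it to your own worst case):

bash
# Inspect the current visibility timeout
aws sqs get-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789/my-queue \
  --attribute-names VisibilityTimeout

# Raise it above the worst-case processing time (here, 15 minutes)
aws sqs set-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789/my-queue \
  --attributes VisibilityTimeout=900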

IRSA permissions on the ScaledObject namespace: KEDA needs its own service account to have SQS/Kafka permissions, not the worker's service account. The TriggerAuthentication resource specifies which credentials KEDA uses to query the scaler — this is separate from what the worker pod uses to read from the queue.


Building event-driven workloads on Kubernetes and struggling with SQS queue depth not translating to the right replica count? Talk to us at Coding Protocols. We design and implement autoscaling strategies for async processing systems that handle real production traffic patterns.

Related Topics

KEDA
autoscaling
Kubernetes
SQS
Kafka
HPA
