Kubernetes

Zero-Downtime Deployments with Rolling Updates and Readiness Probes

Beginner · 25 min to complete · 8 min read

Most Kubernetes deployments drop traffic during updates because readiness probes are misconfigured or missing. This tutorial shows you the exact configuration that eliminates downtime — and how to verify it.

Before you begin

  • kubectl configured against a running cluster
  • Basic understanding of Kubernetes Pods and Deployments

The default Kubernetes rolling update strategy sounds safe: bring up new pods before terminating old ones. But if your readiness probe isn't configured correctly, Kubernetes will send traffic to pods that aren't ready yet — and your users see errors.

This tutorial covers the complete configuration that makes rolling updates actually zero-downtime.

Why Traffic Drops During Updates

When Kubernetes updates a deployment, it follows this sequence:

  1. Create a new pod with the updated image
  2. Wait for the pod to pass its readiness probe
  3. Add the pod to the Service endpoints
  4. Terminate an old pod
  5. Repeat until all pods are updated

The problem: if there's no readiness probe, step 2 considers the pod ready the moment the container starts. Your app might need 5–10 seconds to warm up its database connections, load config, or compile templates. During that window, the pod receives traffic it can't handle.

The second problem: when a pod receives SIGTERM (step 4), it might still be handling active requests. If the app exits immediately, those requests fail.
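The graceful-shutdown half of this can be demonstrated without a cluster. The sketch below (a stand-in shell "server", not the tutorial's nginx app) traps SIGTERM, simulates draining in-flight requests, and exits cleanly instead of dying mid-request:

```shell
# Stand-in server: trap SIGTERM, drain, then exit 0 (simulated with sleep).
cat > /tmp/fake-server.sh <<'SH'
#!/bin/sh
drain() {
  echo "SIGTERM: draining"
  sleep 1            # stand-in for finishing in-flight requests
  echo "drained"
  exit 0
}
trap drain TERM
echo "serving"
while true; do sleep 0.2; done
SH

sh /tmp/fake-server.sh > /tmp/fake-server.log &
pid=$!
sleep 0.5                 # let it start "serving"
kill -TERM "$pid"
wait "$pid"               # exits 0 because the trap ran, not an abrupt kill
status=$?
cat /tmp/fake-server.log
echo "exit status: $status"
```

An app that exits the instant it receives SIGTERM is the shell equivalent of omitting the trap: `wait` would report a signal death and any "in-flight" work would be lost.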

Step 1: Deploy a Sample Application Without Probes

bash
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: nginx:1.24
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: api-server
spec:
  selector:
    app: api-server
  ports:
    - port: 80
      targetPort: 80
EOF

Step 2: Add Readiness and Liveness Probes

bash
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # Allow 1 extra pod during update
      maxUnavailable: 0   # Never reduce below desired count
  template:
    metadata:
      labels:
        app: api-server
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: api
          image: nginx:1.24
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /healthz
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
            successThreshold: 1
          livenessProbe:
            httpGet:
              path: /healthz
              port: 80
            initialDelaySeconds: 15
            periodSeconds: 10
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "15"]
EOF

Let's break down each piece:

maxUnavailable: 0 — Kubernetes must never have fewer than the desired replica count available. This forces it to bring up the new pod first (maxSurge: 1) before terminating any old pod.

readinessProbe — Until this passes, the pod does not receive traffic. initialDelaySeconds: 5 gives the app 5 seconds before the first check. failureThreshold: 3 means three consecutive failures before marking the pod unready.

livenessProbe — If this fails, Kubernetes restarts the container. Set initialDelaySeconds higher than your readiness probe — you don't want the liveness probe killing a pod that's still starting up.

terminationGracePeriodSeconds: 60 — Kubernetes waits up to 60 seconds for the pod to exit after sending SIGTERM before force-killing with SIGKILL.

lifecycle.preStop: sleep 15 — When Kubernetes removes a pod from the Service endpoints, it's not instantaneous. The kube-proxy and cloud load balancer take a few seconds to propagate the change. The preStop sleep gives in-flight requests time to complete before SIGTERM is sent to your process.
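Two timing budgets fall out of these numbers, and you can check them with plain shell arithmetic (no cluster needed). First, how long a pod that starts failing can keep receiving traffic before the readiness probe marks it unready; second, how much shutdown time your app actually gets after the preStop sleep:

```shell
# Readiness lag: with periodSeconds=5 and failureThreshold=3, a pod that
# begins failing stays in the Service endpoints for up to period * failures.
period=5; failures=3
unready_lag=$(( period * failures ))
echo "a failing pod can keep receiving traffic for up to ${unready_lag}s"

# Shutdown budget: the grace period covers BOTH the preStop sleep and the
# app's own shutdown. What remains after preStop is the real exit budget.
grace=60; prestop_sleep=15
shutdown_budget=$(( grace - prestop_sleep ))
echo "after preStop, the app has ${shutdown_budget}s to exit before SIGKILL"
```

If you lengthen the preStop sleep, raise terminationGracePeriodSeconds with it, or the remaining budget shrinks and slow shutdowns get SIGKILLed mid-drain.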

Step 3: Create a Health Endpoint

nginx doesn't serve /healthz out of the box, so the probes from Step 2 will fail — and new pods will sit unready — until this step is applied. Add a simple health route using a ConfigMap:

bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-health
data:
  default.conf: |
    server {
        listen 80;
        location /healthz {
            return 200 "ok\n";
            add_header Content-Type text/plain;
        }
        location / {
            return 200 "hello\n";
        }
    }
EOF

kubectl patch deployment api-server --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/volumes", "value": [{"name": "nginx-conf", "configMap": {"name": "nginx-health"}}]},
  {"op": "add", "path": "/spec/template/spec/containers/0/volumeMounts", "value": [{"name": "nginx-conf", "mountPath": "/etc/nginx/conf.d"}]}
]'
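If JSON patches are hard to read, this is the pod-template fragment the patch above produces (a sketch for orientation, not a full manifest):

```yaml
spec:
  volumes:
    - name: nginx-conf
      configMap:
        name: nginx-health
  containers:
    - name: api
      # ...existing image, ports, probes...
      volumeMounts:
        - name: nginx-conf
          mountPath: /etc/nginx/conf.d   # shadows the image's default.conf
```

Because this changes the pod template, the patch itself triggers a rolling update.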

Step 4: Verify Zero-Downtime During Update

Send continuous traffic in one terminal:

bash
kubectl run load --rm -it --image=busybox -- sh -c \
  'while true; do wget -qO- http://api-server/healthz || echo FAILED; sleep 0.1; done'

In another terminal, trigger an update:

bash
kubectl set image deployment/api-server api=nginx:1.25

Watch the rollout:

bash
kubectl rollout status deployment/api-server
# Waiting for deployment "api-server" rollout to finish: 1 out of 3 new replicas have been updated...
# Waiting for deployment "api-server" rollout to finish: 1 old replicas are pending termination...
# deployment "api-server" successfully rolled out

You should see no errors in the load terminal — every request returns ok.

Step 5: Roll Back if Something Goes Wrong

bash
# View rollout history
kubectl rollout history deployment/api-server

# Roll back to the previous version
kubectl rollout undo deployment/api-server

# Roll back to a specific revision
kubectl rollout undo deployment/api-server --to-revision=2

Verification Checklist

Before declaring a deployment zero-downtime capable, verify:

bash
# Readiness probe is configured
kubectl get deployment api-server -o jsonpath='{.spec.template.spec.containers[0].readinessProbe}'

# maxUnavailable is 0
kubectl get deployment api-server -o jsonpath='{.spec.strategy.rollingUpdate}'
# {"maxSurge":1,"maxUnavailable":0}

# terminationGracePeriodSeconds is set
kubectl get deployment api-server -o jsonpath='{.spec.template.spec.terminationGracePeriodSeconds}'
# 60

# preStop hook exists
kubectl get deployment api-server -o jsonpath='{.spec.template.spec.containers[0].lifecycle}'

Common Mistakes

Liveness probe timing out before the app starts — set initialDelaySeconds on the liveness probe to at least 2× your app's startup time. If liveness fires before the app is ready, Kubernetes restart-loops your pod indefinitely.

No preStop sleep — without it, SIGTERM fires while the load balancer still routes traffic to the pod. Even a 5-second sleep is better than nothing.

maxUnavailable: 25% (the default) — the default allows 25% of pods to be unavailable during an update. For a 4-replica deployment, that's 1 pod down while the new one starts. Fine for internal services, not for production APIs.
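The arithmetic behind that last point is worth making explicit. When maxUnavailable is given as a percentage, Kubernetes converts it to a pod count by rounding down:

```shell
# How many pods the default strategy may take down at once.
# Percentage maxUnavailable is rounded DOWN to a whole pod count.
replicas=4
max_unavailable_pct=25
allowed_down=$(( replicas * max_unavailable_pct / 100 ))
echo "pods that may be unavailable during the update: $allowed_down"
```

With 3 replicas the same math floors 0.75 down to 0 — which is why small deployments sometimes appear zero-downtime by accident, then regress when scaled up.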


We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.

Struggling with this in production?

We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.