Helm Best Practices for Production: Structure, Upgrades, Hooks, and When to Walk Away
Helm makes Kubernetes deployments manageable, until it doesn't. Most teams hit the same traps: secrets in values files, hooks that block rollouts, missing readiness probes. Here's how to avoid them.

Helm has won the Kubernetes packaging wars. That's not a controversial statement anymore — it's just where the ecosystem landed. Most open source software ships a Helm chart. Most teams use it for their internal services too. And most teams have at least one war story about a Helm deployment going sideways in production at the worst possible time.
The problems with Helm aren't usually with Helm itself. They're with how teams use it. Secrets in values files, upgrade commands without --atomic, hooks that block rollouts silently, charts that don't fail properly. These are all avoidable with good practices.
This post is what I tell teams when we're setting up their Helm-based deployment workflow from scratch — or cleaning up one that's been accumulating debt.
Chart Structure That Scales
A well-structured Helm chart is self-documenting. The layout should make it obvious where configuration lives, where environment-specific overrides go, and what the defaults are.
my-service/
├── Chart.yaml # chart metadata and dependencies
├── values.yaml # defaults (non-sensitive, no environment-specific values)
├── values-staging.yaml # staging overrides
├── values-production.yaml # production overrides
├── templates/
│ ├── _helpers.tpl # named templates and helpers
│ ├── deployment.yaml
│ ├── service.yaml
│ ├── ingress.yaml
│ ├── configmap.yaml
│ ├── serviceaccount.yaml
│ ├── hpa.yaml
│ └── NOTES.txt # post-install instructions
└── .helmignore
Chart.yaml
Keep your Chart.yaml minimal but complete. The appVersion should track your application version; the version field is the chart version and can evolve independently.
apiVersion: v2
name: payment-service
description: Payment processing microservice
type: application
version: 1.4.2
appVersion: "2.8.0"
maintainers:
  - name: Ajeet Yadav
    email: platform@codingprotocols.com
dependencies:
  - name: postgresql
    version: "13.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled # allow disabling the subchart
values.yaml
Your default values.yaml should be comprehensive and well-commented. Every value should have a comment explaining what it does and what valid options are. Future you — and the next engineer on-call — will thank you.
# values.yaml
replicaCount: 2

image:
  repository: 123456789.dkr.ecr.us-east-1.amazonaws.com/payment-service
  tag: "" # defaults to Chart.appVersion if empty
  pullPolicy: IfNotPresent

# Resource limits and requests.
# Always set both — without requests, the scheduler can't make informed placement
# decisions; without limits, a runaway process can consume the entire node.
# If you're using KEDA for event-driven autoscaling, these are even more critical.
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

# HPA configuration. Set enabled: false to use static replicaCount.
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

# Database connection settings. Do NOT put passwords here — use an ExternalSecret.
database:
  host: "" # override in environment values files
  port: 5432
  name: payments
  sslMode: require

serviceAccount:
  create: true
  annotations: {}
  # annotations:
  #   eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/payment-service

# Readiness probe — must be configured for safe rolling updates.
# A Deployment without readiness probes cannot guarantee that old Pods
# are only terminated after new Pods are ready to serve traffic.
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3

livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
Environment-Specific Values Files
The pattern that works: a base values.yaml with safe defaults, and environment-specific override files that only contain the values that differ.
# values-production.yaml
replicaCount: 4

image:
  tag: "2.8.0" # pinned version in production

resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi

database:
  host: payments-db.us-east-1.rds.amazonaws.com

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/payment-service-prod

autoscaling:
  minReplicas: 4
  maxReplicas: 20

Deploy with:
helm upgrade --install payment-service ./payment-service \
-f values-production.yaml \
--namespace payments \
  --create-namespace

This gives you clear separation between defaults and environment configuration, without the confusion of deeply nested environment conditionals in templates.
Never Put Secrets in Values Files
This deserves its own section because I've seen it in production more times than I should have. values.yaml files end up in Git. values-production.yaml files end up in Git. Anything in Git is a secret that isn't secret.
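A cheap guardrail catches the common mistake before it lands in Git. The script below is a naive sketch (the grep patterns and file names are assumptions; tune them for your repo, and pair this with a real scanner like gitleaks or trufflehog rather than trusting grep alone):

```shell
# Naive scan for likely secret material in Helm values files.
# Flags any key containing password/secret/token/api-key that carries an
# inline value. Expect false positives (e.g. a "passwordKey" reference);
# the point is to force a human look before commit.
scan_values_files() {
  found=0
  for f in "$@"; do
    [ -e "$f" ] || continue
    if grep -nEi '(password|secret|token|api[_-]?key)[^:]*:[[:space:]]*[^[:space:]{#]' "$f"; then
      echo "possible secret value in $f" >&2
      found=1
    fi
  done
  return $found
}

# Example: scan all values files in the chart directory
scan_values_files values*.yaml || echo "review the matches above before committing"
```

Wire it into a pre-commit hook or CI step so a flagged values file fails the build instead of shipping.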
The correct pattern: values files contain references to secrets, not the secrets themselves. Actual secret values come from your secrets management layer (External Secrets Operator, Vault, SOPS — see secrets management in Kubernetes).
# In values.yaml — a reference, not a value
existingSecret:
  name: payment-service-db-creds # created by ExternalSecret operator
  usernameKey: username
  passwordKey: password

# In deployment.yaml template
env:
  - name: DB_USERNAME
    valueFrom:
      secretKeyRef:
        name: {{ .Values.existingSecret.name }}
        key: {{ .Values.existingSecret.usernameKey }}
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: {{ .Values.existingSecret.name }}
        key: {{ .Values.existingSecret.passwordKey }}

The chart knows the secret name. It doesn't know the secret value. That's the right boundary.
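For completeness, here is roughly what creates that Secret on the other side of the boundary. A sketch of an ExternalSecret for the External Secrets Operator, where the ClusterSecretStore name and the remote key path are assumptions that will differ in your setup:

```yaml
# Lives in a separate "foundation" release, not the app chart.
# External Secrets Operator materializes the K8s Secret the chart references.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payment-service-db-creds
  namespace: payments
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets                # assumption: your store name will differ
  target:
    name: payment-service-db-creds   # the Secret name the chart references
  data:
    - secretKey: username
      remoteRef:
        key: prod/payment-service/db # assumption: path in your secrets backend
        property: username
    - secretKey: password
      remoteRef:
        key: prod/payment-service/db
        property: password
```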
Upgrade Strategy: --atomic, --wait, and Rollbacks
The most common source of Helm incidents is upgrade commands that don't fail fast. Without proper flags, helm upgrade returns before your Pods are actually running, and you find out something is broken when your monitoring alerts fire.
Always use --atomic in CI/CD pipelines:
helm upgrade --install payment-service ./payment-service \
-f values-production.yaml \
--namespace payments \
--atomic \
--timeout 5m \
  --wait

--atomic does two things: it waits for all resources to reach a ready state (it implies --wait, shown above for clarity), and it automatically rolls back to the previous revision if the timeout is reached or any resource fails. This makes deployments self-correcting in CI/CD.
--timeout 5m sets the maximum time to wait. Choose a value longer than your slowest Pod startup, including application warmup time. If your application takes 2 minutes to be ready, 5m is a reasonable timeout.
Manual rollback:
# View history
helm history payment-service -n payments

# Roll back to the previous revision (revision 0 means "previous")
helm rollback payment-service 0 -n payments

# Roll back to a specific revision
helm rollback payment-service 3 -n payments

Important: helm rollback reverts the release's Kubernetes manifests to a previous revision, but it cannot undo the upgrade's side effects. Data written to PersistentVolumes stays written, and if your upgrade modified the database schema (via a migration), rolling back Helm doesn't undo that schema change. This is why database migrations need careful handling — covered in the next section.
Helm Hooks for Database Migrations
Database migrations are the trickiest part of Helm deployments. The new schema needs to be in place before the new Pods start serving traffic, which means the migration has to run before the Deployment is updated. Helm hooks make this possible.
# templates/migration-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "payment-service.fullname" . }}-migration-{{ .Release.Revision }}
  annotations:
    "helm.sh/hook": pre-upgrade,pre-install
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 300 # fail after 5 minutes
  template:
    spec:
      restartPolicy: OnFailure
      serviceAccountName: {{ include "payment-service.fullname" . }}
      containers:
        - name: migration
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          command: ["./migrate", "up"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: {{ .Values.existingSecret.name }}
                  key: database-url
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi

Critical annotations explained:
- helm.sh/hook: pre-upgrade,pre-install — runs this Job to completion before the release's manifests are applied
- helm.sh/hook-weight: "-5" — lower weight runs first; useful when you have multiple pre-upgrade hooks
- helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded — delete the old Job before creating a new one (prevents naming conflicts on re-runs), and delete it after success (failed Jobs are kept for debugging)
The trap: if the migration Job fails, the entire upgrade fails and --atomic rolls back the chart. This is the behavior you want — a failed migration should halt the deployment. But it means your migration Job needs to be idempotent. A migration that fails because it tries to add a column that already exists should not block future deployments.
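Most migration tools (Flyway, golang-migrate) already track applied migrations in a bookkeeping table, which covers the straightforward re-run case. If you write raw SQL, guard each statement so a partially applied migration can be retried. A Postgres-flavored sketch with illustrative table and column names:

```sql
-- Safe to re-run: each statement checks its own precondition,
-- so a retry after a partial failure is a no-op for completed steps.
ALTER TABLE payments ADD COLUMN IF NOT EXISTS refund_reason text;
CREATE INDEX IF NOT EXISTS idx_payments_refund_reason ON payments (refund_reason);
```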
Common Traps
1. Pre-install hooks blocking new namespace deployments
When you deploy to a new namespace for the first time and use pre-install hooks, the hook Job must complete before the main deployment is applied. If the hook Job needs a Secret that's created by an ExternalSecret (which is itself part of the chart), you have a chicken-and-egg problem.
The fix: put ExternalSecret resources in a separate Helm release that gets deployed first (the "foundation" layer), before the application chart.
2. Missing readiness probes causing partial outages during rolling updates
Kubernetes' rolling update strategy terminates old Pods once new Pods are "ready." Without a readiness probe, Kubernetes considers a Pod ready as soon as the container starts — which is usually before your application has finished initializing, connected to the database, and loaded its configuration.
Every single Deployment in your chart should have a readiness probe. Without it, you will have periods during rollouts where some Pods are serving requests from the old version and some are sending back 503s from the new version that isn't ready yet.
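Probes only pay off if the rollout strategy respects capacity. A conservative sketch for the Deployment template, where the numbers are a starting point rather than a prescription:

```yaml
# In templates/deployment.yaml — never drop below desired capacity during a rollout
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # add at most one extra Pod at a time
      maxUnavailable: 0  # old Pods removed only after new ones pass readiness
```

With maxUnavailable: 0, a new Pod that never becomes ready stalls the rollout instead of shrinking your serving capacity, which is exactly the failure mode you want under --atomic.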
3. Using .Release.Name in resource names when you don't mean to
The default _helpers.tpl template in helm create uses .Release.Name in most resource names. This means if you rename the release (or install two releases of the same chart in one namespace), you get differently-named resources. This is usually what you want — but Services and ConfigMaps that your application references by name need to be consistent. Use a stable name via fullnameOverride in your values:
# values-production.yaml
fullnameOverride: "payment-service" # predictable, stable name

4. Helm state drift from manual kubectl changes
Helm tracks the state of a release in a Secret in the release namespace. If someone runs kubectl edit deployment directly, Helm's state and the actual cluster state diverge. The next helm upgrade will overwrite those manual changes. Use GitOps (Argo CD, Flux) to prevent this — if the only way to change cluster state is through a Git merge, you eliminate this class of drift.
5. Templating logic that's actually application configuration
Helm is a templating engine. It's not a configuration management system. When your values.yaml starts having conditional blocks 5 levels deep, or when you're generating nginx.conf from Helm templates, you've gone too far.
Keep templates simple. If you need complex configuration generation, generate the configuration file in your application code (using a config library), not in Helm.
When to Consider Alternatives
Helm is not always the right tool. Here's when I'd consider alternatives:
Kustomize
Kustomize is good when:
- You're overlaying configuration on upstream manifests you don't own (e.g., a third-party operator)
- Your team finds Go templating hard to read and maintain
- You want a pure patch-based approach without a release management layer
- You're using Argo CD with kustomize-based apps (Argo's native kustomize support is first-class)
Kustomize weakness: no dependency management, no built-in release history, and kustomize build doesn't validate that the output makes sense before applying.
Plain Manifests + Argo CD
For applications with very simple configurations that barely change, plain Kubernetes manifests in Git managed by Argo CD can be simpler than a Helm chart. If you only need to change the image tag between environments, a kustomize overlay with an image transform is simpler than a Helm chart with values-staging.yaml and values-production.yaml.
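For that image-tag-only case, the whole overlay fits in one file. A sketch, where the paths and the base image name are illustrative:

```yaml
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: payment-service   # image name as it appears in the base manifests
    newName: 123456789.dkr.ecr.us-east-1.amazonaws.com/payment-service
    newTag: "2.8.0"
```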
Helm vs Kustomize Comparison
| Concern | Helm | Kustomize |
|---|---|---|
| Release history and rollback | Built-in | None (use Git) |
| Templating | Go templating | Overlays and patches (strategic merge, JSON 6902) |
| Dependency management | Chart.yaml dependencies | Not supported |
| Secrets handling | Values files (careful!) | Same limitations |
| GitOps integration | Good (Flux, Argo CD) | Excellent (native Argo CD) |
| Learning curve | Moderate | Lower |
| Config complexity | Can get out of hand | Patches stay readable |
Helm in a GitOps World
If you're using Argo CD or Flux, your helm upgrade command should never run from a developer's laptop against production. The CD tool manages Helm releases declaratively.
With Argo CD:
# argocd-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service
  namespace: argocd
spec:
  project: production
  source:
    repoURL: https://github.com/your-org/helm-charts
    targetRevision: HEAD
    path: charts/payment-service
    helm:
      valueFiles:
        - values-production.yaml
      parameters:
        - name: image.tag
          value: "2.8.0"
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - RespectIgnoreDifferences=true

selfHeal: true means Argo CD will revert any drift — kubectl edit on a managed resource will be reverted on the next sync cycle. This is the behavior you want in production.
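The RespectIgnoreDifferences=true sync option only matters if the Application declares differences to ignore. The common pairing with selfHeal is an HPA-managed replica count, where Argo CD would otherwise fight the HPA over spec.replicas. A sketch of the relevant fragment:

```yaml
# Fragment of the Application spec: let the HPA own the replica count
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas   # drift here is expected, not a reason to self-heal
  syncPolicy:
    syncOptions:
      - RespectIgnoreDifferences=true
```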
The Production Helm Checklist
Before deploying a Helm chart to production:
- --atomic --wait --timeout in all CI/CD deployment commands
- Readiness and liveness probes on every Deployment
- No secret values in values.yaml or values-production.yaml
- Database migration hook is idempotent and has an activeDeadlineSeconds limit
- resources.requests and resources.limits set on all containers
- hook-delete-policy set on all hooks to prevent naming conflicts on re-runs
- Chart tested with helm lint and helm template | kubectl apply --dry-run=client -f -
- fullnameOverride set for predictable resource names
- Release deployed via GitOps, not manual helm upgrade
Helm is a good tool. But it rewards discipline. The teams that get the most out of it are the ones who have made the implicit explicit — documented their values file conventions, standardized their deployment commands, and built guardrails in the right places.
Want help structuring Helm charts and deployment pipelines for your Kubernetes services? Talk to us at Coding Protocols. We help platform teams build deployment workflows that are reliable enough to trust and simple enough to maintain.


