Helm Best Practices for Production: Structure, Upgrades, Hooks, and When to Walk Away
Helm makes Kubernetes deployments manageable, until it doesn't. Most teams hit the same traps: secrets in values files, hooks that block rollouts, missing readiness probes. Here's how to avoid them.

Helm has won the Kubernetes packaging wars. That's not a controversial statement anymore — it's just where the ecosystem landed. Most open source software ships a Helm chart. Most teams use it for their internal services too. And most teams have at least one war story about a Helm deployment going sideways in production at the worst possible time.
The problems with Helm aren't usually with Helm itself. They're with how teams use it. Secrets in values files, upgrade commands without --atomic, hooks that block rollouts silently, charts that don't fail properly. These are all avoidable with good practices.
This post is what I tell teams when we're setting up their Helm-based deployment workflow from scratch — or cleaning up one that's been accumulating debt.
Chart Structure That Scales
A well-structured Helm chart is self-documenting. The layout should make it obvious where configuration lives, where environment-specific overrides go, and what the defaults are.
my-service/
├── Chart.yaml # chart metadata and dependencies
├── values.yaml # defaults (non-sensitive, no environment-specific values)
├── values-staging.yaml # staging overrides
├── values-production.yaml # production overrides
├── templates/
│ ├── _helpers.tpl # named templates and helpers
│ ├── deployment.yaml
│ ├── service.yaml
│ ├── ingress.yaml
│ ├── configmap.yaml
│ ├── serviceaccount.yaml
│ ├── hpa.yaml
│ └── NOTES.txt # post-install instructions
└── .helmignore
Chart.yaml
Keep your Chart.yaml minimal but complete. The appVersion should track your application version; the version field is the chart version and can evolve independently.
apiVersion: v2
name: payment-service
description: Payment processing microservice
type: application
version: 1.4.2
appVersion: "2.8.0"
maintainers:
  - name: Ajeet Yadav
    email: platform@codingprotocols.com
dependencies:
  - name: postgresql
    version: "13.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled # allow disabling the subchart
values.yaml
Your default values.yaml should be comprehensive and well-commented. Every value should have a comment explaining what it does and what valid options are. Future you — and the next engineer on-call — will thank you.
# values.yaml
replicaCount: 2

image:
  repository: 123456789.dkr.ecr.us-east-1.amazonaws.com/payment-service
  tag: "" # defaults to Chart.appVersion if empty
  pullPolicy: IfNotPresent

# Resource limits and requests.
# Always set both — without requests, the scheduler can't make informed placement
# decisions; without limits, a runaway process can consume the entire node.
# If you're using KEDA for event-driven autoscaling, these are even more critical.
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

# HPA configuration. Set enabled: false to use static replicaCount.
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

# Database connection settings. Do NOT put passwords here — use an ExternalSecret.
database:
  host: "" # override in environment values files
  port: 5432
  name: payments
  sslMode: require

serviceAccount:
  create: true
  annotations: {}
  # annotations:
  #   eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/payment-service

# Readiness probe — must be configured for safe rolling updates.
# A Deployment without readiness probes cannot guarantee that old Pods
# are only terminated after new Pods are ready to serve traffic.
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3

livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
Environment-Specific Values Files
The pattern that works: a base values.yaml with safe defaults, and environment-specific override files that only contain the values that differ.
# values-production.yaml
replicaCount: 4

image:
  tag: "2.8.0" # pinned version in production

resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi

database:
  host: payments-db.us-east-1.rds.amazonaws.com

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/payment-service-prod

autoscaling:
  minReplicas: 4
  maxReplicas: 20

Deploy with:
helm upgrade --install payment-service ./payment-service \
-f values-production.yaml \
--namespace payments \
  --create-namespace

This gives you clear separation between defaults and environment configuration, without the confusion of deeply nested environment conditionals in templates.
Never Put Secrets in Values Files
This deserves its own section because I've seen it in production more times than I should have. values.yaml files end up in Git. values-production.yaml files end up in Git. Anything in Git is a secret that isn't secret.
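A cheap guardrail catches the common mistake before it lands in Git. The script below is a naive sketch (the grep patterns and file names are assumptions; tune them for your repo, and pair this with a real scanner like gitleaks or trufflehog rather than trusting grep alone):

```shell
# Naive scan for likely secret material in Helm values files.
# Flags any key containing password/secret/token/api-key that carries an
# inline value. Expect false positives (e.g. a "passwordKey" reference);
# the point is to force a human look before commit.
scan_values_files() {
  found=0
  for f in "$@"; do
    [ -e "$f" ] || continue
    if grep -nEi '(password|secret|token|api[_-]?key)[^:]*:[[:space:]]*[^[:space:]{#]' "$f"; then
      echo "possible secret value in $f" >&2
      found=1
    fi
  done
  return $found
}

# Example: scan all values files in the chart directory
scan_values_files values*.yaml || echo "review the matches above before committing"
```

Wire it into a pre-commit hook or CI step so a flagged values file fails the build instead of shipping.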
The correct pattern: values files contain references to secrets, not the secrets themselves. Actual secret values come from your secrets management layer (External Secrets Operator, Vault, SOPS — see secrets management in Kubernetes).
# In values.yaml — a reference, not a value
existingSecret:
  name: payment-service-db-creds # created by ExternalSecret operator
  usernameKey: username
  passwordKey: password

# In deployment.yaml template
env:
  - name: DB_USERNAME
    valueFrom:
      secretKeyRef:
        name: {{ .Values.existingSecret.name }}
        key: {{ .Values.existingSecret.usernameKey }}
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: {{ .Values.existingSecret.name }}
        key: {{ .Values.existingSecret.passwordKey }}

The chart knows the secret name. It doesn't know the secret value. That's the right boundary.
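For completeness, here is roughly what creates that Secret on the other side of the boundary. A sketch of an ExternalSecret for the External Secrets Operator, where the ClusterSecretStore name and the remote key path are assumptions that will differ in your setup:

```yaml
# Lives in a separate "foundation" release, not the app chart.
# External Secrets Operator materializes the K8s Secret the chart references.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payment-service-db-creds
  namespace: payments
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets                # assumption: your store name will differ
  target:
    name: payment-service-db-creds   # the Secret name the chart references
  data:
    - secretKey: username
      remoteRef:
        key: prod/payment-service/db # assumption: path in your secrets backend
        property: username
    - secretKey: password
      remoteRef:
        key: prod/payment-service/db
        property: password
```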
Upgrade Strategy: --atomic, --wait, and Rollbacks
The most common source of Helm incidents is upgrade commands that don't fail fast. Without proper flags, helm upgrade returns before your Pods are actually running, and you find out something is broken when your monitoring alerts fire.
Always use --atomic in CI/CD pipelines:
helm upgrade --install payment-service ./payment-service \
-f values-production.yaml \
--namespace payments \
--atomic \
--timeout 5m \
  --wait

--atomic does two things: it waits for all resources to reach a ready state (it implies --wait, shown above for clarity), and it automatically rolls back to the previous revision if the timeout is reached or any resource fails. This makes deployments self-correcting in CI/CD.
--timeout 5m sets the maximum time to wait. Choose a value longer than your slowest Pod startup, including application warmup time. If your application takes 2 minutes to be ready, 5m is a reasonable timeout.
Manual rollback:
# View history
helm history payment-service -n payments

# Roll back to the previous revision (revision 0 means "previous")
helm rollback payment-service 0 -n payments

# Roll back to a specific revision
helm rollback payment-service 3 -n payments

Important: helm rollback reverts the release's Kubernetes manifests to a previous revision, but it cannot undo the upgrade's side effects. Data written to PersistentVolumes stays written, and if your upgrade modified the database schema (via a migration), rolling back Helm doesn't undo that schema change. This is why database migrations need careful handling — covered in the next section.
Helm Hooks for Database Migrations
Database migrations are the trickiest part of Helm deployments. The new schema needs to be in place before the new Pods start serving traffic, which means the migration has to run before the Deployment is updated. Helm hooks make this possible.
# templates/migration-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "payment-service.fullname" . }}-migration-{{ .Release.Revision }}
  annotations:
    "helm.sh/hook": pre-upgrade,pre-install
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 300 # fail after 5 minutes
  template:
    spec:
      restartPolicy: OnFailure
      serviceAccountName: {{ include "payment-service.fullname" . }}
      containers:
        - name: migration
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          command: ["./migrate", "up"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: {{ .Values.existingSecret.name }}
                  key: database-url
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi

Critical annotations explained:
- helm.sh/hook: pre-upgrade,pre-install — runs this Job to completion before the release's manifests are applied
- helm.sh/hook-weight: "-5" — lower weight runs first; useful when you have multiple pre-upgrade hooks
- helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded — delete the old Job before creating a new one (prevents naming conflicts on re-runs), and delete it after success (failed Jobs are kept for debugging)
The trap: if the migration Job fails, the entire upgrade fails and --atomic rolls back the chart. This is the behavior you want — a failed migration should halt the deployment. But it means your migration Job needs to be idempotent. A migration that fails because it tries to add a column that already exists should not block future deployments.
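Most migration tools (Flyway, golang-migrate) already track applied migrations in a bookkeeping table, which covers the straightforward re-run case. If you write raw SQL, guard each statement so a partially applied migration can be retried. A Postgres-flavored sketch with illustrative table and column names:

```sql
-- Safe to re-run: each statement checks its own precondition,
-- so a retry after a partial failure is a no-op for completed steps.
ALTER TABLE payments ADD COLUMN IF NOT EXISTS refund_reason text;
CREATE INDEX IF NOT EXISTS idx_payments_refund_reason ON payments (refund_reason);
```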
Common Traps
1. Pre-install hooks blocking new namespace deployments
When you deploy to a new namespace for the first time and use pre-install hooks, the hook Job must complete before the main deployment is applied. If the hook Job needs a Secret that's created by an ExternalSecret (which is itself part of the chart), you have a chicken-and-egg problem.
The fix: put ExternalSecret resources in a separate Helm release that gets deployed first (the "foundation" layer), before the application chart.
2. Missing readiness probes causing partial outages during rolling updates
Kubernetes' rolling update strategy terminates old Pods once new Pods are "ready." Without a readiness probe, Kubernetes considers a Pod ready as soon as the container starts — which is usually before your application has finished initializing, connected to the database, and loaded its configuration.
Every single Deployment in your chart should have a readiness probe. Without it, you will have periods during rollouts where some Pods are serving requests from the old version and some are sending back 503s from the new version that isn't ready yet.
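Probes only pay off if the rollout strategy respects capacity. A conservative sketch for the Deployment template, where the numbers are a starting point rather than a prescription:

```yaml
# In templates/deployment.yaml — never drop below desired capacity during a rollout
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # add at most one extra Pod at a time
      maxUnavailable: 0  # old Pods removed only after new ones pass readiness
```

With maxUnavailable: 0, a new Pod that never becomes ready stalls the rollout instead of shrinking your serving capacity, which is exactly the failure mode you want under --atomic.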
3. Using .Release.Name in resource names when you don't mean to
The default _helpers.tpl template in helm create uses .Release.Name in most resource names. This means if you rename the release (or install two releases of the same chart in one namespace), you get differently-named resources. This is usually what you want — but Services and ConfigMaps that your application references by name need to be consistent. Use a stable name via fullnameOverride in your values:
# values-production.yaml
fullnameOverride: "payment-service" # predictable, stable name

4. Helm state drift from manual kubectl changes
Helm tracks the state of a release in a Secret in the release namespace. If someone runs kubectl edit deployment directly, Helm's state and the actual cluster state diverge. The next helm upgrade will overwrite those manual changes. Use GitOps (Argo CD, Flux) to prevent this — if the only way to change cluster state is through a Git merge, you eliminate this class of drift.
5. Templating logic that's actually application configuration
Helm is a templating engine. It's not a configuration management system. When your values.yaml starts having conditional blocks 5 levels deep, or when you're generating nginx.conf from Helm templates, you've gone too far.
Keep templates simple. If you need complex configuration generation, generate the configuration file in your application code (using a config library), not in Helm.
When to Consider Alternatives
Helm is not always the right tool. Here's when I'd consider alternatives:
Kustomize
Kustomize is good when:
- You're overlaying configuration on upstream manifests you don't own (e.g., a third-party operator)
- Your team finds Go templating hard to read and maintain
- You want a pure patch-based approach without a release management layer
- You're using Argo CD with kustomize-based apps (Argo's native kustomize support is first-class)
Kustomize weakness: no dependency management, no built-in release history, and kustomize build doesn't validate that the output makes sense before applying.
Plain Manifests + Argo CD
For applications with very simple configurations that barely change, plain Kubernetes manifests in Git managed by Argo CD can be simpler than a Helm chart. If you only need to change the image tag between environments, a kustomize overlay with an image transform is simpler than a Helm chart with values-staging.yaml and values-production.yaml.
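For that image-tag-only case, the whole overlay fits in one file. A sketch, where the paths and the base image name are illustrative:

```yaml
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: payment-service   # image name as it appears in the base manifests
    newName: 123456789.dkr.ecr.us-east-1.amazonaws.com/payment-service
    newTag: "2.8.0"
```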
Helm vs Kustomize Comparison
| Concern | Helm | Kustomize |
|---|---|---|
| Release history and rollback | Built-in | None (use Git) |
| Templating | Go templating | Overlays and patches (strategic merge, JSON 6902) |
| Dependency management | Chart.yaml dependencies | Not supported |
| Secrets handling | Values files (careful!) | Same limitations |
| GitOps integration | Good (Flux, Argo CD) | Excellent (native Argo CD) |
| Learning curve | Moderate | Lower |
| Config complexity | Can get out of hand | Patches stay readable |
Helm in a GitOps World
If you're using Argo CD or Flux, your helm upgrade command should never run from a developer's laptop against production. The CD tool manages Helm releases declaratively.
With Argo CD:
# argocd-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service
  namespace: argocd
spec:
  project: production
  source:
    repoURL: https://github.com/your-org/helm-charts
    targetRevision: HEAD
    path: charts/payment-service
    helm:
      valueFiles:
        - values-production.yaml
      parameters:
        - name: image.tag
          value: "2.8.0"
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - RespectIgnoreDifferences=true

selfHeal: true means Argo CD will revert any drift — kubectl edit on a managed resource will be reverted on the next sync cycle. This is the behavior you want in production.
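The RespectIgnoreDifferences=true sync option only matters if the Application declares differences to ignore. The common pairing with selfHeal is an HPA-managed replica count, where Argo CD would otherwise fight the HPA over spec.replicas. A sketch of the relevant fragment:

```yaml
# Fragment of the Application spec: let the HPA own the replica count
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas   # drift here is expected, not a reason to self-heal
  syncPolicy:
    syncOptions:
      - RespectIgnoreDifferences=true
```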
The Production Helm Checklist
Before deploying a Helm chart to production:
- --atomic --wait --timeout in all CI/CD deployment commands
- Readiness and liveness probes on every Deployment
- No secret values in values.yaml or values-production.yaml
- Database migration hook is idempotent and has an activeDeadlineSeconds limit
- resources.requests and resources.limits set on all containers
- hook-delete-policy set on all hooks to prevent naming conflicts on re-runs
- Chart tested with helm lint and helm template | kubectl apply --dry-run=client -f -
- fullnameOverride set for predictable resource names
- Release deployed via GitOps, not manual helm upgrade
Helm is a good tool. But it rewards discipline. The teams that get the most out of it are the ones who have made the implicit explicit — documented their values file conventions, standardized their deployment commands, and built guardrails in the right places.
Want help structuring Helm charts and deployment pipelines for your Kubernetes services? Talk to us at Coding Protocols. We help platform teams build deployment workflows that are reliable enough to trust and simple enough to maintain.


