Automating TLS with cert-manager and Let's Encrypt
Manual TLS certificate management doesn't scale. cert-manager automates issuance, renewal, and rotation using Let's Encrypt — and integrates directly with your Ingress resources. This tutorial covers HTTP-01 and DNS-01 challenges, debugging stuck certificates, and rotating to production after testing with staging.
Before you begin
- Kubernetes cluster with an Ingress controller (nginx or similar)
- kubectl and Helm installed
- A real domain name with DNS access
- A public-facing cluster (for HTTP-01) or DNS provider API access (for DNS-01)
Manual certificate management is a 2am pager alert waiting to happen. Someone forgets to renew, the cert expires on a Friday afternoon, and now you're scrambling to run openssl commands in production while users are seeing browser warnings. I've been there. cert-manager eliminates this entirely — it issues, renews, and rotates certificates automatically, and integrates directly with Kubernetes Ingress. This tutorial wires it up end-to-end, from a first install through wildcard certs, with the debugging steps you'll actually need.
What You'll Build
By the end of this tutorial you'll have:
- cert-manager installed via Helm with CRDs managed properly
- A Let's Encrypt staging ClusterIssuer for testing (far more generous rate limits than production)
- A Certificate issued via HTTP-01 challenge wired to a real Ingress
- A production ClusterIssuer switch once staging is confirmed working
- A DNS-01 challenge example for wildcard certificates using Cloudflare
The full flow takes about 45 minutes the first time. Once you have the pattern down, replicating it across domains takes 5 minutes.
Step 1: Install cert-manager
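cert-manager ships as a Helm chart. The crds.enabled=true flag tells Helm to install the CRDs as part of the release, so they are upgraded together with the chart. The flag exists from chart v1.15 onward; older charts use installCRDs=true instead. By default the chart also keeps the CRDs in place when the release is uninstalled (crds.keep=true), which protects existing Certificate resources from accidental deletion.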

    helm repo add jetstack https://charts.jetstack.io
    helm repo update

    # crds.enabled requires chart v1.15 or later
    helm install cert-manager jetstack/cert-manager \
      --namespace cert-manager \
      --create-namespace \
      --version v1.15.0 \
      --set crds.enabled=true

Verify the pods are running before moving on:

    kubectl get pods -n cert-manager

You should see three pods, all in Running state:

    NAME                                      READY   STATUS    RESTARTS   AGE
    cert-manager-7d75b9b5b5-xkjqp             1/1     Running   0          90s
    cert-manager-cainjector-6c9f7b5b8-4frcl   1/1     Running   0          90s
    cert-manager-webhook-6b9b7b5b5-z7vdm      1/1     Running   0          90s

If the webhook pod is stuck in Init or ContainerCreating, wait another 30 seconds — it needs to inject its own CA before it can serve requests. If it stays stuck, check kubectl logs -n cert-manager deployment/cert-manager-webhook.
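If you are scripting the install, watching the pod list by eye is awkward. The following is a minimal sketch, assuming the release lives in the cert-manager namespace as above. The function name wait_for_cert_manager is hypothetical; kubectl wait and its --for, --all, and --timeout flags are standard kubectl.

```shell
# Hypothetical helper: block until every Deployment in the cert-manager
# namespace reports the Available condition, or fail after the timeout.
wait_for_cert_manager() {
  wait_timeout="${1:-120s}"   # optional first argument, defaults to 120s
  kubectl wait --for=condition=Available deployment --all \
    --namespace cert-manager --timeout="$wait_timeout"
}
```

Calling wait_for_cert_manager 180s before applying any issuer gives the webhook up to three minutes to come up, and the non-zero exit code on timeout lets a script stop early.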
Step 2: Create a Staging ClusterIssuer
Always start with Let's Encrypt staging. Production enforces strict rate limits, including 5 duplicate certificates (the same set of hostnames) per week and a small number of failed validations per hostname per hour. If you make a configuration mistake, and you will the first time, repeated failed attempts can lock you out while you are still debugging. Staging has much higher limits. The only other difference is that staging issues certificates from a fake CA that browsers don't trust, which is exactly what you want for testing.

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-staging
    spec:
      acme:
        # Let's Encrypt staging endpoint
        server: https://acme-staging-v02.api.letsencrypt.org/directory
        email: your@email.com
        # Secret that stores the ACME account private key
        privateKeySecretRef:
          name: letsencrypt-staging-key
        solvers:
        - http01:
            ingress:
              ingressClassName: nginx

Apply it:

    kubectl apply -f clusterissuer-staging.yaml

Check that the issuer registered successfully:

    kubectl describe clusterissuer letsencrypt-staging

Look for a Ready condition with Status: True. If you see Failed to register ACME account, check that the email is valid and the ACME server URL is correct.
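The describe output is meant for reading, not for scripts. As a rough sketch of the same check in a form a script can branch on, the helper below reads the Ready condition with a JSONPath filter. The function name issuer_ready is hypothetical; the condition type and status values are the ones described above.

```shell
# Hypothetical helper: succeed only when the named ClusterIssuer has a
# Ready condition whose status is "True".
issuer_ready() {
  status=$(kubectl get clusterissuer "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
  [ "$status" = "True" ]
}
```

For example, issuer_ready letsencrypt-staging && echo "issuer registered" prints the message only once the ACME account has been registered.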
Step 3: Issue a Certificate via HTTP-01
There are two ways to trigger cert-manager: annotate your Ingress directly, or create a standalone Certificate resource. I'll show both — use annotations for simple cases, the Certificate resource when you need more control (multiple SANs, custom duration, etc.).
Approach 1: Ingress annotation
Add these to your existing Ingress:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-app-ingress
      namespace: default
      annotations:
        # Tells cert-manager which ClusterIssuer to use for this Ingress
        cert-manager.io/cluster-issuer: "letsencrypt-staging"
    spec:
      ingressClassName: nginx
      tls:
      - hosts:
        - yourdomain.com
        # cert-manager stores the issued certificate in this Secret
        secretName: yourdomain-tls
      rules:
      - host: yourdomain.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80

cert-manager watches for Ingress resources with that annotation and automatically creates a Certificate resource for you.
Approach 2: Certificate resource
When you want explicit control — multiple domains, custom renewal window, or a cert not tied to a specific Ingress:

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: yourdomain-cert
      namespace: default
    spec:
      # Secret that will hold the issued certificate and private key
      secretName: yourdomain-tls
      issuerRef:
        name: letsencrypt-staging
        kind: ClusterIssuer
      dnsNames:
      - yourdomain.com

Apply and watch the status:

    kubectl apply -f certificate.yaml
    kubectl get certificate -w

The READY column should flip to True within 60–90 seconds if everything is configured correctly. If it stays False, move to the next step.
Step 4: Debug the Stuck Certificate
This is where everyone gets lost. cert-manager creates a chain of resources during issuance: Certificate → CertificateRequest → Order → Challenge. If something fails, it fails at one level of this chain. You need to walk down the chain to find where.

    # Level 1: Certificate (overall status and any top-level errors)
    kubectl describe certificate yourdomain-cert -n default

    # Level 2: CertificateRequest (the actual CSR and approval status)
    kubectl describe certificaterequest -n default

    # Level 3: Order (the ACME order state with Let's Encrypt)
    kubectl describe order -n default

    # Level 4: Challenge (the individual challenge attempt)
    kubectl describe challenge -n default

The Challenge resource is usually where you find the actual error. Common failure reasons:
HTTP-01 challenge URL not reachable. Let's Encrypt sends a GET request to http://yourdomain.com/.well-known/acme-challenge/<token>. If port 80 isn't publicly reachable, or your Ingress controller isn't routing that path, the challenge fails. Check it yourself:

    curl "http://yourdomain.com/.well-known/acme-challenge/<token>"

You should get back the key authorization, a string that starts with the token value. If you get a 404 or connection refused, the Ingress isn't routing the challenge path correctly. Some Ingress controllers block /.well-known/ by default, so check your nginx configuration.
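To repeat that probe without retyping the path, the check can be wrapped in two small functions. This is a sketch only: challenge_url and probe_challenge are hypothetical names, and the token is assumed to be the value you read from the pending Challenge resource.

```shell
# Hypothetical helpers for probing an HTTP-01 challenge from outside the cluster.

# Build the URL that the ACME server requests for a given host and token.
challenge_url() {
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Fetch the challenge URL and print only the HTTP status code.
# -s silences progress output, -o /dev/null discards the body,
# -w '%{http_code}' prints the status, --max-time bounds the wait.
probe_challenge() {
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 10 \
    "$(challenge_url "$1" "$2")"
}
```

A 200 from probe_challenge yourdomain.com <token> means the path is routed to the solver, a 404 means the request reached the cluster but not the solver, and 000 means curl could not connect at all.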
ingressClassName mismatch. The ingressClassName in your ClusterIssuer solver must match the class your Ingress controller is actually using. Check with:

    kubectl get ingressclass

DNS not propagated. If you just created the DNS record for the domain, Let's Encrypt might be checking before it resolves. Wait for propagation and then delete the stuck Challenge resource to trigger a retry:

    kubectl delete challenge -n default <challenge-name>

cert-manager will create a new one automatically.
cert-manager logs. When all else fails:

    kubectl logs -n cert-manager deployment/cert-manager -f

The log lines are verbose but the errors are clear. Look for lines containing Error or Failed.
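The chain walk from the start of this step can be condensed into a quick overview before you reach for describe. The sketch below uses hypothetical function names. It assumes the Challenge status exposes state and reason fields, which is how cert-manager reports challenge progress.

```shell
# Hypothetical helper: one-screen overview of the issuance chain in a namespace.
issuance_chain() {
  ns="${1:-default}"
  # Certificate -> CertificateRequest -> Order -> Challenge, in chain order.
  kubectl get certificate,certificaterequest,order,challenge --namespace "$ns"
}

# Hypothetical helper: print name, state, and reason for each Challenge.
challenge_reasons() {
  ns="${1:-default}"
  kubectl get challenge --namespace "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.state}{"\t"}{.status.reason}{"\n"}{end}'
}
```

An empty result from challenge_reasons usually means the failure is higher up the chain, so the Order or CertificateRequest is the next place to look.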
Step 5: Switch to Production
Once your staging certificate is issued, kubectl get certificate shows READY=True. The cert itself will be from Let's Encrypt's staging CA (browsers will warn — that's expected). At this point you know the full pipeline works. Now switch to production.

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-production
    spec:
      acme:
        # Let's Encrypt production endpoint
        server: https://acme-v02.api.letsencrypt.org/directory
        email: your@email.com
        privateKeySecretRef:
          name: letsencrypt-production-key
        solvers:
        - http01:
            ingress:
              ingressClassName: nginx

Apply it:

    kubectl apply -f clusterissuer-production.yaml

Update your Ingress annotation or Certificate resource to reference letsencrypt-production. Then delete the old TLS secret to force a fresh issuance from the production CA. Recent cert-manager releases usually notice the issuer change on their own, but deleting the secret removes any doubt:

    kubectl delete secret yourdomain-tls -n default

cert-manager detects the missing secret and triggers a new issuance immediately. Watch it:

    kubectl get certificate -w -n default

Within 60–90 seconds, READY flips to True and the secret is recreated with a valid, browser-trusted certificate. Your Ingress controller picks it up without any restart.
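To confirm that the recreated secret really came from the production issuer, you can read the issuer annotation that cert-manager writes onto the secrets it manages. A minimal sketch, with a hypothetical function name and the secret and issuer names used in this tutorial:

```shell
# Hypothetical helper: succeed when Secret "$1" was issued by issuer "$2".
# The optional third argument is the namespace (defaults to "default").
issued_by() {
  actual=$(kubectl get secret "$1" --namespace "${3:-default}" \
    -o jsonpath='{.metadata.annotations.cert-manager\.io/issuer-name}')
  [ "$actual" = "$2" ]
}
```

For example, issued_by yourdomain-tls letsencrypt-production fails for as long as the secret still holds the staging certificate.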
Step 6: DNS-01 for Wildcard Certificates
HTTP-01 challenges only work for exact hostnames. If you want *.yourdomain.com — so every subdomain gets TLS without individual certificates — you need DNS-01. Instead of serving a file over HTTP, cert-manager creates a DNS TXT record to prove domain ownership. This means it needs API access to your DNS provider.
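A wildcard name stands in for exactly one label to the left of the base domain, so it covers api.yourdomain.com but not a.b.yourdomain.com or the bare yourdomain.com. The sketch below spells out that matching rule in plain shell. The function name wildcard_covers is hypothetical, and the first argument is assumed to start with "*.".

```shell
# Hypothetical helper: does wildcard SAN "$1" (for example "*.example.com")
# cover hostname "$2"? A wildcard stands in for exactly one leftmost label.
wildcard_covers() {
  base=${1#"*."}              # strip the leading "*." -> "example.com"
  case "$2" in
    *."$base")
      label=${2%".$base"}     # what the "*" would have to match
      case "$label" in
        ""|*.*) return 1 ;;   # empty or more than one label: not covered
        *)      return 0 ;;
      esac ;;
    *) return 1 ;;            # different domain, or the apex itself
  esac
}
```

For example, wildcard_covers '*.yourdomain.com' api.yourdomain.com succeeds, while the same call with yourdomain.com or a.b.yourdomain.com fails.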
I'll use Cloudflare here since it's common, but cert-manager supports Route53, Google Cloud DNS, Azure DNS, and many others via the same pattern.
Create a Cloudflare API token with Zone:DNS:Edit and Zone:Zone:Read permissions for your domain. Then store it as a secret in the cert-manager namespace, not your app namespace. A ClusterIssuer is cluster-scoped, and by default cert-manager looks up the secrets it references in the namespace cert-manager itself runs in:

    kubectl create secret generic cloudflare-api-token \
      --from-literal=api-token=<your-cloudflare-api-token> \
      -n cert-manager

Create the DNS-01 ClusterIssuer:

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-dns01
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: your@email.com
        privateKeySecretRef:
          name: letsencrypt-dns01-key
        solvers:
        - dns01:
            cloudflare:
              # Secret and key created in the previous command
              apiTokenSecretRef:
                name: cloudflare-api-token
                key: api-token

Apply it:

    kubectl apply -f clusterissuer-dns01.yaml

Request the wildcard certificate. The wildcard doesn't cover the apex domain, so list both *.yourdomain.com and yourdomain.com as separate SANs if you need both:

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: wildcard-cert
      namespace: default
    spec:
      secretName: wildcard-tls
      issuerRef:
        name: letsencrypt-dns01
        kind: ClusterIssuer
      dnsNames:
      # Quote the wildcard so YAML does not treat "*" as an alias
      - "*.yourdomain.com"
      - "yourdomain.com"

Apply it and watch the status:

    kubectl apply -f wildcard-certificate.yaml
    kubectl get certificate wildcard-cert -w

DNS-01 issuance takes longer than HTTP-01, up to 2–3 minutes, because cert-manager creates the TXT record and waits for it to propagate before Let's Encrypt validates it. If it gets stuck, check the Challenge resource:

    kubectl describe challenge -n default

Look for errors around DNS propagation or API token permissions. A common mistake is using the Cloudflare Global API Key (account-wide) instead of a scoped API token. They're different things in the Cloudflare dashboard, and apiTokenSecretRef expects a token.
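When the Challenge points at propagation, it helps to look at the TXT record directly. DNS-01 validation reads a TXT record at _acme-challenge.<domain>, and for a wildcard name the leading "*." is dropped first. The sketch below uses hypothetical function names and assumes dig is installed; the resolver 1.1.1.1 is only an example.

```shell
# Hypothetical helper: name of the TXT record that a DNS-01 challenge uses
# for a given dnsNames entry. A leading "*." is dropped, so the wildcard
# and the apex share the same record name.
acme_txt_name() {
  base=${1#"*."}
  printf '_acme-challenge.%s\n' "$base"
}

# Query a public resolver for the record while a challenge is pending.
check_acme_txt() {
  dig +short TXT "$(acme_txt_name "$1")" @1.1.1.1
}
```

If check_acme_txt '*.yourdomain.com' prints nothing while the Challenge is pending, the record has not been created or has not propagated yet, which points back at the API token or the zone configuration.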
Verification
After issuance, verify the certificate contents:

    # List all certificates across namespaces
    kubectl get certificate -A

    # Inspect the issued cert: check Subject, SANs, and expiry
    kubectl get secret yourdomain-tls -n default -o jsonpath='{.data.tls\.crt}' | \
      base64 -d | \
      openssl x509 -noout -text | \
      grep -E "Subject:|DNS:|Not After"

    # Verify the live HTTPS endpoint
    curl -v https://yourdomain.com 2>&1 | grep -E "SSL|issuer|expire"

For the production issuer, the issuer line from curl should reference a Let's Encrypt intermediate such as R10 or R11 (the exact name changes as Let's Encrypt rotates its intermediates), not the staging CA, whose issuer names are marked (STAGING). If you still see the staging CA, the old secret wasn't deleted and re-issued. Delete it again and wait.

By default cert-manager renews a certificate once two-thirds of its lifetime has passed, which is 30 days before expiry for a 90-day Let's Encrypt certificate. You don't need to do anything; it handles the renewal challenge the same way it handled the initial issuance. You can verify renewal is configured by checking the Certificate status:

    kubectl describe certificate yourdomain-cert -n default | grep -A5 "Renewal Time"

Common Mistakes
Using production before verifying with staging. You will hit a configuration issue on your first attempt. Always validate with staging first, or you risk running into production's failed-validation and duplicate-certificate limits while debugging. Staging behaves like production during the issuance process; the differences are the much higher rate limits and the untrusted CA root.
Reusing the same secretName across Certificates. Each Certificate resource must have a unique secretName. If two Certificates point to the same secret, cert-manager will fight over it and produce undefined behavior. This is a subtle bug that only surfaces when you have multiple domains.
HTTP-01 with no port 80 exposure. If your Ingress controller only accepts HTTPS (port 443 only), HTTP-01 challenges will never complete. Let's Encrypt needs to reach port 80. Either open port 80 on your load balancer for the challenge path, or switch to DNS-01.
Not deleting the old TLS secret when switching issuers. Recent cert-manager releases usually re-issue when the issuerRef changes, but if the secret still holds the staging certificate after the switch, your site keeps serving an untrusted cert. Deleting the secret manually when moving from staging to production forces a clean issuance and removes the guesswork.
Issuer vs ClusterIssuer namespace mismatch. Issuer is namespace-scoped — a Certificate in namespace: production can't reference an Issuer in namespace: staging. Use ClusterIssuer (cluster-scoped) unless you have a specific reason to scope issuers per namespace. Most teams use ClusterIssuer exclusively.
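Of the mistakes above, a shared secretName is the easiest to check for mechanically. The sketch below lists every namespace/secretName pair that more than one Certificate claims. The function name is hypothetical, and it assumes your kubectl context can list Certificates in all namespaces.

```shell
# Hypothetical helper: list namespace/secretName pairs that are claimed by
# more than one Certificate. No output means no collisions.
duplicate_secret_names() {
  kubectl get certificate --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.spec.secretName}{"\n"}{end}' |
    sort | uniq -d
}
```

The same secretName in two different namespaces is fine, which is why the namespace is part of the key that gets compared.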
Cleanup

    # Remove the certificates, their secrets, and the issuers
    kubectl delete certificate yourdomain-cert wildcard-cert -n default
    kubectl delete secret yourdomain-tls wildcard-tls -n default
    kubectl delete clusterissuer letsencrypt-staging letsencrypt-production letsencrypt-dns01
    kubectl delete secret cloudflare-api-token -n cert-manager

    # Remove cert-manager itself
    helm uninstall cert-manager -n cert-manager
    kubectl delete namespace cert-manager

Note that with crds.enabled=true the chart keeps the CRDs by default (crds.keep=true), so helm uninstall leaves the CRDs and any remaining cert-manager resources in the cluster. Deleting the CRDs afterwards also deletes every Certificate, Issuer, and ClusterIssuer that still exists, so only remove them once no other tooling depends on them.
From here, the natural next step is integrating cert-manager with an internal PKI for private certificates using the CA issuer type, or setting up trust-manager to distribute CA bundles across namespaces. But for public-facing services, what you've built here covers the full lifecycle — issuance, renewal, wildcard support, and the debugging path when things go wrong.
Official References
- cert-manager Documentation — Official docs covering installation, issuers, certificates, and troubleshooting
- ACME HTTP-01 Challenge — How cert-manager implements HTTP-01 challenges and the Ingress solver
- ACME DNS-01 Challenge — DNS-01 solvers for all supported providers including Cloudflare, Route53, and Google Cloud DNS
- Let's Encrypt Rate Limits — Official rate limit documentation — read before using the production ACME server
- cert-manager Troubleshooting — Official debugging guide for stuck certificates and challenge failures