Monitoring Kubernetes with Prometheus and Grafana
Deploy the kube-prometheus-stack with Helm, understand what it collects out of the box, build a dashboard for your application, and set up your first alert rule — all in under an hour.
Before you begin
- A running Kubernetes cluster
- Helm 3 installed
- kubectl configured
- At least 2 CPU and 4Gi memory available in the cluster
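A quick sanity check before installing (these commands only confirm the prerequisites above; adjust nothing if the output looks healthy):

kubectl get nodes          # cluster is reachable and nodes are Ready
helm version --short       # should report v3.x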
You don't need to configure Prometheus from scratch. The kube-prometheus-stack Helm chart deploys Prometheus, Grafana, Alertmanager, and a set of pre-built dashboards and alert rules that cover the entire Kubernetes stack — nodes, pods, deployments, PVCs, and more.
This tutorial gets you from zero to a working monitoring stack, then shows you how to add your own application metrics.
Step 1: Install the kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=changeme \
  --set prometheus.prometheusSpec.retention=15d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=20Gi

This deploys:
- Prometheus (metrics collection and storage)
- Grafana (dashboards and visualisation)
- Alertmanager (alert routing and deduplication)
- kube-state-metrics (exposes Kubernetes object state as metrics)
- node-exporter (exposes host-level metrics: CPU, memory, disk, network)
Wait for everything to start:
kubectl wait --for=condition=Ready pods --all -n monitoring --timeout=180s
kubectl get pods -n monitoring

Step 2: Access Grafana
Forward Grafana's port locally:
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring

Open http://localhost:3000. Log in with admin / changeme.
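If you skipped the adminPassword override (or changed it later), the generated password normally lives in the release's Grafana Secret:

kubectl get secret kube-prometheus-stack-grafana -n monitoring \
  -o jsonpath='{.data.admin-password}' | base64 -d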
Navigate to Dashboards → Browse. You'll find 30+ pre-built dashboards:
- Kubernetes / Cluster — overall cluster health
- Kubernetes / Nodes — per-node CPU, memory, disk
- Kubernetes / Pods — per-pod resource usage
- Kubernetes / Workloads — deployment/daemonset/statefulset status
Step 3: Access Prometheus
kubectl port-forward svc/kube-prometheus-stack-prometheus 9090:9090 -n monitoring

Open http://localhost:9090. This is Prometheus's built-in query UI.
Try a few queries:
# CPU usage per pod (5-minute average)
rate(container_cpu_usage_seconds_total{namespace="production"}[5m])

# Memory usage per pod
container_memory_working_set_bytes{namespace="production"}

# Number of ready replicas per deployment
kube_deployment_status_replicas_ready{namespace="production"}

# Pod restart count
increase(kube_pod_container_status_restarts_total[1h])

Step 4: Instrument Your Application
To expose custom metrics from your application, use a Prometheus client library.
Node.js (prom-client):
const express = require('express');
const client = require('prom-client');

const app = express();
const register = new client.Registry();

// Counter: total HTTP requests
const httpRequestsTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'status_code'],
  registers: [register]
});

// Histogram: request duration
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.01, 0.05, 0.1, 0.3, 0.5, 1, 2, 5],
  registers: [register]
});

// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

// Instrument requests (register this before your application routes)
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer({
    method: req.method,
    route: req.path
  });
  res.on('finish', () => {
    httpRequestsTotal.inc({ method: req.method, status_code: res.statusCode });
    end({ status_code: res.statusCode });
  });
  next();
});

Go (prometheus/client_golang):
import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    httpRequestsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total number of HTTP requests",
    }, []string{"method", "status_code"})

    httpRequestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
        Name:    "http_request_duration_seconds",
        Help:    "HTTP request duration",
        Buckets: []float64{0.01, 0.05, 0.1, 0.3, 0.5, 1, 2, 5},
    }, []string{"method", "route"})
)

// In your main():
http.Handle("/metrics", promhttp.Handler())
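The Go snippet above only wires up the /metrics endpoint; recording values inside handlers isn't shown in the original. A minimal sketch of one way to do it, assuming the two metrics declared above (the instrument wrapper and statusRecorder type are illustrative names, and this additionally needs "strconv" in the import block):

// instrument wraps a handler so every request updates the counter and histogram.
func instrument(route string, next http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        // Time the request against the histogram declared above.
        timer := prometheus.NewTimer(httpRequestDuration.WithLabelValues(r.Method, route))
        defer timer.ObserveDuration()

        // Wrap the ResponseWriter so the status code can be read afterwards.
        rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
        next(rec, r)

        httpRequestsTotal.WithLabelValues(r.Method, strconv.Itoa(rec.status)).Inc()
    }
}

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
    http.ResponseWriter
    status int
}

func (r *statusRecorder) WriteHeader(code int) {
    r.status = code
    r.ResponseWriter.WriteHeader(code)
}

Register routes through it, e.g. http.Handle("/orders", instrument("/orders", ordersHandler)), where ordersHandler stands in for whatever handler you already have.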
Step 5: Tell Prometheus to Scrape Your App
Create a ServiceMonitor — a CRD that kube-prometheus-stack uses to configure scraping:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: production
  labels:
    release: kube-prometheus-stack   # Must match the Helm release label
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: http
    path: /metrics
    interval: 15s

kubectl apply -f servicemonitor.yaml

Your application's Service must have a port named http (or whatever you specify in endpoints.port). Verify Prometheus is scraping it at http://localhost:9090 → Status → Targets.
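For reference, a matching Service might look like this (the app name, namespace, and port number are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: production
  labels:
    app: my-app            # matched by the ServiceMonitor's selector
spec:
  selector:
    app: my-app            # pods backing this Service
  ports:
  - name: http             # the named port referenced by endpoints.port
    port: 8080
    targetPort: 8080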
Step 6: Build a Grafana Dashboard for Your App
In Grafana, click the + icon → Dashboard → Add visualization.
Panel 1: Request rate
sum(rate(http_requests_total{namespace="production"}[2m])) by (status_code)

Set visualization type: Time series. Set legend to {{status_code}}.
Panel 2: P95 latency
histogram_quantile(0.95,
sum(rate(http_request_duration_seconds_bucket{namespace="production"}[5m])) by (le, route)
)

Panel 3: Error rate (5xx)
sum(rate(http_requests_total{namespace="production",status_code=~"5.."}[2m]))
/
sum(rate(http_requests_total{namespace="production"}[2m]))

Set threshold to 0.01 (1% error rate = red).
Save the dashboard. Click Share → Export → save the JSON to your repo to version it.
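If you'd rather provision the dashboard declaratively, kube-prometheus-stack enables a Grafana sidecar by default that watches for ConfigMaps labelled grafana_dashboard and loads the JSON they contain; a sketch, assuming the exported file is my-app-dashboard.json:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"      # label the sidecar watches for
data:
  my-app-dashboard.json: |
    { ...paste the exported dashboard JSON here... }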
Step 7: Create an Alert Rule
Alert when error rate exceeds 1% for 5 minutes:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: production
  labels:
    release: kube-prometheus-stack
spec:
  groups:
  - name: my-app
    interval: 30s
    rules:
    - alert: HighErrorRate
      expr: |
        sum(rate(http_requests_total{namespace="production",status_code=~"5.."}[5m]))
        /
        sum(rate(http_requests_total{namespace="production"}[5m]))
        > 0.01
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High 5xx error rate on my-app"
        description: "Error rate is {{ $value | humanizePercentage }} — investigate logs"

    - alert: PodCrashLooping
      expr: |
        increase(kube_pod_container_status_restarts_total{namespace="production"}[1h]) > 3
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: "Pod {{ $labels.pod }} is crash looping"

kubectl apply -f prometheus-rules.yaml

Check the rule at http://localhost:9090 → Alerts. It should appear in the INACTIVE state (not firing yet).
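If the rule never shows up there, confirm the object exists and carries the release label Prometheus selects on:

kubectl get prometheusrules -n production --show-labels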
Step 8: Configure Alertmanager
By default, Alertmanager doesn't route alerts anywhere. Configure Slack notifications:
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-kube-prometheus-stack-alertmanager   # alertmanager-<Alertmanager object name>
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      slack_api_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"

    route:
      group_by: [alertname, namespace]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      receiver: slack-alerts
      routes:
      - match:
          severity: critical
        receiver: slack-critical

    receivers:
    - name: slack-alerts
      slack_configs:
      - channel: "#alerts"
        title: "{{ .CommonAnnotations.summary }}"
        text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"

    - name: slack-critical
      slack_configs:
      - channel: "#oncall"
        title: "CRITICAL: {{ .CommonAnnotations.summary }}"

kubectl apply -f alertmanager-config.yaml
# Restart Alertmanager to pick it up
kubectl rollout restart statefulset/alertmanager-kube-prometheus-stack-alertmanager -n monitoring

Persistent Storage in Production
The storageSpec in Step 1 creates a PersistentVolumeClaim for Prometheus. For Grafana, add:
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set grafana.persistence.enabled=true \
--set grafana.persistence.size=5Gi \
  --reuse-values

Without persistent storage, your dashboards and alert history disappear on pod restart.
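Both claims should then show up as Bound:

kubectl get pvc -n monitoring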
Official References
- Prometheus Documentation — Official Prometheus docs: data model, PromQL, scrape configuration, and alerting
- kube-prometheus-stack Helm Chart — The standard Helm chart for deploying Prometheus, Alertmanager, and Grafana together
- Grafana Documentation — Grafana docs: dashboard building, variables, alerts, and data source configuration
- Prometheus Operator — How ServiceMonitor and PodMonitor CRDs work for declarative scrape configuration
- PromQL Basics — Official PromQL reference covering selectors, functions, and aggregation operators
We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.
Struggling with this in production?
We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.