How to use
- Pick a group name aligned with your service or team.
- Write a PromQL expression that captures the failure or SLO breach.
- Set a duration (the rule's for field) so the alert does not fire on noisy, short-lived spikes.
- Add labels like severity, team, and service for routing.
- Add annotations for human-readable context and runbooks.
- Copy the YAML into your Prometheus rule files and reload.
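A minimal sketch of a rule file that follows the steps above. The group name, the metric http_requests_total, the label values, the 5% threshold, and the runbook URL are all hypothetical placeholders; substitute your own.

```yaml
# alerts.yaml: hypothetical example; metric, labels, threshold and URL are placeholders.
groups:
  - name: checkout-service            # group named after the service or team
    rules:
      - alert: CheckoutHighErrorRate
        # Fraction of checkout requests that returned 5xx over the last 5 minutes.
        expr: |
          sum(rate(http_requests_total{service="checkout", status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total{service="checkout"}[5m]))
            > 0.05
        for: 10m                       # condition must hold for 10 minutes before firing
        labels:                        # used by Alertmanager for routing
          severity: critical
          team: payments
          service: checkout
        annotations:                   # human-readable context for responders
          summary: "Checkout 5xx error rate above 5%"
          description: "Error rate is {{ $value | humanizePercentage }} over the last 5m."
          runbook_url: "https://example.com/runbooks/checkout-high-error-rate"
```

Because the comparison has no bool modifier, $value in the annotation is the error ratio itself, so the description shows the actual rate that crossed the threshold.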
Production tips
- Alert on error rates or saturation signals rather than raw error counts.
- Include service and environment labels for routing.
- Use separate rules and severity labels for warning and critical thresholds.
- Link annotations to dashboards and runbooks.
- Validate queries in Prometheus before enabling alerts.
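The tips above can be combined into a warning/critical pair on the same error-rate signal. This is a sketch assuming the metric carries service and env labels; the alert names, thresholds, and URLs are hypothetical, and the dashboard annotation key is just a convention (annotation keys are free-form).

```yaml
# Hypothetical warning/critical pair; sum by (service, env) keeps those labels for routing.
groups:
  - name: checkout-service
    rules:
      - alert: CheckoutErrorRateWarning
        expr: |
          sum by (service, env) (rate(http_requests_total{service="checkout", status=~"5.."}[5m]))
            /
          sum by (service, env) (rate(http_requests_total{service="checkout"}[5m]))
            > 0.01
        for: 15m
        labels:
          severity: warning
          team: payments
        annotations:
          summary: "Checkout error rate above 1% in {{ $labels.env }}"
          dashboard: "https://grafana.example.com/d/checkout"
          runbook_url: "https://example.com/runbooks/checkout-errors"
      - alert: CheckoutErrorRateCritical
        expr: |
          sum by (service, env) (rate(http_requests_total{service="checkout", status=~"5.."}[5m]))
            /
          sum by (service, env) (rate(http_requests_total{service="checkout"}[5m]))
            > 0.05
        for: 5m
        labels:
          severity: critical
          team: payments
        annotations:
          summary: "Checkout error rate above 5% in {{ $labels.env }}"
          dashboard: "https://grafana.example.com/d/checkout"
          runbook_url: "https://example.com/runbooks/checkout-errors"
```

Before enabling the rules, running `promtool check rules alerts.yaml` checks the file's syntax, and pasting each expr into the Prometheus expression browser shows whether it returns the series you expect.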
Example alert workflow
Save the generated YAML into a rule file (e.g., alerts.yaml), add it to your Prometheus config under rule_files, and reload Prometheus. Configure Alertmanager routes to map severity or team labels to notification channels.
Prometheus config snippet
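A minimal excerpt of prometheus.yml that loads the rule file and points Prometheus at Alertmanager. The file name alerts.yaml and the address alertmanager:9093 are assumptions for illustration.

```yaml
# prometheus.yml (excerpt): rule file name and Alertmanager address are assumptions.
rule_files:
  - "alerts.yaml"                    # the rule file saved in the previous step

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]
```

To reload without a restart, send SIGHUP to the Prometheus process or, if Prometheus was started with --web.enable-lifecycle, send an HTTP POST to /-/reload.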
Alertmanager routing snippet
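A sketch of an alertmanager.yml routing tree that maps the severity and team labels to notification channels. The receiver names, team value, webhook URLs, Slack channel, and PagerDuty key are hypothetical placeholders.

```yaml
# alertmanager.yml (excerpt): receivers, URLs and keys are hypothetical placeholders.
route:
  receiver: default                  # fallback when no child route matches
  group_by: ["alertname", "service"]
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-oncall     # page on-call for critical alerts
    - matchers:
        - team="payments"
      receiver: payments-slack       # everything else from this team goes to chat

receivers:
  - name: default
    webhook_configs:
      - url: "http://alert-hook.example.internal/"
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: "REPLACE_WITH_ROUTING_KEY"
  - name: payments-slack
    slack_configs:
      - api_url: "https://hooks.slack.com/services/REPLACE_ME"
        channel: "#payments-alerts"
```

Child routes are checked in order and the first match wins, so a critical payments alert here goes only to PagerDuty; add continue: true to the first route if it should also reach Slack.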