Alerts
tailscale2otel ships ready-to-use alert and recording rules in
deploy/alerts/.
Two delivery models are provided — pick one per rule set; loading both causes double-firing.
Two delivery models
| File | Format | Evaluated by |
|---|---|---|
| tailscale2otel.grafana-rules.yaml | Grafana file-provisioning (apiVersion: 1) | Grafana (can span Prometheus + Loki) |
| tailscale2otel.rules.yaml | Standard Prometheus ruler groups: / rules: | Prometheus, Mimir, Cortex, or Loki ruler |
Grafana-managed rules (recommended)
tailscale2otel.grafana-rules.yaml is generated by gen/build_rules.py.
Edit the generator, not the YAML, then regenerate the file by rerunning the generator.
Every rule follows the canonical Grafana 3-node pipeline (A query → B reduce → C threshold) so it
round-trips cleanly through the Grafana UI and API. Datasource UIDs default to the Grafana Cloud
values (grafanacloud-prom / grafanacloud-logs); replace them with your own datasource UIDs on a
self-hosted stack.
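The three-node pipeline can be sketched in Python as follows. This is an illustrative sketch, not the code in gen/build_rules.py: the function name `build_rule`, the rule UID, the title, and the metric `example_export_errors_total` are all hypothetical, and the field layout follows the Grafana alert-rule provisioning model as commonly documented, so check it against your Grafana version before relying on it.

```python
import json

# Grafana's reserved datasource UID for server-side expression nodes.
EXPR_DS_UID = "__expr__"


def build_rule(uid, title, promql, threshold,
               datasource_uid="grafanacloud-prom",
               severity="warning", paused=True):
    """Return one rule dict: A (query) -> B (reduce) -> C (threshold)."""
    query_node = {
        "refId": "A",
        "datasourceUid": datasource_uid,
        "relativeTimeRange": {"from": 600, "to": 0},  # last 10 minutes
        "model": {"refId": "A", "expr": promql, "instant": True},
    }
    reduce_node = {
        "refId": "B",
        "datasourceUid": EXPR_DS_UID,
        "model": {"refId": "B", "type": "reduce",
                  "expression": "A", "reducer": "last"},
    }
    threshold_node = {
        "refId": "C",
        "datasourceUid": EXPR_DS_UID,
        "model": {
            "refId": "C",
            "type": "threshold",
            "expression": "B",
            # Fire when the reduced value is greater than the threshold.
            "conditions": [{"evaluator": {"type": "gt", "params": [threshold]}}],
        },
    }
    return {
        "uid": uid,
        "title": title,
        "condition": "C",  # the node whose result decides alert state
        "data": [query_node, reduce_node, threshold_node],
        "for": "5m",
        "noDataState": "OK",
        "execErrState": "Error",
        "labels": {"severity": severity},
        "isPaused": paused,
    }


# Hypothetical rule, for illustration only.
rule = build_rule(
    uid="example-export-errors",
    title="Example: OTLP export errors",
    promql="sum(rate(example_export_errors_total[5m]))",
    threshold=0,
)
print(json.dumps(rule, indent=2))
```

Keeping the condition on node C and the query on node A is what lets a self-hosted stack change only `datasource_uid` while leaving the reduce and threshold nodes untouched.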
Rules are organised into five groups:
- tailscale2otel-health: exporter self-health (scrape staleness, cardinality cap, API auth failures, checkpoint errors, enrichment cache age, and more)
- tailscale2otel-security: tailnet security and governance (tailnet-lock errors, key expiry, posture coverage, unverified contacts)
- tailscale2otel-integrations: MDM/EDR posture sync, log-stream delivery health
- tailscale2otel-network: DERP relay usage, region latency, flow data presence
- tailscale2otel-recording: precomputed recording rules (DERP byte fraction, posture ratios, total active series)
Default-disabled by design
Only a high-signal starter set ships with isPaused: false. The rest are isPaused: true —
enable them in the Grafana UI once your tailnet has the relevant data. Optional signals
(posture, log streaming, tailnet-lock, DERP rollups) use noDataState: OK so they don't fire
until data actually exists.
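To see which rules ship paused before enabling them, the parsed provisioning file can be summarised per group. The sketch below assumes the usual provisioning layout (a top-level `groups` list whose entries carry `name` and `rules`, with `title` and `isPaused` on each rule); the helper name `rules_by_pause_state` and the sample rule titles are hypothetical, and the inline dict stands in for the real file.

```python
def rules_by_pause_state(provisioning):
    """Map each group name to its active and paused rule titles."""
    summary = {}
    for group in provisioning.get("groups", []):
        active, paused = [], []
        for rule in group.get("rules", []):
            # A rule without isPaused is treated as active.
            target = paused if rule.get("isPaused", False) else active
            target.append(rule["title"])
        summary[group["name"]] = {"active": active, "paused": paused}
    return summary


# Stand-in for the parsed YAML. With the real file you would load it with a
# YAML parser first (for example PyYAML's yaml.safe_load, a third-party package).
sample = {
    "apiVersion": 1,
    "groups": [
        {
            "name": "tailscale2otel-health",
            "rules": [
                {"title": "Example starter rule", "isPaused": False},
                {"title": "Example optional rule", "isPaused": True},
            ],
        },
    ],
}

summary = rules_by_pause_state(sample)
for group_name, states in summary.items():
    print(group_name, states)
```

This only reads the file; the generator remains the place to change which rules ship enabled.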
Importing the Grafana-managed file
- File provisioning (self-hosted / Alloy): drop the file in
  /etc/grafana/provisioning/alerting/ and restart Grafana.
- Terraform / Grizzly: the file uses the Grafana provisioning model, which both tools consume directly.
- Grafana Cloud UI: the file-provisioning format is not importable via the UI's "Import alert
  rules" flow; use the provisioning API or Terraform instead. For the UI path, use
  tailscale2otel.rules.yaml (see below).
Datasource-managed baseline (tailscale2otel.rules.yaml)
tailscale2otel.rules.yaml is the hand-maintained Prometheus-format equivalent: standard
alert: / expr: / for: rules.
It covers the same core signals — exporter liveness, collector failures, OTLP export errors, device
and auth key expiry, flow-logging state, node-metrics target health, and a webhook-driven IP
forwarding misconfiguration alert.
- Prometheus / Mimir / Cortex ruler: add the file to rule_files: in your Prometheus configuration, or load it into the ruler with mimirtool.
- Grafana UI: Alerting → Alert rules → More → Import alert rules from a Prometheus rules file.
Metric naming in rule expressions
All expr fields query the normalized Prometheus names produced by Grafana Cloud's OTLP
pipeline, not the raw OTEL names. The same rules apply as in the dashboards: dots become
underscores, counters gain _total, and a gauge with unit "1" becomes *_ratio. See
Metrics for the full translation table.
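The three rules named above can be sketched as a small Python function. It is a simplification under stated assumptions: it implements only those three rules, the real pipeline applies more (see the Metrics page for the full table), and the function name and the `example.*` metric names are hypothetical.

```python
def normalize_metric_name(otel_name, kind, unit=""):
    """Approximate the Prometheus name for an OTEL metric.

    kind is "counter" or "gauge"; unit is the OTEL unit string.
    Only the three rules described above are applied.
    """
    name = otel_name.replace(".", "_")  # dots become underscores
    if kind == "counter" and not name.endswith("_total"):
        name += "_total"  # counters gain _total
    elif kind == "gauge" and unit == "1" and not name.endswith("_ratio"):
        name += "_ratio"  # a gauge with unit "1" becomes *_ratio
    return name


# Hypothetical metric names, for illustration only.
print(normalize_metric_name("example.api.requests", "counter"))
print(normalize_metric_name("example.posture.coverage", "gauge", "1"))
print(normalize_metric_name("example.devices.online", "gauge"))
```

A helper like this is mainly useful for checking an expr by eye when a rule returns no data: if the name in the expression does not match what the function produces for the raw OTEL name, the mismatch is a likely cause.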
Non-Grafana backends
If you send metrics to a non-Grafana OTEL backend with different normalization rules, you will need to adjust the metric names in the rule expressions accordingly.
Wiring notifications
Both files set a severity label (critical / warning / info) on every rule. Wire that label
into your Grafana notification policy or Alertmanager routing tree to fan alerts to the right
contact points.
Validating locally
# PromQL expression check (requires promtool)
promtool check rules deploy/alerts/tailscale2otel.rules.yaml