Architecture¶
tailscale2otel is a single static Go binary that bridges the Tailscale observability surface to
any OpenTelemetry backend. It polls the Tailscale API (and optionally receives streamed logs or
webhooks), runs each data source through a typed conversion pipeline, and pushes the resulting
metrics and logs over OTLP — all without writing a single line of PromQL or touching a sidecar.
High-level data flow¶
flowchart LR
subgraph Sources
A[Tailscale API]
B[Log stream\nSplunk-HEC]
C[Webhook receiver]
end
subgraph Collectors ["internal/collector (one per source)"]
D[devices / users / keys\nsettings / acl / dns\nnodemetrics]
E[flowlogs / auditlogs\nwindow collectors]
end
subgraph Processors
F[flowlog.Processor\naudit.Processor]
G[enrich.DeviceCache]
end
subgraph Telemetry ["internal/telemetry"]
H[telemetry.Emitter facade]
end
I[(OTLP endpoint\nGrafana Cloud)]
A --> D
A --> E
B --> F
C --> F
E --> F
D --> H
F --> G
G --> H
H --> I
Collectors and processors emit only through the telemetry.Emitter interface — they never touch
the OTEL SDK directly. This keeps OTLP a deployment concern, not a business-logic concern.
Composition root — internal/app¶
app.New is where everything gets wired together. On startup it:
- Builds the OTEL telemetry provider (gRPC/HTTP/stdout exporter, cumulative temporality for Grafana Cloud).
- Constructs the Tailscale API client (
internal/tsapi) with either OAuth (preferred — auto-refreshing, no expiry) or a static API key. - Creates the shared
flowlog.Processorandaudit.Processorso both the poll path and the stream/webhook path feed identical conversion logic. - Initialises the in-memory device enrichment cache (
internal/enrich). - Registers every enabled collector with the scheduler (
internal/collector). - Starts any enabled receivers (
internal/streamfor Splunk-HEC,internal/webhook). - Launches the admin HTTP server (health probes, status page, optional pprof).
The file internal/app/collectors.go is the canonical list of what is registered and under what
config gates. Start here when you want to understand how a new data source would be added.
Collector types and scheduling¶
internal/collector defines two interfaces:
SnapshotCollector— a point-in-time read, called on a fixed interval. Used bydevices,users,keys,settings,acl,dns, andnodemetrics. Stateless between ticks (theaclcollector is the exception: it stores an ETag to skip unchanged responses).WindowCollector— a time-windowed read, called with an explicit[from, to]range. Used byflowlogsandauditlogs. Each run returns a high-water mark that becomes thefromof the next window, enabling resumable polling across restarts.
Each collector runs in its own goroutine with a small randomised start-up stagger (bounded at 3 s by default) so no two collectors hit the API at the same instant. A panic or transient error in one tick is recovered and logged; it never stops the scheduler or any other collector.
Poll vs. stream — pick one per log type¶
For flow logs and audit logs there are two ingestion paths:
- Poll (
source: poll): the window collector periodically calls the Tailscale Logs API and processes the batch. - Stream (
source: stream): tailscale2otel runs a built-in Splunk-HEC-compatible receiver; Tailscale pushes records to it in near-real time.
You must choose exactly one path per log type. Running both double-counts records. A best-effort
bounded FIFO de-duplicate set (internal/dedup) can suppress exact duplicates across paths, but it
is a failsafe, not a substitute for correct configuration. The app logs a WARNING at startup if both
paths are active for the same log type. See Streaming & Webhooks for
receiver setup.
Crucially, both paths feed the same flowlog.Processor / audit.Processor, so the emitted
signals are identical regardless of which path delivers the record.
Device enrichment¶
The devices collector populates an in-memory cache (internal/enrich.DeviceCache) that maps IP
addresses and node IDs to human-readable device names. The flow-log and audit processors consult
this cache when annotating log records and metrics with source.address / destination.address
labels.
Degraded enrichment without the devices collector
If the devices collector is disabled, the enrichment cache is empty and flow/audit records
fall back to unknown (for unresolvable internal IPs) or external (for addresses outside the
tailnet). Device-name labels will be absent or generic until the collector is re-enabled and has
run at least once.
Checkpointing¶
Window collectors persist their high-water marks via internal/collector.CheckpointStore. By
default this is a file (checkpoints.json next to the binary), written atomically on each
successful tick. On a clean restart the collector resumes from the last saved mark rather than
re-fetching the full history; on cold start (no checkpoint) it applies the configured
initial_lookback window. In-memory checkpointing is also available (useful for ephemeral
containers where durability is handled externally).
If a window collection fails the high-water mark is not advanced, so the same window is retried on the next tick.
Self-observability and the admin status page¶
tailscale2otel emits its own health signals as OTLP metrics (see Metrics for the full catalog):
tailscale2otel.scrape.*— per-collector duration, success/failure counts, and last-run timestamp, tagged withtailscale.collector.tailscale2otel.up— overall heartbeat gauge.tailscale2otel.series.active— per-source-metric active time-series count (cardinality headroom tracking).tailscale2otel.api.requests/api.retries— Tailscale API call counters.
In addition, an admin HTTP server (default :8080) serves:
/— an HTML status page with live collector health, cardinality table, the metrics/log catalog, discovered node-metrics targets, and a redacted config view./api/status.json— the same data as JSON, for programmatic access./healthzand/readyz— liveness and readiness probes./debug/pprof— optional, requiresprofiling.pprof.enabled: true(which in turn requiresadmin.enabled: true).
The status page is entirely self-contained — no CDN or external assets — so it renders on air-gapped tailnets.
Key package boundaries¶
| Package | Role |
|---|---|
internal/app |
Composition root; wires all components together |
internal/collector/<name> |
One subpackage per data source; fetch + emit |
internal/flowlog, internal/audit |
Shared record types and processors |
internal/enrich |
In-memory device enrichment cache |
internal/telemetry |
Emitter facade; the only code that touches the OTEL SDK |
internal/tsapi |
Tailscale API client (OAuth / API key, retry, rate-limit handling) |
internal/stream, internal/webhook |
Alternate log ingestion receivers |
internal/config |
Layered config loader and validation |
See Configuration for the full config reference and Metrics for every emitted signal.