High Availability¶
sf2loki runs standalone by default: a single instance is always the leader and
streams continuously (coordinate.type: noop). For hands-off failover, run two
replicas in an active-passive pair — exactly one replica (the leader) ingests at a
time, and the other stands by and takes over if the leader dies.
Why single-instance, and why active-passive rather than scale-out¶
The Salesforce Pub/Sub API has no consumer-group semantics: two subscribers on the same topic each get delivered every event independently, so a second active replica doesn't share load, it double-delivers. sf2loki cannot horizontally scale by simply running more replicas — HA here means one hot spare, not more throughput. Run exactly one active replica at a time (a stop-then-start rollout, never overlapping) unless a coordinator is arbitrating leadership between two replicas for you.
Coordinators¶
Three coordinate.type implementations, selectable in config, with zero changes
required in sources/sinks/state:
noop — single instance (default)¶
Always leader. No lease, no shared storage, no failover. This is what you get if
coordinate is unset.
file_lease — shared-file lease¶
A small JSON document — {holder, expires_at, epoch} — on storage shared by both
replicas (NFS/EFS/a shared volume). The leader renews expires_at every
renew_interval; the standby polls the lease and takes over once it's gone stale.
coordinate:
type: file_lease
file_lease:
path: /var/lib/sf2loki/leader.lease # on shared storage, same for both replicas
ttl: 30s # failover time: standby takes over this long after renewals stop
renew_interval: 10s # must be < ttl/2, so one missed renewal never costs leadership
holder_id: "" # blank -> hostname-pid; set explicitly if hostnames aren't unique
- Expiry is wall-clock, compared across hosts — replicas must be NTP-synced, and
ttlneeds headroom above worst-case clock skew. - Deliberately not
flock(unreliable over NFS, doesn't survive a holder that dies without releasing). Takeover uses an atomic tmp-then-rename write plus a brief pause-and-re-read after a contested rename, so two standbys racing an expired lease never both win. - The shared checkpoint store re-reads the lease fresh at commit time (bypassing its own cache) — see Fencing below.
k8s_lease — Kubernetes coordination.k8s.io/v1 Lease¶
No shared volume needed; requires the sf2loki[k8s] extra.
coordinate:
type: k8s_lease
k8s_lease:
namespace: sf2loki # namespace holding the Lease (default: default)
name: sf2loki-leader # Lease object name, shared by all replicas
identity: "" # blank -> pod name ($HOSTNAME); set if pod names aren't unique
lease_duration: 30s # failover time: standby takes over this long after renewals stop
renew_interval: 10s # must be < lease_duration/2
# kubeconfig: ~/.kube/config # omit for in-cluster config; set for out-of-cluster dev
- Optimistic concurrency uses the Lease's
resourceVersion: a lost update returns HTTP 409, which is the race signal, so — unlikefile_lease— there's no pause-then-verify step needed. - No NTP concern. Staleness is judged by each replica's own observedTime
(the same pattern as client-go's
leaderelection): a replica tracks, on its own monotonic clock, how long it's been since it last saw the Lease'sresourceVersionchange, and only takes over oncelease_durationhas elapsed on that clock. The leader'srenewTime(wall-clock) is written for observability/kubectl describebut never read back for staleness math, so cross-host clock skew can't cause a premature or delayed takeover. - The pod's ServiceAccount needs
get/create/updateonleasesin the coordinator's namespace — see Kubernetes for the full RBAC + Deployment example.
Shared state, regardless of coordinator¶
Both replicas run the same config, and the checkpoint state store must also be shared so the standby resumes exactly where the leader left off:
file_lease— mount the same NFS/EFS export (or shared volume) on both hosts and usestate.store: filepointed at it.k8s_lease— usestate.store: s3orgcs(no shared volume available/needed on Kubernetes).
See State & Checkpoints for backend detail.
Fencing¶
A stale leader — one that lost the lease mid-commit (e.g. a GC pause) — must not be
allowed to advance checkpoints and race the new leader. Each coordinator wires its
fence check into the state store via an optional set_fence(...) hook: a commit is
rejected with StateFenceError if the writer is no longer holding the lease at commit
time. This is not data loss — the batch already landed in Loki, so the cost is at most
a bounded re-ingest after the new leader resumes. Semantics throughout are
at-least-once: expect duplicate log lines around a takeover (Loki's per-stream
reject window dedupes exact duplicates), never gaps.
For the file_lease coordinator specifically, the shared-file checkpoint store
re-reads the lease fresh at commit time (bypassing its own cache) and compares an
epoch fence token that increments on every winning acquire — this catches a stale
leader even before its own local "am I still the leader?" flag has caught up (which
only refreshes once per renew_interval). The S3/GCS state stores don't need the epoch
mechanism: their own ETag/generation-preconditioned compare-and-swap already rejects a
losing writer independently.
Observability¶
sf2loki_leader is 1 on the active leader (and on any standalone instance), 0 on
the standby — sum(sf2loki_leader) across replicas should always be exactly 1; a
sustained 0 (leaderless gap) or 2 (split-brain) means the coordinator has failed to
converge. See Metrics for the metric and
Alerts for the shipped alert pack — there is currently no
dedicated alert on sf2loki_leader cardinality in that pack, so alerting on it is a
gap to close yourself if you need it.
/readyz on the standby is 503 forever, by design
That's correct for a load balancer / target-group check (routes traffic only to the
leader) but it is never safe as a liveness/instance-restart check on an HA
replica — a restart-on-/readyz-failure policy restart-loops the standby forever
and defeats failover. See Deployment and
Kubernetes for the concrete
readiness-vs-liveness split on each platform.
See also¶
- Kubernetes — the runnable manifest example for
k8s_lease. - State & Checkpoints — checkpoint backends and inspection/repair.
- Configuration reference —
CoordinateConfig/FileLeaseConfig/K8sLeaseConfigfield details. - Architecture — the
Coordinatorseam in the broader design.