Metrics Reference
sf2loki emits its own operational metrics via OTLP. There is no
Prometheus /metrics scrape endpoint: the connector is OTLP-native and
pushes on an interval. This page is the canonical reference for every
instrument, generated from src/sf2loki/obs/metrics.py.
All instruments carry an org label when multi-org ingestion is enabled
(for_org() stamps it on per-org components — token provider, Salesforce
clients, sources, the limits poller); single-org deployments omit it. Names
below are exactly as created in code — unsuffixed. See
Metric-name suffixes before wiring dashboards or
alerts against them.
Enabling metrics
Set service.telemetry.enabled: true and service.telemetry.endpoint:

    service:
      telemetry:
        enabled: true
        endpoint: https://otlp-gateway-<zone>.grafana.net/otlp/v1/metrics
        # endpoint: http://alloy:4318/v1/metrics  # local Alloy instead of Grafana Cloud

- Grafana Cloud: basic auth defaults to the Loki sink's tenant_id/auth_token (one stack credential covers both Loki and OTLP).
- Local/in-cluster Alloy: set service.telemetry.auth: none for an unauthenticated collector.
- Enable Salesforce org-limit gauges (API usage, storage, streaming events) with salesforce.limits.enabled: true.
See Configuration for the full service.telemetry field
reference.
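
The credential fallback described in the list above can be pictured roughly as follows. This is a minimal sketch, not the connector's code: the helper name `otlp_headers` and the flat `telemetry` / `loki_sink` dictionaries are assumptions made for illustration. Only the `auth: none` value and the `tenant_id`/`auth_token` pair come from this page.

```python
import base64
from typing import Dict


def otlp_headers(telemetry: dict, loki_sink: dict) -> Dict[str, str]:
    """Return HTTP headers for the OTLP exporter (hypothetical helper)."""
    # `auth: none` targets an unauthenticated collector, e.g. a local Alloy.
    if telemetry.get("auth") == "none":
        return {}
    # Otherwise reuse the Loki sink's stack credential as HTTP basic auth.
    # The dictionary layout here is assumed, not taken from the real config.
    user = str(loki_sink["tenant_id"])
    token = loki_sink["auth_token"]
    encoded = base64.b64encode(f"{user}:{token}".encode()).decode()
    return {"Authorization": f"Basic {encoded}"}
```

Calling `otlp_headers({"auth": "none"}, {})` yields an empty header set, while omitting `auth` produces a single `Authorization: Basic ...` header built from the sink credential.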
Metric-name suffixes
Keep add_metric_suffixes on, or dashboards/alerts go blank
Instruments are created unsuffixed in code (e.g. sf2loki_events_ingested),
but the OpenTelemetry→Prometheus bridge appends _total to counters and
_bucket/_count/_sum to histograms on export, e.g.
sf2loki_events_ingested_total, sf2loki_ingest_lag_seconds_bucket.
Grafana Cloud's OTLP endpoint adds these suffixes by default. If you route
metrics through your own OpenTelemetry Collector or Grafana Alloy instead,
keep add_metric_suffixes (a.k.a. AddMetricSuffixes) enabled on the
Prometheus exporter — with it off, the
connector-health dashboard and every
connector metric alert query the suffixed names and return
nothing.
Gauges are queried by their bare name (no suffix) in both cases.
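
As a quick way to check which series name a dashboard or alert should query, the suffix rules above can be written down as a small lookup. This is an illustrative sketch only: `exported_names` is a hypothetical helper, not part of sf2loki, and it models just the default behaviour described on this page (suffixes enabled).

```python
from typing import List


def exported_names(name: str, kind: str) -> List[str]:
    """Map an instrument name and type to the series names seen in Prometheus."""
    if kind == "gauge":
        # Gauges keep their bare name.
        return [name]
    if kind == "counter":
        return [f"{name}_total"]
    if kind == "histogram":
        return [f"{name}_{part}" for part in ("bucket", "count", "sum")]
    raise ValueError(f"unknown instrument kind: {kind}")
```

For example, `exported_names("sf2loki_events_ingested", "counter")` gives the `_total` name used in queries, and `exported_names("sf2loki_queue_depth", "gauge")` returns the name unchanged.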
Instruments

| Metric name | Type | Labels | Meaning |
|---|---|---|---|
| sf2loki_events_ingested | Counter | (none) | Total events ingested from Salesforce sources |
| sf2loki_decode_errors | Counter | (none) | Total Avro/payload decode errors |
| sf2loki_loki_push | Counter | (none) | Total Loki push attempts |
| sf2loki_loki_entries_dropped | Counter | (none) | Loki entries dropped as undeliverable (permanent errors), per reason |
| sf2loki_loki_push_duration_seconds | Histogram | (none) | Duration of Loki push requests in seconds |
| sf2loki_loki_bytes_pushed | Counter | (none) | Total bytes pushed to Loki |
| sf2loki_last_push_success_timestamp_seconds | Gauge | (none) | Unix ts of the last successful Loki push |
| sf2loki_ingest_lag_seconds | Histogram | per event type | now - event EventDate (the ingest-lag SLI). Bucket boundaries extend to 24h (1, 5, 15, 60, 300, 900, 1800, 3600, 7200, 14400, 28800, 86400) because EventLogFile lag is legitimately 3-6h |
| sf2loki_last_replay_commit_timestamp_seconds | Gauge | per topic | Unix ts of the last committed replay_id |
| sf2loki_pubsub_pending_credits | Gauge | per topic | Pending Pub/Sub flow-control credits |
| sf2loki_pubsub_reconnects | Counter | per topic | Total Pub/Sub stream reconnects |
| sf2loki_pubsub_replay_fallbacks | Counter | per topic | Subscriptions restarted with a fallback replay preset after an invalid/expired replay id |
| sf2loki_pubsub_stream_stalls | Counter | per topic | Pub/Sub streams force-reconnected by the keepalive watchdog |
| sf2loki_salesforce_api_throttled | Counter | per api | Salesforce REST calls rejected with REQUEST_LIMIT_EXCEEDED |
| sf2loki_pubsub_stream_up | Gauge | per topic | 1 while a topic's subscribe stream is connected and healthy, 0 while erroring/reconnecting |
| sf2loki_watermark_timestamp_seconds | Gauge | per source/object | Unix ts of the current polling watermark |
| sf2loki_watermark_stalls | Counter | per source/object | Poll cycles that made no progress because a full page shared one timestamp boundary (a compound-cursor stall would silently drop newer data) |
| sf2loki_pubsub_decode_stalls | Counter | per topic | Topics stuck making zero progress: consecutive batches at 100% decode failure, or repeated schema-fetch failures |
| sf2loki_checkpoint_load_errors | Counter | per source | Stored checkpoint values that failed to load/decode and fell back to a replay preset / lookback |
| sf2loki_auth_refreshes | Counter | (none) | Total Salesforce OAuth token refreshes |
| sf2loki_auth_errors | Counter | (none) | Total Salesforce auth errors |
| sf2loki_schema_cache_size | Gauge | (none) | Avro schemas in the codec cache |
| sf2loki_queue_depth | Gauge | (none) | Depth of the internal event queue |
| sf2loki_eventlogfile_files_processed | Counter | per event type | EventLogFile records downloaded and parsed |
| sf2loki_eventlogfile_rows_ingested | Counter | per event type | CSV rows ingested from EventLogFiles |
| sf2loki_eventlogfile_download_bytes | Counter | (none) | Bytes downloaded from EventLogFile LogFile endpoints |
| sf2loki_eventlogfile_download_errors | Counter | per reason | EventLogFile listing/download errors |
| sf2loki_soql_poll_errors | Counter | per source and object/event type | Failed SOQL poll cycles |
| sf2loki_apexlog_logs_ingested | Counter | (none) | ApexLog debug logs ingested |
| sf2loki_apexlog_download_bytes | Counter | (none) | Bytes downloaded from ApexLog Body endpoints |
| sf2loki_apexlog_bodies_skipped | Counter | per reason | ApexLog bodies not shipped (over max_body_bytes or download error) |
| sf2loki_apexlog_download_errors | Counter | per reason | ApexLog body download errors |
| sf2loki_timestamp_fallbacks | Counter | per source | Entries whose event timestamp was missing/unparseable and a fallback was used |
| sf2loki_lines_truncated | Counter | per source | Log lines truncated to max_line_bytes |
| sf2loki_entries_sampled_out | Counter | per source and event type | Rows/events dropped by deterministic per-type sampling |
| sf2loki_rows_filtered | Counter | per source and rule | Rows dropped by transform drop_row rules |
| sf2loki_egress_budget_used_bytes | Gauge | (none) | Pre-compression bytes counted against the daily egress budget (current UTC day) |
| sf2loki_egress_paused | Gauge | (none) | 1 while pushes are paused by the daily byte budget, else 0 |
| sf2loki_eventlogfile_cycle_seconds | Gauge | (none) | Wall-clock duration of the last EventLogFile poll cycle |
| sf2loki_leader | Gauge | (none) | 1 while this instance holds leadership (or runs standalone), else 0 |
| sf2loki_salesforce_limit_max | Gauge | per limit_name | Salesforce org limit maximum (requires salesforce.limits.enabled: true) |
| sf2loki_salesforce_limit_remaining | Gauge | per limit_name | Salesforce org limit remaining (requires salesforce.limits.enabled: true) |
| sf2loki_salesforce_limits_poll_errors | Counter | (none) | Total Salesforce /limits poll errors |
| sf2loki_build_info | Gauge | version | Build metadata; value is always 1 |

All labels above are in addition to the org label added under multi-org
ingestion (see above); "(none)" means no labels other than a possible org.
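
To see why the sf2loki_ingest_lag_seconds boundaries run out to 24h, it can help to place a few lag values into buckets by hand. The sketch below assumes the usual upper-inclusive bucket semantics (an observation counts toward the smallest bound that is greater than or equal to it); `bucket_for` is a hypothetical helper for illustration, and only the boundary list is taken from the table.

```python
from bisect import bisect_left

# Upper bounds in seconds, copied from the sf2loki_ingest_lag_seconds row.
BOUNDS = [1, 5, 15, 60, 300, 900, 1800, 3600, 7200, 14400, 28800, 86400]


def bucket_for(lag_seconds: float) -> str:
    """Return the smallest upper bound that contains the observation."""
    # bisect_left finds the first bound >= lag_seconds (upper-inclusive).
    i = bisect_left(BOUNDS, lag_seconds)
    return str(BOUNDS[i]) if i < len(BOUNDS) else "+Inf"


if __name__ == "__main__":
    for lag in (0.5, 3600, 4.5 * 3600, 100_000):
        print(lag, "->", bucket_for(lag))
```

A 4.5-hour EventLogFile lag (16200 s) lands in the 28800 bucket instead of overflowing, so percentile estimates in the 3-6h range stay meaningful.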
Where these are used
- Dashboards: the bundled sf2loki-connector-health dashboard graphs these metrics (suffixed) from a Prometheus datasource.
- Alerts: the Grafana-managed alert pack thresholds several of these (ingest lag, Loki push failure rate, org-limit headroom).
- State & checkpoints: sf2loki_watermark_timestamp_seconds, sf2loki_last_replay_commit_timestamp_seconds, and sf2loki_checkpoint_load_errors are the checkpoint-health signals.
- High availability: sf2loki_leader reflects lease ownership under the active-passive coordinator.
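
The timestamp gauges listed above hold Unix times, so the useful quantity is usually their age, not their raw value. A minimal sketch of that arithmetic follows; the helper names and the 900-second threshold in the usage note are assumptions for illustration, not values taken from the bundled alert pack.

```python
import time
from typing import Optional


def seconds_since(gauge_value: float, now: Optional[float] = None) -> float:
    """Age in seconds of a Unix-timestamp gauge sample."""
    current = time.time() if now is None else now
    return current - gauge_value


def is_stale(gauge_value: float, max_age_seconds: float,
             now: Optional[float] = None) -> bool:
    """True when the gauge has not advanced within max_age_seconds."""
    return seconds_since(gauge_value, now) > max_age_seconds
```

For instance, `is_stale(last_commit_ts, 900)` would flag a replay checkpoint that has not been committed for more than 15 minutes, where `last_commit_ts` is a sample of sf2loki_last_replay_commit_timestamp_seconds.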