Metrics Reference

sf2loki emits its own operational metrics via OTLP. There is no Prometheus /metrics scrape endpoint: the connector is OTLP-native and pushes on an interval. This page is the canonical reference for every instrument, generated by reading src/sf2loki/obs/metrics.py.

All instruments carry an org label when multi-org ingestion is enabled (for_org() stamps it on per-org components — token provider, Salesforce clients, sources, the limits poller); single-org deployments omit it. Names below are exactly as created in code — unsuffixed. See Metric-name suffixes before wiring dashboards or alerts against them.

Enabling metrics

Set service.telemetry.enabled: true and service.telemetry.endpoint:

service:
  telemetry:
    enabled: true
    endpoint: https://otlp-gateway-<zone>.grafana.net/otlp/v1/metrics
    # endpoint: http://alloy:4318/v1/metrics   # local Alloy instead of Grafana Cloud
  • Grafana Cloud — basic auth defaults to the Loki sink's tenant_id / auth_token (one stack credential covers both Loki and OTLP).
  • Local/in-cluster Alloy — set service.telemetry.auth: none for an unauthenticated collector.
  • Enable Salesforce org-limit gauges (API usage, storage, streaming events) with salesforce.limits.enabled: true.

See Configuration for the full service.telemetry field reference.
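
For the Grafana Cloud case, the credential amounts to an ordinary HTTP Basic header built from the two Loki sink values. The sketch below assumes tenant_id acts as the username and auth_token as the password. basic_auth_header is a hypothetical helper for illustration, not part of sf2loki, and the values are placeholders.

```python
import base64


def basic_auth_header(tenant_id: str, auth_token: str) -> dict:
    """Build an HTTP Basic Authorization header (RFC 7617).

    Assumption: tenant_id is the username and auth_token the password.
    """
    raw = f"{tenant_id}:{auth_token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Placeholder credentials; substitute the values from the Loki sink config.
headers = basic_auth_header("123456", "example-token")
```

This can help when checking a collector or gateway by hand with the same credential the connector would send.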

Metric-name suffixes

Keep add_metric_suffixes on, or dashboards/alerts go blank

Instruments are created unsuffixed in code (e.g. sf2loki_events_ingested), but the OpenTelemetry→Prometheus bridge appends suffixes on export: _total for counters and _bucket/_count/_sum for histograms, giving e.g. sf2loki_events_ingested_total and sf2loki_ingest_lag_seconds_bucket. Grafana Cloud's OTLP endpoint adds these suffixes by default. If you route metrics through your own OpenTelemetry Collector or Grafana Alloy instead, keep add_metric_suffixes (a.k.a. AddMetricSuffixes) enabled on the Prometheus exporter. With it off, the connector-health dashboard and every connector metric alert still query the suffixed names and return nothing.

Gauges are queried by their bare name (no suffix) in both cases.
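
The suffix rules above can be written down as a small lookup, which is handy when translating a name from the Instruments table into the series a dashboard query should use. This is a sketch of the rules as described on this page only; exported_series is a hypothetical helper and not part of sf2loki or any OpenTelemetry library.

```python
def exported_series(name: str, kind: str) -> list:
    """Return the Prometheus series names expected for one instrument.

    Covers only the suffix rules stated on this page, with suffixing enabled.
    """
    if kind == "counter":
        return [f"{name}_total"]
    if kind == "histogram":
        return [f"{name}_bucket", f"{name}_count", f"{name}_sum"]
    if kind == "gauge":
        # Gauges are queried by their bare name.
        return [name]
    raise ValueError(f"unknown instrument kind: {kind!r}")


counter_series = exported_series("sf2loki_events_ingested", "counter")
histogram_series = exported_series("sf2loki_ingest_lag_seconds", "histogram")
gauge_series = exported_series("sf2loki_queue_depth", "gauge")
```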

Instruments

| Metric name | Type | Labels | Meaning |
| --- | --- | --- | --- |
| sf2loki_events_ingested | Counter | (none) | Total events ingested from Salesforce sources |
| sf2loki_decode_errors | Counter | (none) | Total Avro/payload decode errors |
| sf2loki_loki_push | Counter | (none) | Total Loki push attempts |
| sf2loki_loki_entries_dropped | Counter | (none) | Loki entries dropped as undeliverable (permanent errors), per reason |
| sf2loki_loki_push_duration_seconds | Histogram | (none) | Duration of Loki push requests in seconds |
| sf2loki_loki_bytes_pushed | Counter | (none) | Total bytes pushed to Loki |
| sf2loki_last_push_success_timestamp_seconds | Gauge | (none) | Unix ts of the last successful Loki push |
| sf2loki_ingest_lag_seconds | Histogram | per event type | now - event EventDate (the ingest-lag SLI). Bucket boundaries extend to 24h (1, 5, 15, 60, 300, 900, 1800, 3600, 7200, 14400, 28800, 86400) because EventLogFile lag is legitimately 3-6h |
| sf2loki_last_replay_commit_timestamp_seconds | Gauge | per topic | Unix ts of the last committed replay_id |
| sf2loki_pubsub_pending_credits | Gauge | per topic | Pending Pub/Sub flow-control credits |
| sf2loki_pubsub_reconnects | Counter | per topic | Total Pub/Sub stream reconnects |
| sf2loki_pubsub_replay_fallbacks | Counter | per topic | Subscriptions restarted with a fallback replay preset after an invalid/expired replay id |
| sf2loki_pubsub_stream_stalls | Counter | per topic | Pub/Sub streams force-reconnected by the keepalive watchdog |
| sf2loki_salesforce_api_throttled | Counter | per api | Salesforce REST calls rejected with REQUEST_LIMIT_EXCEEDED |
| sf2loki_pubsub_stream_up | Gauge | per topic | 1 while a topic's subscribe stream is connected and healthy, 0 while erroring/reconnecting |
| sf2loki_watermark_timestamp_seconds | Gauge | per source/object | Unix ts of the current polling watermark |
| sf2loki_watermark_stalls | Counter | per source/object | Poll cycles that made no progress because a full page shared one timestamp boundary (a compound-cursor stall would silently drop newer data) |
| sf2loki_pubsub_decode_stalls | Counter | per topic | Topics stuck making zero progress: consecutive batches at 100% decode failure, or repeated schema-fetch failures |
| sf2loki_checkpoint_load_errors | Counter | per source | Stored checkpoint values that failed to load/decode and fell back to a replay preset / lookback |
| sf2loki_auth_refreshes | Counter | (none) | Total Salesforce OAuth token refreshes |
| sf2loki_auth_errors | Counter | (none) | Total Salesforce auth errors |
| sf2loki_schema_cache_size | Gauge | (none) | Avro schemas in the codec cache |
| sf2loki_queue_depth | Gauge | (none) | Depth of the internal event queue |
| sf2loki_eventlogfile_files_processed | Counter | per event type | EventLogFile records downloaded and parsed |
| sf2loki_eventlogfile_rows_ingested | Counter | per event type | CSV rows ingested from EventLogFiles |
| sf2loki_eventlogfile_download_bytes | Counter | (none) | Bytes downloaded from EventLogFile LogFile endpoints |
| sf2loki_eventlogfile_download_errors | Counter | per reason | EventLogFile listing/download errors |
| sf2loki_soql_poll_errors | Counter | per source and object/event type | Failed SOQL poll cycles |
| sf2loki_apexlog_logs_ingested | Counter | (none) | ApexLog debug logs ingested |
| sf2loki_apexlog_download_bytes | Counter | (none) | Bytes downloaded from ApexLog Body endpoints |
| sf2loki_apexlog_bodies_skipped | Counter | per reason | ApexLog bodies not shipped (over max_body_bytes or download error) |
| sf2loki_apexlog_download_errors | Counter | per reason | ApexLog body download errors |
| sf2loki_timestamp_fallbacks | Counter | per source | Entries whose event timestamp was missing/unparseable and a fallback was used |
| sf2loki_lines_truncated | Counter | per source | Log lines truncated to max_line_bytes |
| sf2loki_entries_sampled_out | Counter | per source and event type | Rows/events dropped by deterministic per-type sampling |
| sf2loki_rows_filtered | Counter | per source and rule | Rows dropped by transform drop_row rules |
| sf2loki_egress_budget_used_bytes | Gauge | (none) | Pre-compression bytes counted against the daily egress budget (current UTC day) |
| sf2loki_egress_paused | Gauge | (none) | 1 while pushes are paused by the daily byte budget, else 0 |
| sf2loki_eventlogfile_cycle_seconds | Gauge | (none) | Wall-clock duration of the last EventLogFile poll cycle |
| sf2loki_leader | Gauge | (none) | 1 while this instance holds leadership (or runs standalone), else 0 |
| sf2loki_salesforce_limit_max | Gauge | per limit_name | Salesforce org limit maximum (requires salesforce.limits.enabled: true) |
| sf2loki_salesforce_limit_remaining | Gauge | per limit_name | Salesforce org limit remaining (requires salesforce.limits.enabled: true) |
| sf2loki_salesforce_limits_poll_errors | Counter | (none) | Total Salesforce /limits poll errors |
| sf2loki_build_info | Gauge | version | Build metadata; value is always 1 |

All labels above are in addition to the org label added under multi-org ingestion (see above); "(none)" means no labels other than a possible org.
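
To make the sf2loki_ingest_lag_seconds boundaries concrete, the sketch below computes a lag as now minus EventDate and finds the smallest bucket boundary that counts the observation, treating boundaries as inclusive upper bounds in the way Prometheus le buckets are. The boundary list is copied from the table; the function names and the sample timestamps are hypothetical and do not come from sf2loki.

```python
from bisect import bisect_left
from datetime import datetime, timezone

# Bucket boundaries in seconds, as listed for sf2loki_ingest_lag_seconds.
LAG_BUCKETS = [1, 5, 15, 60, 300, 900, 1800, 3600, 7200, 14400, 28800, 86400]


def ingest_lag_seconds(event_date: datetime, now: datetime) -> float:
    """Ingest lag: now minus the event's EventDate, in seconds."""
    return (now - event_date).total_seconds()


def bucket_upper_bound(lag: float) -> float:
    """Smallest boundary >= lag, or +Inf when lag exceeds the last boundary."""
    i = bisect_left(LAG_BUCKETS, lag)
    return LAG_BUCKETS[i] if i < len(LAG_BUCKETS) else float("inf")


# Illustrative EventLogFile-style lag of 4.5 hours.
event_date = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 1, 4, 30, tzinfo=timezone.utc)
lag = ingest_lag_seconds(event_date, now)
bucket = bucket_upper_bound(lag)
```

A 4.5 hour lag is 16200 seconds, so it first counts in the 28800 bucket. Without the boundaries above 3600, every normal EventLogFile observation would fall into the overflow bucket and quantile estimates would lose all resolution.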

Where these are used

  • Dashboards: the bundled sf2loki-connector-health dashboard graphs these metrics (suffixed) from a Prometheus datasource.
  • Alerts: the Grafana-managed alert pack thresholds several of these (ingest lag, Loki push failure rate, org-limit headroom).
  • State & checkpoints: sf2loki_watermark_timestamp_seconds, sf2loki_last_replay_commit_timestamp_seconds, and sf2loki_checkpoint_load_errors are the checkpoint-health signals.
  • High availability: sf2loki_leader reflects lease ownership under the active-passive coordinator.
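
The timestamp gauges listed above hold Unix timestamps, so they are typically read as an age: current time minus the gauge value (in PromQL, an expression of the form time() - sf2loki_watermark_timestamp_seconds). The sketch below shows the same arithmetic in Python. The helper names and the 900-second threshold are illustrative assumptions, not values taken from the bundled alert pack.

```python
import time
from typing import Optional


def age_seconds(gauge_value: float, now: Optional[float] = None) -> float:
    """Seconds elapsed since the Unix timestamp stored in a gauge."""
    current = time.time() if now is None else now
    return current - gauge_value


def is_stale(gauge_value: float, max_age_seconds: float,
             now: Optional[float] = None) -> bool:
    """True when the gauge has not advanced within max_age_seconds."""
    return age_seconds(gauge_value, now) > max_age_seconds


# Hypothetical sample: watermark last advanced 1000 s before "now".
stale = is_stale(gauge_value=1_700_000_000.0, max_age_seconds=900,
                 now=1_700_001_000.0)
fresh = is_stale(gauge_value=1_700_000_500.0, max_age_seconds=900,
                 now=1_700_001_000.0)
```

A threshold for the EventLogFile watermark would likely need to be far looser than one for a Pub/Sub replay commit, since the table notes that EventLogFile lag is legitimately 3-6h.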