Workloads

A workload is the request-generating layer of a blueprint. Where constructs emit infrastructure telemetry on a fixed tick, a workload mints correlated request samples per master tick and projects them across traces, logs, and optionally RUM — threading one correlation key-set through every signal class so a trace ID in a span matches the same ID in the application log and the browser beacon.

Two workload kinds exist:

  • web_service — a single service: a browser→backend→DB hop tree with optional gen_ai/LLM hops, RUM, Beyla, and profiling. Simple and common.
  • app — a blueprint-declared service graph of typed nodes, each with its own custom metrics/logs/spans via the telemetry DSL, and one end-to-end correlated trace across the whole graph. Use when you need multiple backend services each with distinct telemetry, per-service incident targeting, or per-service scaling.

The two kinds may coexist in one blueprint: model the core correlated flow as an app, and simpler peripheral services as standalone web_service workloads.
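
A minimal sketch of the coexistence described above: one app and one standalone web_service in the same workloads: list. It uses only fields documented on this page; the workload name mine-reports and its route are hypothetical, and the services: list is cut down to two nodes for brevity.

```yaml
workloads:
  # Core correlated flow, modelled as a service graph.
  - type: app
    name: mine-app
    runs_on: mine-prod-use1
    services:
      - name: mine-frontend
        type: frontend
        entry: true                # exactly one entry node per graph
        runtime: node
        calls: [mine-api]
      - name: mine-api
        type: web
        runtime: go
        calls: []

  # Simpler peripheral service, modelled as a standalone web_service.
  - type: web_service
    name: mine-reports             # hypothetical name
    runs_on: mine-prod-use1
    endpoints:
      - { route: "GET /v1/reports", error_rate: 0.01, p95_ms: 180 }
```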

web_service

web_service models a single backend service. One minted request produces:

  • A trace: optional browser CLIENT root span → backend SERVER span → one CLIENT span per downstream call (database or cache hops, with db.* semantic conventions). Traces are backdated so they end at approximately now, matching how real spans export on completion.
  • Application logs: one structured log line per request, correlated by trace/span ID.
  • APM span-metrics: traces_spanmetrics_* histograms + service-graph series on the 60-second metric tick.
  • Optional Faro/RUM beacons when rum: true and GC_FARO_* credentials are present.
  • Optional Beyla eBPF observation lane when observability.beyla is set.
  • Optional Pyroscope SDK-push profiles when pyroscope: is set with mode: sdk.
  • Optional native OTLP application metrics (http.server.*) when otel.metrics: true.
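
The optional lanes above might be enabled together as in the sketch below. The nesting of pyroscope and otel is inferred from the names used in the list (pyroscope: with mode: sdk, otel.metrics: true) and is an assumption, not a confirmed schema. observability.beyla is left out because this page does not show what value it takes.

```yaml
workloads:
  - type: web_service
    name: mine-api
    runs_on: mine-prod-use1
    rum: true                # Faro/RUM beacons; GC_FARO_* credentials must also be present
    pyroscope:
      mode: sdk              # assumed nesting: enables SDK-push profiles
    otel:
      metrics: true          # assumed nesting: enables native OTLP http.server.* metrics
```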

Key fields

workloads:
  - type: web_service
    name: mine-api
    runs_on: mine-prod-use1      # binds to a cluster by its declared name
    tracing: true                # default true; false omits the OTLP lane
    rum: true                    # requires GC_FARO_* creds; omit or false = disabled
    traffic:
      off_peak_rps: 10           # trough request rate (default 5)
      peak_rps: 80               # plateau request rate (default 50)
    endpoints:
      - { route: "GET /v1/search",  error_rate: 0.01, p95_ms: 140 }
      - { route: "POST /v1/items",  error_rate: 0.02, p95_ms: 220 }
    calls:
      - { db: mine-app-db }      # resolved to the named database fixture
      - { cache: mine-sessions } # resolved to the named cache fixture

traffic drives the metric lane volume (span-metrics RPS). The correlation narrative (one request per master tick) is separate and smaller by design — it seeds realistic trace/log samples without inflating span-metric series cardinality.

Each minted request draws one endpoint uniformly at random from endpoints; error_rate and p95_ms set that route's error fraction and latency distribution in both spans and span-metrics.
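
As a sketch of the defaults noted in the Key fields comments, the fragment below omits traffic and disables tracing. That traffic may be left out entirely is an assumption drawn from the documented defaults (off_peak_rps 5, peak_rps 50); the workload name and route are hypothetical.

```yaml
workloads:
  - type: web_service
    name: mine-batch-api           # hypothetical name
    runs_on: mine-prod-use1
    tracing: false                 # omits the OTLP lane (default is true)
    # traffic omitted: assumed to fall back to off_peak_rps 5 and peak_rps 50
    endpoints:
      - { route: "GET /v1/status", error_rate: 0.0, p95_ms: 20 }
```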

gen_ai / LLM hops

When a blueprint wires an AI infrastructure construct (AgentCore, Bedrock) to the same cluster, web_service emits in-process gen_ai.* span attributes and correlated AI logs on the backend span. No additional YAML is needed on the workload; the gen_ai trace vocabulary comes from internal/genai (the same seam used by app).

app

app declares a multi-service graph where each node is a first-class service with its own telemetry. One request minted at the entry node propagates a single correlated trace across the whole graph: every node adds its own SERVER span (and optional CLIENT spans to its downstream calls), so the resulting trace shows the full request journey across all services.

Service nodes

Each node in services: is a ServiceNode:

| Field | Description |
| --- | --- |
| name | Unique graph identity; stamped as service / service_name label on every signal from this node. |
| type | Span semantics. Valid values: frontend / web / grpc / worker / job / stream / gateway / db / cache / llm / agent / tool / workflow / retrieval. Unknown types fall back to a default. |
| runtime | go / jvm / node / python. Selects the catalog runtime profile. |
| entry | true on exactly one node: the graph's request entry point. |
| replicas | Pod count for the k8s substrate cascade (default 2). |
| calls | Downstream node names (graph edges). |
| routes | Request routes drawn per-request. On the entry node these populate r.Route; on a callee they name its SERVER span. |
| profiles | Catalog profile template names applied to this node. |
| metrics / logs / spans | Inline custom telemetry via the DSL (the escape hatch). |
| external | true = remote/managed service: appears as a trace hop but is not deployed as a k8s pod on the caller's cluster. |
| agentic_flow | In-process LangGraph orchestration. Emits an invoke_workflow → invoke_agent → execute_tool* → chat span subtree inside this node's SERVER span. |
| pages | RUM navigation inventory (frontend entry nodes only). Page-views are RUM-only: they model session navigation around the traced action and emit no backend trace. |

Service graph example

workloads:
  - type: app
    name: mine-app
    runs_on: mine-prod-use1
    traffic:
      off_peak_rps: 5
      peak_rps: 40
      request_latency_p95_ms: 9000   # LLM-call budget; default 200ms suits plain HTTP
    models:
      - { model: gpt-4o,             provider: azure-openai }
      - { model: claude-3-5-sonnet,  provider: bedrock }
    services:
      - name: mine-frontend
        type: frontend
        entry: true
        runtime: node
        replicas: 2
        routes: ["GET /", "GET /search"]
        profiles: [rum_faro, runtime_node]
        calls: [mine-api]

      - name: mine-api
        type: web
        runtime: go
        replicas: 3
        profiles: [scraped_http_server, runtime_go, gen_ai_client]
        calls: [mine-db, mine-gateway]

      - name: mine-gateway
        type: gateway
        runtime: go
        replicas: 2
        profiles: [gateway_export_log]
        calls: [mine-llm]
        external: false

      - name: mine-llm
        type: llm
        external: true              # managed endpoint: trace hop only, no k8s pod
        calls: []

      - name: mine-db
        type: db
        runtime: go
        db_instance: mine-app-db   # links to the RDS fixture for db.* CLIENT span attrs
        calls: []

The resulting trace shape:

graph LR
    browser["browser CLIENT"]
    fe["mine-frontend SERVER"]
    api["mine-api SERVER"]
    gw["mine-gateway SERVER"]
    llm["mine-llm CLIENT (external)"]
    db["mine-db SERVER (db hop)"]

    browser --> fe
    fe --> api
    api --> gw
    api --> db
    gw --> llm

Models and gen_ai hops

The top-level models: list declares valid (model, provider) pairings. The app minter draws one pair per request and stamps it into the correlation — so gen_ai.* span attributes, gateway export logs, and eval log entries all carry the same model and provider for that request. Pairing prevents impossible combinations (e.g. a Claude model on the Azure-OpenAI provider).

An agentic_flow on a node adds a nested gen_ai span subtree inside that node's SERVER span:

- name: mine-api
  type: web
  runtime: go
  agentic_flow:
    workflow: mine-search-workflow
    agents:
      - name: search-agent
        tools: [vector_search, rerank, summarise]
    omit_chat: false    # false = include the chat <model> leaf (the LLM call)

Set omit_chat: true when a connected gateway or llm node already models the LLM call so it is not double-counted.
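
A sketch of that case, assuming the gateway and llm nodes from the service graph example: mine-api runs the agentic flow but leaves the chat leaf to the downstream hops. Only fields shown on this page are used; the tool list is shortened for illustration.

```yaml
services:
  - name: mine-api
    type: web
    runtime: go
    calls: [mine-gateway]
    agentic_flow:
      workflow: mine-search-workflow
      agents:
        - name: search-agent
          tools: [vector_search, rerank]
      omit_chat: true              # the mine-gateway and mine-llm hops already model the LLM call

  - name: mine-gateway
    type: gateway
    runtime: go
    calls: [mine-llm]

  - name: mine-llm
    type: llm
    external: true                 # managed endpoint: trace hop only, no k8s pod
    calls: []
```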

Telemetry DSL

Each node can declare custom metrics, log streams, and extra span attributes via inline DSL specs. The DSL is the profiles: escape hatch — use catalog profiles first, inline specs for anything not covered.

Value models (exactly one per field):

| Kind | YAML | Description |
| --- | --- | --- |
| const | const: 42.0 | Fixed numeric value |
| const_str | const_str: "ok" | Fixed string |
| enum | enum: [{value: "read", weight: 3}, {value: "write", weight: 1}] | Weighted categorical draw |
| int_range | int_range: {min: 0, max: 100, p_zero: 0.95} | Bounded integer; p_zero forces 0 with given probability |
| float_range | float_range: {min: 0.001, max: 2.5} | Bounded float |
| normal | normal: {mean: 50.0, stddev: 10.0} | Gaussian draw (negative clamped to 0) |
| bool | bool: {p_true: 0.1} | Weighted boolean |
| shape | shape: {base: 100.0, mode: latency_storm} | base × shape-engine reading; incident-responsive when mode is set |
| ref | ref: trace_id | Pulls a correlation field by name |

Capability matrix (enforced at load time):

  • Metric labels and Loki stream labels: only const / const_str / enum. These must enumerate a stable, total domain on every run.
  • Metric values: only numeric models — const, int_range, float_range, normal, bool, shape.
  • Log body fields and span attributes: any model, including ref. High-cardinality correlation keys (trace_id, portkey_trace_id, run_id, etc.) ride here — never as labels or stream labels.

Example inline metric:

- name: mine-api
  type: web
  metrics:
    - name: mine_requests_total
      instrument: counter
      labels:
        route:  { enum: [{value: "/search", weight: 3}, {value: "/items", weight: 1}] }
        status: { enum: [{value: "200", weight: 9}, {value: "500", weight: 1}] }
      value:
        shape: { base: 1.0, mode: throughput_drop }
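
A second sketch, this time annotated against the capability matrix. The metric name mine_retries_total and the label name are hypothetical; the commented-out lines show models the matrix says are rejected at load time, and are included only to mark the boundary.

```yaml
- name: mine-api
  type: web
  metrics:
    - name: mine_retries_total                # hypothetical metric name
      instrument: counter
      labels:
        upstream: { const_str: "mine-db" }    # allowed: const_str is a fixed, stable domain
        # trace: { ref: trace_id }            # not allowed: ref is high-cardinality, never a label
      value:
        int_range: { min: 0, max: 3, p_zero: 0.9 }   # allowed: numeric model, 0 on most draws
        # const_str: "ok"                     # not allowed: metric values must be numeric
```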

Catalog profiles

The catalog ships reusable profile templates that any node can apply by name under profiles:

| Profile | Emits |
| --- | --- |
| scraped_http_server | http_server_request_duration_seconds histogram (classic buckets) |
| runtime_go | Go runtime metrics: go_goroutines, go_memstats_heap_inuse_bytes, process_resident_memory_bytes, process_cpu_seconds_total |
| runtime_jvm | JVM metrics: process_cpu_seconds_total and GC/heap families |
| runtime_node | Node.js runtime metrics: process_cpu_seconds_total |
| runtime_python | Python runtime: python_gc_objects_collected_total, process_resident_memory_bytes, process_cpu_seconds_total |
| gen_ai_client | gen_ai.* client span attributes + correlated AI request log |
| gateway_export_log | Portkey gateway export log stream (LLM cost/token/latency fields) |
| gateway_native_scrape | Portkey gateway scraped metrics: request_count, latency histograms, processing-time families |
| eval_log | LangSmith evaluation result log stream |
| bedrock_invocation_log | AWS Bedrock model invocation log |
| rum_faro | Faro/RUM browser beacon emission (entry frontend nodes only) |

Failure modes

Workloads register failure modes that incidents can target. See Incidents for how to activate them.

web_service modes (axis: workload):

| Mode | Effect |
| --- | --- |
| latency_spike | Elevated request latency (up to 4× at full intensity) |
| error_burst | Elevated 5xx error rate |
| cpu_hotspot | CPU concentrated in a hot frame (profiling flamegraph) |
| memory_leak | Growing heap (profile sample values rise) |
| lock_contention | Elevated mutex/block contention (profile values) |
| goroutine_leak | Goroutine accumulation (profile values) |

app service-node modes (axis: service — each node is individually addressable):

| Mode | Effect |
| --- | --- |
| latency_storm | Elevated latency on the targeted node |
| error_spike | Elevated 5xx rate on the targeted node |
| throughput_drop | Reduced throughput on the targeted node |
| fallback_storm | Elevated gateway fallback rate on the targeted node |
| retry_storm | Elevated gateway retry rate on the targeted node |
| cpu_hotspot | Hot frame amplification on the targeted node (profiling) |
| memory_leak | Growing heap on the targeted node (profiling) |
| lock_contention | Mutex/block contention on the targeted node (profiling) |
| goroutine_leak | Goroutine accumulation on the targeted node (profiling) |
| web_vitals_degraded | Browser web-vitals degrade on the targeted frontend node (LCP/INP/TTFB/FCP/CLS spike) |