# Workloads
A workload is the request-generating layer of a blueprint. Where constructs emit infrastructure telemetry on a fixed tick, a workload mints correlated request samples per master tick and projects them across traces, logs, and optionally RUM — threading one correlation key-set through every signal class so a trace ID in a span matches the same ID in the application log and the browser beacon.
Two workload kinds exist:

- `web_service`: a single service with a browser→backend→DB hop tree and optional gen_ai/LLM hops, RUM, Beyla, and profiling. Simple and common.
- `app`: a blueprint-declared service graph of typed nodes, each with its own custom metrics/logs/spans via the telemetry DSL, and one end-to-end correlated trace across the whole graph. Use it when you need multiple backend services, each with distinct telemetry, per-service incident targeting, or per-service scaling.
The two kinds may coexist in one blueprint: model the core correlated flow as an app, and simpler peripheral services as standalone web_service workloads.
## web_service
web_service models a single backend service. One minted request produces:
- A trace: optional browser CLIENT root span → backend SERVER span → one CLIENT span per downstream call (database or cache hops, with `db.*` semantic conventions). Traces are backdated so they end at approximately `now`, matching how real spans export on completion.
- Application logs: one structured log line per request, correlated by trace/span ID.
- APM span-metrics: `traces_spanmetrics_*` histograms + service-graph series on the 60-second metric tick.
- Optional Faro/RUM beacons when `rum: true` and `GC_FARO_*` credentials are present.
- Optional Beyla eBPF observation lane when `observability.beyla` is set.
- Optional Pyroscope SDK-push profiles when `pyroscope:` is set with `mode: sdk`.
- Optional native OTLP application metrics (`http.server.*`) when `otel.metrics: true`.
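The backdating rule in the trace bullet can be sketched as follows. This is a minimal illustration rather than the generator's own code: `backdate_span` is a hypothetical helper, and it pins the span end to exactly `now` where the text only says "approximately".

```python
from datetime import datetime, timedelta, timezone


def backdate_span(now: datetime, duration_ms: float) -> tuple[datetime, datetime]:
    """Return (start, end) for a span that finishes at `now`.

    Assumption: the end is exactly `now`; a real emitter may add jitter.
    """
    end = now
    start = end - timedelta(milliseconds=duration_ms)
    return start, end


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
start, end = backdate_span(now, duration_ms=140)
print(start.isoformat(), "->", end.isoformat())
```

Deriving the start from the end, rather than the other way round, keeps a long span from appearing to finish in the future.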
### Key fields
```yaml
workloads:
  - type: web_service
    name: mine-api
    runs_on: mine-prod-use1   # binds to a cluster by its declared name
    tracing: true             # default true; false omits the OTLP lane
    rum: true                 # requires GC_FARO_* creds; omit or false = disabled
    traffic:
      off_peak_rps: 10        # trough request rate (default 5)
      peak_rps: 80            # plateau request rate (default 50)
    endpoints:
      - { route: "GET /v1/search", error_rate: 0.01, p95_ms: 140 }
      - { route: "POST /v1/items", error_rate: 0.02, p95_ms: 220 }
    calls:
      - { db: mine-app-db }       # resolved to the named database fixture
      - { cache: mine-sessions }  # resolved to the named cache fixture
```
traffic drives the metric lane volume (span-metrics RPS). The correlation narrative (one request per master tick) is separate and smaller by design — it seeds realistic trace/log samples without inflating span-metric series cardinality.
Each minted request draws one entry from `endpoints` uniformly at random; `error_rate` and `p95_ms` shape that route's error fraction and latency distribution in both spans and span-metrics.
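To make the roles of `error_rate` and `p95_ms` concrete, here is a small sketch of one way a per-request draw could work. The log-normal latency model, the `sigma` value, and the `sample_request` helper are all assumptions chosen for illustration; the source does not say which distribution the generator uses.

```python
import math
import random

Z95 = 1.6449  # standard-normal quantile at 0.95

ENDPOINTS = [
    {"route": "GET /v1/search", "error_rate": 0.01, "p95_ms": 140},
    {"route": "POST /v1/items", "error_rate": 0.02, "p95_ms": 220},
]


def sample_request(rng: random.Random, endpoints: list[dict], sigma: float = 0.5) -> dict:
    """Draw one endpoint uniformly, then an error flag and a latency for it."""
    ep = rng.choice(endpoints)
    # Assumption: log-normal latency; mu is solved so the 95th percentile equals p95_ms.
    mu = math.log(ep["p95_ms"]) - Z95 * sigma
    latency_ms = rng.lognormvariate(mu, sigma)
    is_error = rng.random() < ep["error_rate"]
    return {"route": ep["route"], "latency_ms": latency_ms, "error": is_error}


rng = random.Random(7)
samples = [sample_request(rng, ENDPOINTS) for _ in range(40_000)]
```

Solving for `mu` from the target percentile means the configured `p95_ms` is reproduced by the samples regardless of the spread chosen for `sigma`.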
### gen_ai / LLM hops
When a blueprint wires an AI infrastructure construct (AgentCore, Bedrock) to the same cluster, web_service emits in-process gen_ai.* span attributes and correlated AI logs on the backend span. No additional YAML is needed on the workload; the gen_ai trace vocabulary comes from internal/genai (the same seam used by app).
## app
app declares a multi-service graph where each node is a first-class service with its own telemetry. One request minted at the entry node propagates a single correlated trace across the whole graph: every node adds its own SERVER span (and optional CLIENT spans to its downstream calls), so the resulting trace shows the full request journey across all services.
### Service nodes
Each node in services: is a ServiceNode:
| Field | Description |
|---|---|
| `name` | Unique graph identity; stamped as the `service` / `service_name` label on every signal from this node. |
| `type` | Span semantics. Valid values: `frontend` / `web` / `grpc` / `worker` / `job` / `stream` / `gateway` / `db` / `cache` / `llm` / `agent` / `tool` / `workflow` / `retrieval`. Unknown types fall back to a default. |
| `runtime` | `go` / `jvm` / `node` / `python`: selects the catalog runtime profile. |
| `entry` | `true` on exactly one node: the graph's request entry point. |
| `replicas` | Pod count for the k8s substrate cascade (default 2). |
| `calls` | Downstream node names (graph edges). |
| `routes` | Request routes drawn per request. On the entry node these populate `r.Route`; on a callee they name its SERVER span. |
| `profiles` | Catalog profile template names applied to this node. |
| `metrics` / `logs` / `spans` | Inline custom telemetry via the DSL (the escape hatch). |
| `external` | `true` = remote/managed service: appears as a trace hop but is not deployed as a k8s pod on the caller's cluster. |
| `agentic_flow` | In-process LangGraph orchestration: emits an `invoke_workflow → invoke_agent → execute_tool* → chat` span subtree inside this node's SERVER span. |
| `pages` | RUM navigation inventory (frontend entry nodes only). Page-views are RUM-only: they model session navigation around the traced action and emit no backend trace. |
### Service graph example
```yaml
workloads:
  - type: app
    name: mine-app
    runs_on: mine-prod-use1
    traffic:
      off_peak_rps: 5
      peak_rps: 40
    request_latency_p95_ms: 9000   # LLM-call budget; default 200ms suits plain HTTP
    models:
      - { model: gpt-4o, provider: azure-openai }
      - { model: claude-3-5-sonnet, provider: bedrock }
    services:
      - name: mine-frontend
        type: frontend
        entry: true
        runtime: node
        replicas: 2
        routes: ["GET /", "GET /search"]
        profiles: [rum_faro, runtime_node]
        calls: [mine-api]
      - name: mine-api
        type: web
        runtime: go
        replicas: 3
        profiles: [scraped_http_server, runtime_go, gen_ai_client]
        calls: [mine-db, mine-gateway]
      - name: mine-gateway
        type: gateway
        runtime: go
        replicas: 2
        profiles: [gateway_export_log]
        calls: [mine-llm]
        external: false
      - name: mine-llm
        type: llm
        external: true             # managed endpoint: trace hop only, no k8s pod
        calls: []
      - name: mine-db
        type: db
        runtime: go
        db_instance: mine-app-db   # links to the RDS fixture for db.* CLIENT span attrs
        calls: []
```
The resulting trace shape:
```mermaid
graph LR
    browser["browser CLIENT"]
    fe["mine-frontend SERVER"]
    api["mine-api SERVER"]
    gw["mine-gateway SERVER"]
    llm["mine-llm CLIENT (external)"]
    db["mine-db SERVER (db hop)"]
    browser --> fe
    fe --> api
    api --> gw
    api --> db
    gw --> llm
```
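The way one trace is threaded across the graph can be sketched with a short depth-first walk over the `calls` edges. This is an illustration under stated assumptions, not the minter's implementation: `mint_trace` and the span dictionary layout are hypothetical, an external callee is modelled only by the caller's CLIENT span, the browser CLIENT root is left out, and the graph is assumed to be acyclic.

```python
import uuid

# Graph edges copied from the example blueprint; `external` marks trace-hop-only nodes.
SERVICES = {
    "mine-frontend": {"calls": ["mine-api"], "entry": True},
    "mine-api": {"calls": ["mine-db", "mine-gateway"]},
    "mine-gateway": {"calls": ["mine-llm"]},
    "mine-llm": {"calls": [], "external": True},
    "mine-db": {"calls": []},
}


def mint_trace(services: dict) -> list[dict]:
    """Walk the graph from the entry node; every span shares one trace_id."""
    trace_id = uuid.uuid4().hex
    spans: list[dict] = []

    def add(service: str, kind: str, parent, peer=None) -> str:
        span = {"trace_id": trace_id, "span_id": uuid.uuid4().hex[:16],
                "parent_id": parent, "service": service, "kind": kind, "peer": peer}
        spans.append(span)
        return span["span_id"]

    def visit(name: str, parent) -> None:
        server_id = add(name, "SERVER", parent)
        for callee in services[name]["calls"]:
            client_id = add(name, "CLIENT", server_id, peer=callee)
            if services[callee].get("external"):
                continue  # assumption: an external hop is only the caller's CLIENT span
            visit(callee, client_id)  # assumes no cycles in the graph

    entry = next(n for n, s in services.items() if s.get("entry"))
    visit(entry, None)
    return spans


spans = mint_trace(SERVICES)
for s in spans:
    print(s["kind"], s["service"], "->", s["peer"])
```

Every span carries the same `trace_id`, and each callee's SERVER span is parented on the caller's CLIENT span, which is what lets a trace viewer render the request as one journey.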
### Models and gen_ai hops
The top-level models: list declares valid (model, provider) pairings. The app minter draws one pair per request and stamps it into the correlation — so gen_ai.* span attributes, gateway export logs, and eval log entries all carry the same model and provider for that request. Pairing prevents impossible combinations (e.g. a Claude model on the Azure-OpenAI provider).
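The per-request pairing can be pictured with the sketch below. It assumes a uniform draw over the declared pairs, and both the helper names (`mint_correlation`, `project`) and the field keys on each signal are illustrative rather than taken from the generator.

```python
import random

MODELS = [
    {"model": "gpt-4o", "provider": "azure-openai"},
    {"model": "claude-3-5-sonnet", "provider": "bedrock"},
]


def mint_correlation(rng: random.Random, models: list[dict]) -> dict:
    """Draw one declared (model, provider) pair for this request."""
    pair = rng.choice(models)  # assumption: uniform over the declared pairs
    return {"model": pair["model"], "provider": pair["provider"]}


def project(corr: dict) -> dict:
    """Stamp the same pair onto each signal (keys are illustrative)."""
    return {
        "span_attrs": {"gen_ai.request.model": corr["model"], "provider": corr["provider"]},
        "gateway_log": {"model": corr["model"], "provider": corr["provider"]},
        "eval_log": {"model": corr["model"], "provider": corr["provider"]},
    }


corr = mint_correlation(random.Random(1), MODELS)
signals = project(corr)
print(signals)
```

Because the pair is drawn once and then copied, a model and provider that were never declared together cannot appear on the same request.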
An agentic_flow on a node adds a nested gen_ai span subtree inside that node's SERVER span:
```yaml
- name: mine-api
  type: web
  runtime: go
  agentic_flow:
    workflow: mine-search-workflow
    agents:
      - name: search-agent
        tools: [vector_search, rerank, summarise]
    omit_chat: false   # false = include the chat <model> leaf (the LLM call)
```
Set omit_chat: true when a connected gateway or llm node already models the LLM call so it is not double-counted.
### Telemetry DSL
Each node can declare custom metrics, log streams, and extra span attributes via inline DSL specs. The DSL is the escape hatch for `profiles`: reach for catalog profiles first, and use inline specs for anything they do not cover.
Value models (exactly one per field):
| Kind | YAML | Description |
|---|---|---|
| `const` | `const: 42.0` | Fixed numeric value |
| `const_str` | `const_str: "ok"` | Fixed string |
| `enum` | `enum: [{value: "read", weight: 3}, {value: "write", weight: 1}]` | Weighted categorical draw |
| `int_range` | `int_range: {min: 0, max: 100, p_zero: 0.95}` | Bounded integer; `p_zero` forces 0 with the given probability |
| `float_range` | `float_range: {min: 0.001, max: 2.5}` | Bounded float |
| `normal` | `normal: {mean: 50.0, stddev: 10.0}` | Gaussian draw (negative values clamped to 0) |
| `bool` | `bool: {p_true: 0.1}` | Weighted boolean |
| `shape` | `shape: {base: 100.0, mode: latency_storm}` | `base` × shape-engine reading; incident-responsive when `mode` is set |
| `ref` | `ref: trace_id` | Pulls a correlation field by name |
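As a rough guide to how the stochastic value models behave, the sketch below evaluates a single-key spec. It covers only the self-contained kinds; `shape` and `ref` are left out because they depend on the shape engine and the request correlation. The `draw` function is hypothetical, and inclusive integer bounds and a uniform `float_range` are assumptions.

```python
import random


def draw(rng: random.Random, spec: dict):
    """Evaluate a single-key value-model spec (a subset of the kinds in the table)."""
    (kind, arg), = spec.items()  # exactly one model per field, else ValueError
    if kind in ("const", "const_str"):
        return arg
    if kind == "enum":
        values = [e["value"] for e in arg]
        weights = [e["weight"] for e in arg]
        return rng.choices(values, weights=weights, k=1)[0]
    if kind == "int_range":
        if rng.random() < arg.get("p_zero", 0.0):
            return 0  # p_zero short-circuits the range draw
        return rng.randint(arg["min"], arg["max"])  # assumption: inclusive bounds
    if kind == "float_range":
        return rng.uniform(arg["min"], arg["max"])  # assumption: uniform
    if kind == "normal":
        return max(0.0, rng.gauss(arg["mean"], arg["stddev"]))  # clamp negatives to 0
    if kind == "bool":
        return rng.random() < arg["p_true"]
    raise ValueError(f"unsupported value model in this sketch: {kind}")


rng = random.Random(3)
print(draw(rng, {"int_range": {"min": 0, "max": 100, "p_zero": 0.95}}))
print(draw(rng, {"enum": [{"value": "read", "weight": 3}, {"value": "write", "weight": 1}]}))
```

With `p_zero: 0.95`, most draws return 0 and only about one in twenty falls through to the range, which suits sparse counters such as retries.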
Capability matrix (enforced at load time):
- Metric labels and Loki stream labels: only `const` / `const_str` / `enum`. These must enumerate a stable, total domain on every run.
- Metric values: numeric models only (`const`, `int_range`, `float_range`, `normal`, `bool`, `shape`).
- Log body fields and span attributes: any model, including `ref`. High-cardinality correlation keys (`trace_id`, `portkey_trace_id`, `run_id`, etc.) ride here, never as labels or stream labels.
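The capability matrix amounts to a lookup from field position to permitted model kinds. The sketch below shows one way a load-time check could be written; the position names (`metric_label`, `log_field`, and so on) and the `check` function are hypothetical, and only the kind-to-position rules come from the list above.

```python
LABEL_KINDS = {"const", "const_str", "enum"}
NUMERIC_KINDS = {"const", "int_range", "float_range", "normal", "bool", "shape"}
ALL_KINDS = LABEL_KINDS | NUMERIC_KINDS | {"ref"}

# Hypothetical position names; the permitted kinds follow the capability matrix.
ALLOWED = {
    "metric_label": LABEL_KINDS,
    "stream_label": LABEL_KINDS,
    "metric_value": NUMERIC_KINDS,
    "log_field": ALL_KINDS,
    "span_attr": ALL_KINDS,
}


def check(position: str, spec: dict) -> None:
    """Raise ValueError if the spec's model kind is not allowed at this position."""
    if len(spec) != 1:
        raise ValueError("exactly one value model per field")
    kind = next(iter(spec))
    if kind not in ALLOWED[position]:
        raise ValueError(f"{kind!r} is not allowed as a {position}")


check("log_field", {"ref": "trace_id"})          # accepted: refs may ride in log bodies
check("metric_value", {"shape": {"base": 1.0}})  # accepted: shape is numeric
```

Rejecting `ref` in label positions at load time is what keeps a per-request key such as `trace_id` from turning into an unbounded set of series or streams.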
Example inline metric:
```yaml
- name: mine-api
  type: web
  metrics:
    - name: mine_requests_total
      instrument: counter
      labels:
        route: { enum: [{value: "/search", weight: 3}, {value: "/items", weight: 1}] }
        status: { enum: [{value: "200", weight: 9}, {value: "500", weight: 1}] }
      value:
        shape: { base: 1.0, mode: throughput_drop }
```
### Catalog profiles
The catalog ships reusable profile templates that any node can apply by name under its `profiles:` key:
| Profile | Emits |
|---|---|
| `scraped_http_server` | `http_server_request_duration_seconds` histogram (classic buckets) |
| `runtime_go` | Go runtime metrics: `go_goroutines`, `go_memstats_heap_inuse_bytes`, `process_resident_memory_bytes`, `process_cpu_seconds_total` |
| `runtime_jvm` | JVM metrics: `process_cpu_seconds_total` and GC/heap families |
| `runtime_node` | Node.js runtime metrics: `process_cpu_seconds_total` |
| `runtime_python` | Python runtime metrics: `python_gc_objects_collected_total`, `process_resident_memory_bytes`, `process_cpu_seconds_total` |
| `gen_ai_client` | `gen_ai.*` client span attributes + correlated AI request log |
| `gateway_export_log` | Portkey gateway export log stream (LLM cost/token/latency fields) |
| `gateway_native_scrape` | Portkey gateway scraped metrics: `request_count`, latency histograms, processing-time families |
| `eval_log` | LangSmith evaluation result log stream |
| `bedrock_invocation_log` | AWS Bedrock model invocation log |
| `rum_faro` | Faro/RUM browser beacon emission (entry frontend nodes only) |
## Failure modes
Workloads register failure modes that incidents can target. See Incidents for how to activate them.
web_service modes (axis: workload):
| Mode | Effect |
|---|---|
| `latency_spike` | Elevated request latency (up to 4× at full intensity) |
| `error_burst` | Elevated 5xx error rate |
| `cpu_hotspot` | CPU concentrated in a hot frame (profiling flamegraph) |
| `memory_leak` | Growing heap (profile sample values rise) |
| `lock_contention` | Elevated mutex/block contention (profile values) |
| `goroutine_leak` | Goroutine accumulation (profile values) |
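The "up to 4× at full intensity" figure for `latency_spike` can be read as a multiplier that grows with incident intensity. The sketch below assumes intensity is a value in [0, 1] and that the scaling is linear; neither is stated in the source, and `latency_multiplier` is a hypothetical helper.

```python
def latency_multiplier(intensity: float, max_factor: float = 4.0) -> float:
    """Scale from 1x (no incident) to max_factor at full intensity.

    Assumptions: intensity lies in [0, 1] and the ramp is linear.
    """
    clamped = min(1.0, max(0.0, intensity))
    return 1.0 + (max_factor - 1.0) * clamped


for level in (0.0, 0.5, 1.0):
    print(level, latency_multiplier(level))
```

Under these assumptions a half-intensity incident lifts a 140 ms baseline to 350 ms, and a full-intensity one to 560 ms.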
app service-node modes (axis: service — each node is individually addressable):
| Mode | Effect |
|---|---|
| `latency_storm` | Elevated latency on the targeted node |
| `error_spike` | Elevated 5xx rate on the targeted node |
| `throughput_drop` | Reduced throughput on the targeted node |
| `fallback_storm` | Elevated gateway fallback rate on the targeted node |
| `retry_storm` | Elevated gateway retry rate on the targeted node |
| `cpu_hotspot` | Hot frame amplification on the targeted node (profiling) |
| `memory_leak` | Growing heap on the targeted node (profiling) |
| `lock_contention` | Mutex/block contention on the targeted node (profiling) |
| `goroutine_leak` | Goroutine accumulation on the targeted node (profiling) |
| `web_vitals_degraded` | Browser web-vitals degrade on the targeted frontend node: LCP/INP/TTFB/FCP/CLS spike |