Blueprint Schema Reference

This reference is generated from the live Go types via make blueprint-schema and reflects the schema enforced at load time. Decoding is strict: any key not listed here fails the load with an explicit error. If a load fails unexpectedly, first check that every key is spelled exactly as shown here.

Regenerating the schema

Run make blueprint-schema to regenerate BLUEPRINT-SCHEMA.md from the source types. The TestSchemaCurrent gate fails if the committed file drifts from the live types.


Top-level blueprint document

Key Type Required Description
name string yes Unique blueprint identifier; also the determinism seed root.
label string yes Selector value stamped on every blueprint-scoped series. Defaults to name.
metadata object no Human-facing annotation. UI-only; never affects emission.
metadata.description string no Free-text summary shown in the UI.
metadata.tags[] string no Free-form labels for filtering/grouping.
metadata.owner string no Owning team or person.
metadata.links map[string]string no Named external references (name → url).
metadata.category string no Single classification (e.g. demo/reference/customer).
shape string no Default shape profile for the whole blueprint.
timezone string no Business-hours anchor. Default Europe/Zurich. Mutually exclusive with regions.
regions[] object no Follow-the-sun multi-timezone composite. Mutually exclusive with timezone.
regions[].name string Region identifier.
regions[].timezone string IANA timezone string.
regions[].weight float Relative traffic weight for this region.
series_budget int no Per-blueprint series cap.
environments[] object yes One entry per deployment environment.
workloads[] object no Application workloads (request traffic).
features map[string]config no Grafana Cloud product declarations (synthetic_monitoring, fleet_management).
integrations map[string]config no External-source declarations (cloudflare, csp_azure, csp_gcp, etc.).
incidents[] object no Scheduled or interval-recurring failure activations.
scenarios[] object no Named, reusable failure bundles composed of effects.
hosts[] object no Traditional non-Kubernetes machines.
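As an illustration of the top-level keys above, here is a minimal sketch of a blueprint document. It assumes blueprints are written as YAML (consistent with the list forms shown elsewhere in this reference); every name and value is hypothetical.

```yaml
# Hypothetical minimal blueprint; names and values are illustrative only.
name: checkout-demo
label: checkout-demo          # listed as required; set explicitly here
metadata:
  description: Minimal example blueprint
  owner: platform-team
  category: demo
  tags: [demo, checkout]
  links:
    runbook: https://example.com/runbooks/checkout
timezone: Europe/Zurich       # mutually exclusive with regions
series_budget: 50000
environments:
  - name: prod                # the only required key per environment
```

Because decoding is strict, a misspelled key such as enviroments would be rejected at load time rather than ignored.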

environments[]

Key Type Required Description
environments[].name string yes Environment name (e.g. prod, staging).
environments[].weight float no Relative traffic volume. Default 1.0.
environments[].production bool no Keeps weekend traffic at full level. Default: true when name == "prod".
environments[].metadata object no Human-facing annotation (same sub-keys as top-level metadata).
environments[].cloud object no AWS cloud account and service config.
environments[].cluster object no Kubernetes cluster config.
environments[].databases[] object no Database instances.
environments[].caches[] object no Cache instances.

environments[].cloud

Key Type Required Description
cloud.provider string yes "aws" (v1).
cloud.account_id string yes AWS account ID (string to preserve leading zeros).
cloud.region string yes AWS region (e.g. us-east-1).
cloud.vpc_id string yes VPC identifier.
cloud.nat_gateways int no Number of NAT Gateway instances to emit aws_natgateway_* for.
cloud.cloudwatch object no cw_infra sub-family toggles. See cw_infra config below.
cloud.aoss object no Amazon OpenSearch Serverless config. Absent ⇒ not emitted. See aoss config.
cloud.mwaa object no Amazon Managed Workflows for Apache Airflow config. Absent ⇒ not emitted. See mwaa config.
cloud.glue object no AWS Glue ETL config. Absent ⇒ not emitted. See glue config.
cloud.bedrock object no AWS Bedrock CloudWatch config. Absent ⇒ not emitted. See bedrock config.
cloud.agentcore object no AWS Bedrock AgentCore CloudWatch config. Absent ⇒ not emitted. See agentcore config.
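A sketch of an environments[].cloud block using the keys above. The account ID, VPC ID, and model choices are made-up placeholders, and YAML syntax is assumed.

```yaml
# Hypothetical cloud block for one environment.
environments:
  - name: prod
    cloud:
      provider: aws
      account_id: "012345678901"    # quoted so the leading zero is preserved
      region: us-east-1
      vpc_id: vpc-0abc123def456789a
      nat_gateways: 2
      glue:
        jobs: [nightly-etl]         # block present => Glue family is emitted
      bedrock:
        sub_signals: [models, guardrails]
      # aoss, mwaa, agentcore are absent => not emitted
```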

environments[].cluster

Key Type Required Description
cluster.type string yes "eks" (v1).
cluster.name string yes Cluster name. Must be globally unique across all enabled blueprints.
cluster.node_groups[] object no Node group definitions.
cluster.node_groups[].name string Node group name.
cluster.node_groups[].instance_type string EC2 instance type (e.g. m6i.xlarge).
cluster.node_groups[].desired int Desired node count.
cluster.node_groups[].provisioner string "managed" (default) or "karpenter".
cluster.node_groups[].os string linux (default) or windows.
cluster.k8s_monitoring object no Grafana k8s-monitoring Helm chart config.
cluster.k8s_monitoring.enabled bool Enable the k8s-monitoring substrate.
cluster.k8s_monitoring.chart_version string Helm chart version string.
cluster.k8s_monitoring.alloy bool Deploy alloy-metrics StatefulSet.
cluster.k8s_monitoring.alloy_version string Alloy version (e.g. "1.16.3"; canonicalized to "v1.16.3").
cluster.k8s_monitoring.opencost bool Enable OpenCost cost-allocation.
cluster.k8s_monitoring.kepler bool Enable Kepler energy monitoring.
cluster.k8s_monitoring.features map[string]bool Feature gates for Alloy collectors. Valid keys: cluster_metrics, cluster_events, pod_logs, node_logs, profiling, application_observability. Absent/false ⇒ that collector is not deployed.
cluster.k8s_monitoring.metrics_replicas int alloy-metrics StatefulSet replica count. 0 ⇒ default 1.
cluster.k8s_monitoring.receiver_as_daemonset bool Model alloy-receiver as a per-node DaemonSet instead of a Deployment.
cluster.k8s_monitoring.fleet_management bool Register this cluster's Alloy collectors with the Fleet Management API. Requires enabled: true.
cluster.k8s_monitoring.control_plane object Gate individual control-plane metric families.
cluster.k8s_monitoring.control_plane.api_server bool Emit kube-apiserver metrics.
cluster.k8s_monitoring.control_plane.kube_proxy bool Emit kube-proxy metrics.
cluster.k8s_monitoring.control_plane.kube_scheduler bool Emit kube-scheduler metrics.
cluster.k8s_monitoring.control_plane.kube_controller_manager bool Emit kube-controller-manager metrics.
cluster.k8s_monitoring.control_plane.kubelet_probes bool Emit kubelet probe metrics.
cluster.k8s_monitoring.pod_logs_method string Pod-log collection mechanism. Empty with pod_logs: true ⇒ defaults to "opentelemetry". Absent pod_logs ⇒ "none".
cluster.observability object no Gates the per-node EC2 CloudWatch lane.
cluster.observability.cloudwatch bool no Emit the aws_ec2_* CloudWatch lane for cluster nodes. Default true.
cluster.addons[] object no Cluster add-ons to deploy. May be bare scalar (- core_dns) or map form (- {name: cluster_autoscaler, min_nodes: 3, max_nodes: 10}).
cluster.addons[].name string Add-on construct kind (registry key).
cluster.addons[].<kind config> varies Add-on construct's own config fields. See per-kind sections below.
cluster.platform object no Node OS and Kubernetes version.
cluster.platform.os string "al2", "al2023" (default), or "bottlerocket".
cluster.platform.kubernetes_version string e.g. "1.31" (default).
cluster.platform.kernel_version string Optional override of the node kernel string.
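The cluster keys above could be combined as in the following sketch. It assumes YAML syntax; the cluster name, node group, and version values are hypothetical.

```yaml
# Hypothetical EKS cluster definition for one environment.
environments:
  - name: prod
    cluster:
      type: eks
      name: demo-prod-eks           # must be unique across enabled blueprints
      node_groups:
        - name: general
          instance_type: m6i.xlarge
          desired: 3
          provisioner: karpenter
          os: linux
      k8s_monitoring:
        enabled: true               # required for fleet_management below
        alloy: true
        alloy_version: "1.16.3"
        metrics_replicas: 2
        fleet_management: true
        features:
          cluster_metrics: true
          pod_logs: true            # collectors not listed are not deployed
        control_plane:
          api_server: true
          kubelet_probes: true
      observability:
        cloudwatch: true
      platform:
        os: al2023
        kubernetes_version: "1.31"  # quoted so it stays a string, not a float
```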

environments[].databases[]

Key Type Required Description
databases[].engine string yes "postgres", "mysql", "docdb", or "neptune".
databases[].version string yes Engine version string.
databases[].name string yes Instance name. Must be globally unique across all enabled blueprints.
databases[].instance_class string no e.g. "db.t3.medium". Empty ⇒ resolver default.
databases[].observability object no Emission switch. Nil ⇒ CloudWatch only.
databases[].observability.cloudwatch bool no Emit aws_rds_* CloudWatch family. Default true.
databases[].observability.dbo11y bool no Emit database_observability_* lane. Default false.
databases[].observability.digests int no dbo11y query-catalogue size. Default 40. Ignored unless dbo11y: true.

environments[].caches[]

Key Type Required Description
caches[].engine string yes "redis".
caches[].version string yes Engine version string.
caches[].name string yes Instance name. Must be globally unique across all enabled blueprints.
caches[].instance_class string no e.g. "cache.r6g.large". Empty ⇒ resolver default.
caches[].observability object no Emission switch.
caches[].observability.cloudwatch bool no Emit the aws_elasticache_* CloudWatch lane. Default true.
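A sketch of one database and one cache declared under an environment, using the keys from the two tables above. Instance names and versions are hypothetical; YAML syntax is assumed.

```yaml
# Hypothetical database and cache instances.
environments:
  - name: prod
    databases:
      - engine: postgres
        version: "16.3"
        name: orders-pg-prod          # must be unique across enabled blueprints
        instance_class: db.t3.medium
        observability:
          cloudwatch: true
          dbo11y: true                # opt in to the database_observability_* lane
          digests: 60                 # only read because dbo11y is true
    caches:
      - engine: redis
        version: "7.1"
        name: sessions-redis-prod
        instance_class: cache.r6g.large
        observability:
          cloudwatch: true
```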

workloads[]

Key Type Required Description
workloads[].type string yes Workload kind registry key (e.g. web_service, app).
workloads[].name string yes Unique workload instance name.
workloads[].runs_on string yes Cluster name this workload runs on. Resolved at load or fails.
workloads[].replicas int no Pod count (default 2).
workloads[].calls[] object no Downstream DB/cache hops.
workloads[].calls[].db string Name of a declared database this workload calls.
workloads[].calls[].cache string Name of a declared cache this workload calls.
workloads[].<kind config> varies Workload kind's own config. See web_service config and app config below.
workloads[].for_each_env bool no Fan this workload into one instance per environment.
workloads[].envs[] string no Subset of environment names to fan into (used with for_each_env: true).
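A sketch of a workload entry that runs on a declared cluster and calls a declared database and cache. The names are hypothetical and assume that a cluster named demo-prod-eks, a database named orders-pg-prod, and a cache named sessions-redis-prod are declared elsewhere in the same blueprint.

```yaml
# Hypothetical workload entry; referenced names must exist in the blueprint.
workloads:
  - type: web_service
    name: checkout-api
    runs_on: demo-prod-eks          # resolved at load; unknown cluster fails the load
    replicas: 3
    calls:
      - db: orders-pg-prod
      - cache: sessions-redis-prod
```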

web_service workload config

Key Type Required Description
tracing bool no Enable the OTLP trace lane. Default true.
rum bool no Enable the Faro/RUM lane (requires GC_FARO_* credentials).
traffic object no Traffic shaping for the metric lane.
traffic.shape string Shape profile name (informational).
traffic.off_peak_rps float Trough request rate. Default 5.
traffic.peak_rps float Plateau request rate. Default 50.
endpoints[] object no Route catalogue; requests are drawn uniformly.
endpoints[].route string e.g. "GET /v1/items".
endpoints[].error_rate float [0,1] base error fraction.
endpoints[].p95_ms float p95 latency target in ms.
observability object no Additive emission switch for the Beyla eBPF observation lane.
observability.beyla object Beyla lane config.
observability.beyla.mode string "kubernetes" (default) or "standalone".
observability.beyla.context string "ebpf_only" (default) or "coexist_sdk".
observability.beyla.features[] string Empty ⇒ Beyla default features for the mode.
pyroscope object no Continuous profiling lane config.
pyroscope.enabled bool Enable profiling emission.
pyroscope.mode string Push mode.
pyroscope.runtime string Runtime (e.g. go, jvm).
pyroscope.types[] string Profile type names.
pyroscope.span_profiles bool Enable span-correlated profiles.
otel object no Native OTLP application-metrics lane. Absent or metrics: false ⇒ no OTLP metrics.
otel.metrics bool Enable native OTLP http.server.* metrics emission.
otel.mode string "naked" (default) or "k8s_monitoring".
context string no §5 resource-attr context (Platform, ContentGen, DataGen). Empty ⇒ omitted.
use_case string no §5 resource-attr use case. Empty ⇒ omitted.
team string no §5 resource-attr team. Empty ⇒ omitted.
version string no Override the default service.version. Empty ⇒ default.
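A sketch of a web_service workload with some of the kind-specific keys above. It assumes the kind config is written inline on the workload entry, next to type and name; that placement is an assumption, as are all names and numbers.

```yaml
# Hypothetical web_service workload; kind config assumed to sit inline.
workloads:
  - type: web_service
    name: checkout-api
    runs_on: demo-prod-eks
    tracing: true
    traffic:
      off_peak_rps: 5
      peak_rps: 80
    endpoints:
      - route: "GET /v1/items"
        error_rate: 0.01            # fraction in [0,1]
        p95_ms: 120
      - route: "POST /v1/orders"
        error_rate: 0.02
        p95_ms: 250
    observability:
      beyla:
        mode: kubernetes
        context: coexist_sdk
    pyroscope:
      enabled: true
      runtime: go
    otel:
      metrics: true
      mode: k8s_monitoring
    team: payments
```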

app workload config

The app workload declares a multi-service graph. See workloads.md for a conceptual guide.

Key Type Required Description
services[] object yes Graph nodes.
services[].name string yes Unique graph identity; stamped as the service / service_name label.
services[].type string no Span semantics + profile fit (e.g. frontend, web, grpc, db, llm, agent).
services[].runtime string no go, jvm, node, or python. Selects the runtime profile.
services[].entry bool no Marks the request entry point (mints invocations).
services[].namespace string no Kubernetes namespace. Propagates to both substrate placement and emitted telemetry.
services[].context string no §5 resource-attr context. Empty ⇒ omitted.
services[].use_case string no §5 resource-attr use case. Empty ⇒ omitted.
services[].team string no §5 resource-attr team. Empty ⇒ omitted.
services[].version string no Override service.version.
services[].routes[] string no Request routes (e.g. "GET /v1/items").
services[].replicas int no Pod count for this node (default 2).
services[].profiles[] string no Catalog profile-template names applied to this node.
services[].calls[] string no Downstream node names (graph edges).
services[].db_instance string no Base database name to resolve per-env (e.g. "orders-pg" → "orders-pg-<env>").
services[].external bool no Remote/managed service: appears as a trace hop but is NOT placed as a k8s pod.
services[].pages[] object no RUM navigation inventory (frontend entry nodes only).
services[].pages[].path string SPA route (e.g. /document-library).
services[].pages[].name string Human view name.
services[].pages[].actions[] string User-action intents on this page.
services[].agentic_flow object no In-process LangGraph orchestration that adds a gen_ai span subtree. Nil ⇒ no agentic flow.
services[].agentic_flow.workflow string LangGraph graph name.
services[].agentic_flow.agents[] object Pool of agents the workflow can invoke.
services[].agentic_flow.agents[].name string Agent name.
services[].agentic_flow.agents[].tools[] string Tool names for this agent.
services[].agentic_flow.omit_chat bool Drop the chat <model> leaf span (set when a connected gateway hop already models the LLM call).
services[].pyroscope object no Per-node profiling config. Same sub-keys as web_service.pyroscope.
services[].resources object no Container CPU/memory requests/limits and cAdvisor usage base. Affects only the k8s substrate lane.
services[].resources.cpu_request float CPU request (cores).
services[].resources.cpu_limit float CPU limit (cores).
services[].resources.mem_request float Memory request (bytes).
services[].resources.mem_limit float Memory limit (bytes).
services[].resources.cpu_usage_base float cAdvisor CPU usage base (cores).
services[].controller string no k8s controller kind: Deployment (default) or StatefulSet.
services[].hpa bool no Emit kube_horizontalpodautoscaler_* metrics.
services[].volume_claims[] string no PVC template names for kube_persistentvolumeclaim_* and kubelet volume stats.
services[].metrics[] object no Inline custom metric definitions (DSL).
services[].logs[] object no Inline custom log stream definitions (DSL).
services[].spans[] object no Inline custom span attribute definitions (DSL).
traffic object no Entry node invocation volume shaping.
traffic.shape string Shape profile name.
traffic.off_peak_rps float Trough rate. Default 5.
traffic.peak_rps float Plateau rate. Default 50.
traffic.request_latency_p95_ms float Base end-to-end latency p95 in ms. Default 0 ⇒ 200ms. LLM/agentic apps should set this in the seconds range (e.g. 9000).
models[] object no Valid (model, provider) routing pairs for AI apps. Empty ⇒ non-AI app.
models[].model string gen_ai.request.model value (e.g. gpt-4o, claude-3.5-sonnet).
models[].provider string gen_ai.provider.name value (e.g. azure-openai, bedrock).
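A sketch of a small app graph built from the keys above: a frontend entry node, an agent service, and an external LLM gateway. As with web_service, the kind config is assumed to sit inline on the workload entry, and every service, page, agent, and tool name is hypothetical.

```yaml
# Hypothetical three-node app graph for an AI assistant.
workloads:
  - type: app
    name: assistant
    runs_on: demo-prod-eks
    services:
      - name: web-frontend
        type: frontend
        runtime: node
        entry: true                   # request entry point
        calls: [assistant-api]
        pages:
          - path: /document-library
            name: Document library
            actions: [search, open-document]
      - name: assistant-api
        type: agent
        runtime: python
        replicas: 3
        calls: [llm-gateway]
        agentic_flow:
          workflow: answer-question
          omit_chat: true             # the gateway hop already models the LLM call
          agents:
            - name: planner
              tools: [search_docs]
      - name: llm-gateway
        type: llm
        external: true                # trace hop only, no k8s pod
    traffic:
      off_peak_rps: 1
      peak_rps: 10
      request_latency_p95_ms: 9000    # seconds-range latency for an LLM app
    models:
      - model: gpt-4o
        provider: azure-openai
```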

Custom-telemetry DSL (app workload metrics[], logs[], spans[])

The DSL value model is a one-of; exactly one of these keys must be present per value spec:

Generator key Type Description
const float Fixed numeric constant.
const_str string Fixed string constant.
enum[] {value, weight} Weighted random pick from a domain. Required for metric and stream labels.
int_range {min, max, p_zero} Uniform integer range with optional zero-probability.
float_range {min, max} Uniform float range.
normal {mean, stddev} Normal distribution.
bool {p_true} Bernoulli with given probability.
shape {base, mode} Incident-responsive value driven by the shape engine.
ref string Correlation field reference (high-card). Metric/stream labels reject this.
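The generator keys above can be pictured as the following isolated value specs, one per list item. This sketch shows only the one-of shape: the fields that surround a value spec inside metrics[], logs[], or spans[] are not listed in this reference, so they are left out, and the list wrapper, the values, and the referenced field name are hypothetical.

```yaml
# Hypothetical value specs, each using exactly one generator key.
- const: 1.5
- const_str: "eu-west"
- enum:
    - {value: ok, weight: 9}
    - {value: error, weight: 1}
- int_range: {min: 0, max: 20, p_zero: 0.3}
- float_range: {min: 0.0, max: 1.0}
- normal: {mean: 120, stddev: 15}
- bool: {p_true: 0.05}
- ref: order_id        # high-cardinality; rejected for metric/stream labels
```

Combining two generator keys in a single spec, for example const together with enum, would violate the exactly-one rule stated above.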

features

features is a map keyed by feature kind. Each entry accepts enabled: bool (default true) plus kind-specific config.

synthetic_monitoring config

Key Type Description
checks[] object One entry per synthetic check.
checks[].name string Required. Becomes the Prometheus job label.
checks[].target string HTTP URL probed. Default: https://<name>.example.com/health.
checks[].frequency int Poll interval in milliseconds. Default 60000.
checks[].probe string Private probe name. Default "synthkit-private".
checks[].region string Probe region. Default "EMEA".
checks[].labels map[string]string User-defined labels emitted as label_<k>=<v> on every probe series.

fleet_management config

Key Type Description
collectors_per_os map[string]int OS name (linux/windows/darwin) → desired fake collector count. Absent OS ⇒ not emitted.
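A sketch of a features block declaring both kinds above. Check names, labels, and counts are hypothetical; YAML syntax is assumed.

```yaml
# Hypothetical features block.
features:
  synthetic_monitoring:
    enabled: true
    checks:
      - name: checkout-health       # becomes the Prometheus job label
        target: https://checkout.example.com/health
        frequency: 30000            # milliseconds
        region: EMEA
        labels:
          team: payments            # emitted as label_team="payments"
  fleet_management:
    collectors_per_os:
      linux: 12
      windows: 3                    # darwin absent => not emitted
```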

integrations

integrations is a map keyed by integration kind. Each entry accepts enabled: bool (default true) plus kind-specific config. The for_each_env and envs keys are also accepted to fan an integration across environments.

cloudflare config

Key Type Description
zone string Cloudflare zone name.
account string Account identifier.
colocations[] string Colocation names.
tunnels[] object Tunnel entries.
tunnels[].name string Tunnel name.

csp_azure config

Key Type Description
subscriptions int Number of synthetic Azure subscriptions (default 2).
company string Company slug for subscription names (default "demo").
sub_signals[] string Families to emit. Valid: compute, databases, storage, networking, messaging, logs. ai is opt-in and NOT in the default set — must be listed explicitly. Empty ⇒ all default families.
ingestion_path string "serverless" (default, GC managed scraper) or "azure_exporter".
credential string Managed-scraper credential name (serverless path only). Default "azure".
tags map[string]string Resource tags emitted as tag_<key> on every series. Opt-in: omit ⇒ no tag labels.

csp_gcp config

Key Type Description
projects int Number of synthetic GCP projects (default 2).
company string Company slug for project IDs (default "demo").
sub_signals[] string Families to emit. Valid: compute, databases, storage, networking, loadbalancing, pubsub, cloudrun, bigtable, logs. vertex is opt-in — must be listed explicitly. Empty ⇒ all default families.
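A sketch of the two cloud-provider integrations above, each opting in to its AI family by listing it explicitly. The company slug and tag values are hypothetical.

```yaml
# Hypothetical integrations block for Azure and GCP.
integrations:
  csp_azure:
    subscriptions: 3
    company: acme
    sub_signals: [compute, databases, ai]    # ai is never in the default set
    ingestion_path: serverless
    credential: azure
    tags:
      cost_center: "1234"                    # emitted as tag_cost_center
  csp_gcp:
    projects: 2
    company: acme
    sub_signals: [compute, pubsub, vertex]   # vertex must be listed explicitly
```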

portkey_gateway config

Key Type Description
models[] string LLM model names to spread across label values (default ["gpt-4o"]).
providers[] string Provider names (default ["azure-openai"]).
app string Service name for the app= label (default "ai-gateway").
env string Environment name for the env= label (default "prod").
sub_signals[] string Valid: "gateway" (14 portkey_* metrics), "runtime" (node_* subset). Empty ⇒ both.

portkey_poller config

Key Type Description
workspace string Portkey workspace identifier (default "ws-demo").
use_cases[] string metadata_use_case dimension values (default ["assistant","summarization"]).
models[] string ai_model dimension values (default ["gpt-4o","gpt-4o-mini","gpt-4.1-mini"]).
use_case_weights map[string]float Per-use-case volume multiplier. Missing key ⇒ 1.0.

langsmith_eval config

Key Type Description
projects[] string LangSmith project names (default ["assistant-prod"]).
use_cases[] string Use-case dimension values (default ["assistant","summarization"]).
evaluators[] string Evaluator keys for langsmith_eval_score (default ["faithfulness","completeness","relevance"]).
use_case_weights map[string]float Per-use-case volume multiplier. Missing key ⇒ 1.0.

langsmith_platform config

Key Type Description
services[] string Service names to emit metrics for. Valid: backend, host-backend, platform-backend, playground, clickhouse, redis, postgres, nginx. Empty ⇒ all eight.

snowflake config

Key Type Description
account string Snowflake account identifier (default "demo-acct").
warehouses[] string Virtual warehouse names (default ["wh_compute","wh_etl"]).
databases[] string Database names for per-database metrics (default ["analytics","raw"]).

network_topology config

Key Type Description
instance string Exporter scrape endpoint (e.g. "netobs-dc1:9100"). Required; must be unique across blueprints.
job string Prometheus job label (default "integrations/network-topology-exporter").
role string Federation role: standalone (default), hub, or spoke.
spoke_id string Spoke identity (required when role: spoke).
protocols[] string Discovery walker protocols (default [lldp, bgp]).
fabric object Optional topology generator config.
fabric.kind string spine_leaf, clos, linear, or star.
fabric.spines int Spine count (spine_leaf/clos).
fabric.leaves int Leaf count (spine_leaf/clos).
fabric.hosts_per_leaf int Optional access hosts per leaf.
fabric.vendor_mix[] string Round-robin vendor assignment (default [arista]).
fabric.site string Device site label.
devices[] object Explicit device declarations (augment/override generated fabric).
devices[].id string Device identifier.
devices[].vendor string Vendor name.
devices[].os_version string OS/firmware version.
devices[].site string Site label.
devices[].uptime int Initial uptime (seconds).
links[] object Explicit link declarations.
links[].src_device string Source device ID.
links[].src_port string Source port.
links[].dst_device string Destination device ID.
links[].dst_port string Destination port.
links[].proto string Protocol.
links[].link_kind string Link kind.
session_pool bool Gate the snmp_session_pool_* family.
out_of_scope_neighbours int Steady-state out-of-scope neighbour count.
otlp_output bool Gate the otlp_push_total family.
federation object Hub-mode wiring.
federation.spokes[] string Spoke instance names aggregated by this hub.
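A sketch of a standalone network_topology integration that generates a small spine-leaf fabric from the keys above. The site name and counts are hypothetical.

```yaml
# Hypothetical standalone network topology exporter.
integrations:
  network_topology:
    instance: "netobs-dc1:9100"     # required; unique across blueprints
    role: standalone
    protocols: [lldp, bgp]
    fabric:
      kind: spine_leaf
      spines: 2
      leaves: 4
      hosts_per_leaf: 8
      vendor_mix: [arista]
      site: dc1
    session_pool: true
    otlp_output: false
```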

beyla_agent config

Key Type Description
mode string "kubernetes" (default) or "standalone".
instrumented_processes int eBPF-instrumented process count (default 4).
version string Beyla version for build-info gauge (default "1.9.0").
revision string Beyla git revision (default "unknown").
cluster string Cluster name (kubernetes mode identity).
node string Node name (kubernetes mode identity).
host string Host name (standalone mode identity).

qualification_pipeline config

Key Type Description
stages[] string Pipeline stage names. Default: ["verification","build","test","test-tokens-usage","autovalidate","pdf"].
jobs[] string CI job names. Default: ["validation-sbom","iac-tests","functional-tests"].
suites[] string Test suite names (coined qualification_* metrics). Default: ["infra","functional"].
clouds[] string Cloud target names. Default: ["aws","azure","gcp","common"].

Cluster add-on configs

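Add-ons are listed under cluster.addons[]. Most have no configurable fields; only those with config are shown. This section also documents the cloud sub-configs (cw_infra, bedrock, agentcore, aoss, mwaa, glue) referenced from the environments[].cloud table; those are set under environments[].cloud, not under cluster.addons[].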

cw_infra config (cloud.cloudwatch)

Key Type Description
albs int ALB instance count. Nil/omitted ⇒ default 1. Explicit 0 disables the ALB family.
s3_buckets int S3 bucket count. Nil/omitted ⇒ default 2. Explicit 0 disables the S3 family.
firehose bool Emit aws_firehose_* (default true).
nlb bool Emit aws_networkelb_* (default true).
ebs bool Emit aws_ebs_* (default true).
nat_gateway bool Emit aws_natgateway_* (default true).
eks bool Emit aws_eks_* control-plane (default true).
private_link bool Emit aws_privatelink_* endpoints and services (default true).
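A sketch of the cw_infra toggles above, placed under cloud.cloudwatch. It illustrates the difference between omitting a count (default applies) and setting it to 0 (family disabled). The account and VPC identifiers are placeholders.

```yaml
# Hypothetical cw_infra toggles under environments[].cloud.cloudwatch.
environments:
  - name: prod
    cloud:
      provider: aws
      account_id: "012345678901"
      region: us-east-1
      vpc_id: vpc-0abc123def456789a
      cloudwatch:
        albs: 0             # explicit 0 disables the ALB family
        s3_buckets: 4
        firehose: false
        private_link: false
        # nlb, ebs, nat_gateway, eks omitted => default true
```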

bedrock config

Key Type Description
models[] string Model IDs to emit per-model series for. Empty ⇒ default model list.
sub_signals[] string Families to emit. Valid: models, agents, guardrails, invocation_logs. Empty ⇒ all four.

agentcore config

Key Type Description
agents[] string Agent logical names for resource-usage dimensions. Default: ["planner","retriever"].
sub_signals[] string Families to emit. Valid: runtime, resource_usage, usage_logs. Empty ⇒ runtime + resource_usage + app logs. usage_logs is opt-in.

aoss config

Key Type Description
collections[] string OpenSearch Serverless collection names. Empty ⇒ one synthetic collection.

mwaa config

Key Type Description
environments[] string MWAA environment names. Empty ⇒ one synthetic environment.

glue config

Key Type Description
jobs[] string AWS Glue job names. Empty ⇒ one synthetic job.

cluster_autoscaler config

Key Type Description
min_nodes int Minimum node count for autoscaler metrics.
max_nodes int Maximum node count for autoscaler metrics.

cert_manager config

Key Type Description
job_mode string "" or "autodiscovery" ⇒ job="cert-manager". "integration" ⇒ job="integrations/cert-manager".

ksm_ingress config

Key Type Description
ingresses[] object Ingress declarations.
ingresses[].name string Ingress name.
ingresses[].namespace string Namespace (default: first workload's namespace).
ingresses[].host string Hostname (default: <name>.example.com).
ingresses[].path string Path (default: "/").
ingresses[].service_name string Required. Service name.
ingresses[].service_port int Service port (default 80).
ingresses[].tls bool TLS enabled (default false).

Add-ons with no configurable fields: alloy_health, argocd, core_dns, ebs_csi, envoy_gateway, etcd, external_dns, karpenter, load_balancer_controller, vpc_cni.
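A sketch of a cluster.addons list that mixes the bare scalar form with the map form for add-ons that take config. The fragment belongs under environments[].cluster; ingress and service names are hypothetical.

```yaml
# Hypothetical add-on list (fragment of environments[].cluster).
addons:
  - core_dns                                        # bare scalar: no config
  - karpenter
  - {name: cluster_autoscaler, min_nodes: 3, max_nodes: 10}
  - {name: cert_manager, job_mode: integration}     # job="integrations/cert-manager"
  - name: ksm_ingress
    ingresses:
      - name: checkout
        service_name: checkout-api                  # required
        service_port: 8080
        tls: true
```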


incidents[]

Key Type Description
incidents[].kind string Failure mode name. Mutually exclusive with scenario.
incidents[].scenario string Scenario name to fire (fires the whole bundle). Mutually exclusive with kind/target.
incidents[].target string Instance name to target. "" ⇒ blueprint-wide (single-axis modes only).
incidents[].at string RFC3339 or "2006-01-02T15:04[:05]" (blueprint timezone). One-shot activation.
incidents[].every string Go duration (e.g. "10m"). Makes the incident interval-recurring. Mutually exclusive with at.
incidents[].for string Go duration (e.g. "20m"). Active window per every cycle (or one-shot duration).
incidents[].intensity float [0,1] effect intensity.
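A sketch of one one-shot and one recurring incident using the keys above. Targets are hypothetical and assume a workload named checkout-api and a database named orders-pg-prod exist in the blueprint; the mode names come from the failure-mode list in this reference.

```yaml
# Hypothetical incidents: one scheduled, one interval-recurring.
incidents:
  - kind: latency_spike
    target: checkout-api
    at: "2025-03-10T14:30"    # local form, interpreted in the blueprint timezone
    for: "20m"
    intensity: 0.8
  - kind: connection_saturation
    target: orders-pg-prod
    every: "6h"               # mutually exclusive with at
    for: "15m"                # active window within each 6h cycle
    intensity: 0.5
```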

scenarios[]

Key Type Description
scenarios[].name string Scenario identifier used in incidents[].scenario and control-plane API calls.
scenarios[].title string Human display name. Defaults to name.
scenarios[].summary string One-line description of what the scenario causes.
scenarios[].effects[] object Effect list.
scenarios[].effects[].mode string Failure mode name.
scenarios[].effects[].target string Instance name, <axis>:* wildcard, or empty (single-axis modes only).
scenarios[].effects[].intensity float [0,1] effect intensity. Default 1.0.
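A sketch of a scenario that bundles two effects, plus an incident that fires it on a schedule. The scenario name and targets are hypothetical.

```yaml
# Hypothetical scenario and an incident that fires the whole bundle.
scenarios:
  - name: db-brownout
    title: Database brownout
    summary: Slow queries back up into request latency.
    effects:
      - mode: slow_query_storm
        target: orders-pg-prod
        intensity: 0.9
      - mode: latency_spike
        target: checkout-api        # intensity omitted => default 1.0
incidents:
  - scenario: db-brownout           # no kind or target alongside scenario
    every: "24h"
    for: "30m"
```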

hosts[]

Key Type Description
hosts[].name string Hostname. Required, unique. Becomes the instance label.
hosts[].os string Exporter vocabulary: linux (default), windows, or macos.
hosts[].ip string Optional private IP address. Omitted when empty.
hosts[].cpus int Logical CPU count. Default 2.
hosts[].memory_gb int Total RAM in GiB. Default 8.
hosts[].metrics_profile string "integration" (GC integration allowlist, default) or "full" (broad default-Alloy surface).
hosts[].os_version string OS version string (e.g. "22.04", "Server 2022"). Optional.
hosts[].kernel string Kernel string for node_uname_info (linux/macos). Optional.
hosts[].observability object Gates the Docker cadvisor lane and host logs.
hosts[].observability.docker bool Emit Docker cadvisor metrics and container log streams. Default false.
hosts[].observability.logs bool Emit host log streams (journal/winevent/file). Default true.
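A sketch of two non-Kubernetes hosts using the keys above. Hostnames, addresses, and versions are hypothetical.

```yaml
# Hypothetical traditional hosts.
hosts:
  - name: build-runner-01           # becomes the instance label
    os: linux
    ip: "10.0.12.34"
    cpus: 8
    memory_gb: 32
    metrics_profile: full
    os_version: "22.04"
    observability:
      docker: true                  # cadvisor metrics and container logs
      logs: true
  - name: win-fileserver-01
    os: windows
    os_version: "Server 2022"
    # cpus, memory_gb, metrics_profile omitted => defaults 2, 8, "integration"
```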

Failure modes

The table below lists every valid mode value for incidents[].kind or scenarios[].effects[].mode.

Mode Axis Description
agentcore_throttle cloud AgentCore request throttles + system_errors spike (region-scoped capacity constraint).
bedrock_throttle cloud Bedrock invocation throttling climbs.
connection_saturation database Active connections climb toward max.
cpu_hotspot service / workload / cluster Elevated CPU concentrated in a hot frame.
error_burst workload Elevated 5xx error rate.
error_spike service Elevated 5xx error rate on the targeted service node.
eval_quality_degraded cloud LangSmith eval quality regresses — scores drop, retry/HITL rates climb.
fallback_storm service Elevated gateway fallback rate on the targeted service node.
goroutine_leak service / workload Goroutine accumulation.
latency_spike workload Elevated request latency (up to 4× at full intensity).
latency_storm service Elevated request latency on the targeted service node.
lock_contention service / workload / database Elevated mutex/block contention or database lock waits.
memory_leak workload / service Growing heap — raises memory inuse/alloc profile values.
nettopo_auth_failures network SNMP credential trials fail.
nettopo_devices_unreachable network SNMP polling fails for a fraction of devices.
nettopo_discovery_slow network Discovery cycle duration inflates.
nettopo_spoke_down network A federation spoke goes offline.
nettopo_walker_degraded network Walker outcome errors climb; edge count under-reports.
node_not_ready cluster A node flips NotReady; its pods go Pending.
oom_kill cluster Containers OOM-killed; restart count climbs, status reason OOMKilled.
pod_crashloop cluster Pods crash-looping; restarts climb, phase Pending.
portkey_scrape_degraded cloud Portkey Analytics scrape degrades — error rate, latency, and poller lag climb.
replication_lag database Replica falls behind primary.
retry_storm service Elevated gateway retry rate on the targeted service node.
slow_query_storm database Query latency right-tail spikes.
throughput_drop service Reduced throughput on the targeted service node.
web_vitals_degraded service Browser web-vitals degrade (LCP/INP/TTFB/FCP and CLS spike).