Essential Observability: 3 Critical Alerts for LLM Systems
These articles are AI-generated summaries. Please check the original sources for full details.
The 3 Alerts Every LLM Team Should Have Set Up by Tomorrow
LLM systems can rack up four-figure model spend in 90 seconds from a single runaway conversation. Gabriel Anhaia details three specific alerts focusing on cost, quality, and retrieval that catch failures before users churn.
Why This Matters
Real-world LLM systems fail silently; a retriever might return 200 OK while providing irrelevant context that forces a model to synthesize nonsense. Without per-conversation cost tracking, an agent loop can execute hundreds of cheap calls that aggregate into a massive financial hit, often discovered hours too late by finance teams rather than engineering on-call responders.
Key Insights
- OpenTelemetry GenAI semantic conventions were significantly revised by March 2026; among the changes, gen_ai.usage.prompt_tokens was renamed to gen_ai.usage.input_tokens.
- Per-conversation cost monitoring sums spend over a rolling 5-minute window, which catches runaway agent loops that per-call thresholds miss.
- Judge-score drift detection compares 1-day averages against 7-day baselines to identify subtle prompt regressions or provider model shifts.
- Retrieval-relevance alerts use a relevance_score to detect index drift, preventing scenarios where models produce fluent but factually incorrect answers.
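The judge-score drift comparison in the list above can be sketched in a few lines. This is a minimal illustration rather than the article's implementation: the function name `judge_score_drift`, the `drop_threshold` default, and the choice to exclude the current day from the 7-day baseline are all assumptions made for the example.

```python
from statistics import mean


def judge_score_drift(daily_means, drop_threshold=0.05):
    """Compare the latest 1-day mean judge score with a 7-day baseline.

    daily_means: per-day mean judge scores, oldest first (at least 8 values).
    drop_threshold: hypothetical alert threshold, in the judge's score units.
    Returns (drop, alert), where drop = baseline - latest.
    """
    if len(daily_means) < 8:
        raise ValueError("need 7 baseline days plus the current day")
    # Baseline is the 7 days before the current day (an assumption; the
    # source does not say whether the windows overlap).
    baseline = mean(daily_means[-8:-1])
    latest = daily_means[-1]
    drop = baseline - latest
    return drop, drop > drop_threshold


# Example: a week of stable scores followed by a weaker day.
drop, alert = judge_score_drift([4.0] * 7 + [3.5], drop_threshold=0.25)
```

Computing the same comparison per tenant or per prompt version, rather than once globally, keeps a regression in one segment from being averaged away.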
Working Examples
Python emitter for GenAI spans with cost calculation and conversation tracking.
    from opentelemetry import trace

    tracer = trace.get_tracer("app.llm")

    # USD per 1,000 tokens, as (input, output).
    COSTS = {
        "gpt-4o-2024-11-20": (0.0025, 0.0100),
        "gpt-4o-mini": (0.00015, 0.00060),
    }

    def usd(model: str, in_tok: int, out_tok: int) -> float:
        # Models missing from COSTS are priced at zero.
        cin, cout = COSTS.get(model, (0.0, 0.0))
        return (in_tok / 1000) * cin + (out_tok / 1000) * cout

    def emit_llm_span(model, provider, usage, conv_id):
        # provider is accepted but not recorded in this excerpt.
        with tracer.start_as_current_span("gen_ai.chat") as span:
            span.set_attribute("gen_ai.request.model", model)
            span.set_attribute("gen_ai.usage.input_tokens", usage["in"])
            span.set_attribute("gen_ai.usage.output_tokens", usage["out"])
            span.set_attribute("gen_ai.conversation.id", conv_id)
            span.set_attribute("app.llm.cost_usd", usd(model, usage["in"], usage["out"]))
Prometheus alert for any single conversation exceeding $25 in a rolling 5-minute window.
sum by (gen_ai_conversation_id) (rate(app_llm_cost_usd_sum[5m]) * 300) > 25
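In the expression above, rate(...[5m]) is a per-second average, so multiplying by 300 approximates the dollars spent over the last five minutes. The same check can be sketched in application code, which is useful when a process needs to act on the breach itself. This is an illustrative sketch only: the class name `ConversationCostWindow` and its parameters are hypothetical, and the caller is assumed to supply timestamps in seconds (for example from `time.monotonic()`).

```python
from collections import defaultdict, deque


class ConversationCostWindow:
    """Track per-conversation spend over a rolling time window."""

    def __init__(self, window_s=300.0, limit_usd=25.0):
        self.window_s = window_s
        self.limit_usd = limit_usd
        # conversation id -> deque of (timestamp, cost_usd), oldest first
        self._events = defaultdict(deque)

    def record(self, conv_id, cost_usd, now):
        """Record one call's cost; return True if the window total exceeds the limit."""
        events = self._events[conv_id]
        events.append((now, cost_usd))
        # Drop calls that have aged out of the window.
        while events and events[0][0] <= now - self.window_s:
            events.popleft()
        total = sum(cost for _, cost in events)
        return total > self.limit_usd


# Example: one conversation making a $0.50 call every second.
window = ConversationCostWindow()
breaches = [window.record("conv-1", 0.5, float(t)) for t in range(60)]
```

Each call here is well under any sensible per-call threshold, yet the conversation crosses $25 on its 51st call, which is the pattern a per-call alert cannot see.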
Practical Applications
- Use Case: Conversation Kill Switch. Systems using gen_ai.conversation.id can automatically terminate runaway loops exceeding $25 spend. Pitfall: Alerting on per-tenant cost instead of per-conversation, which masks high-velocity individual failures.
- Use Case: Model Version Management. Tracking app.llm.judge.score helps detect regressions when providers rotate model aliases. Pitfall: Using global averages that smooth over critical regressions affecting only specific tenants.
- Use Case: RAG Index Validation. Monitoring app.rag.relevance_score detects index drift where re-indexing produces worse chunks. Pitfall: Skipping relevance alerts because latency and status codes appear normal.
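The RAG Index Validation case can be illustrated with a small rolling monitor over relevance scores. This is a hedged sketch, not the article's method: the class name `RelevanceMonitor`, the assumption that each query contributes the relevance score of its best chunk, and every threshold shown are hypothetical values chosen for the example.

```python
from collections import deque


class RelevanceMonitor:
    """Flag when too many recent queries retrieve low-relevance context."""

    def __init__(self, size=200, min_score=0.5, max_low_fraction=0.3, min_samples=20):
        self.scores = deque(maxlen=size)   # most recent top-chunk scores
        self.min_score = min_score         # below this, a retrieval counts as "low"
        self.max_low_fraction = max_low_fraction
        self.min_samples = min_samples     # avoid alerting on a near-empty window

    def observe(self, top_score):
        """Add one query's best relevance score; return True if the alert should fire."""
        self.scores.append(top_score)
        if len(self.scores) < self.min_samples:
            return False
        low = sum(1 for s in self.scores if s < self.min_score)
        return low / len(self.scores) > self.max_low_fraction


# Example: healthy retrievals followed by a run of poor ones after a re-index.
monitor = RelevanceMonitor(size=10, min_samples=5)
healthy = [monitor.observe(0.9) for _ in range(6)]
degraded = [monitor.observe(0.2) for _ in range(4)]
```

Because the monitor looks only at relevance scores, it can fire while latency and HTTP status codes still look normal.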