The 429 That Poisoned Every Fallback: AI Agent Reliability Risks

These articles are AI-generated summaries. Please check the original sources for full details.

The 429 That Poisoned Every Fallback

OpenClaw issue #62672 describes a critical bug in which the error state from a primary provider (GPT-5.4 in the report) is propagated to the fallback models. Because the fallback inherits that state, a healthy secondary provider such as DeepSeek is placed in cooldown without ever being called.

Why This Matters

Architectural assumptions that treat AI providers as interchangeable nodes in a single pipeline often ignore the necessity of independent failure domains. When error objects and hashes are shared across evaluation contexts, a single rate limit from a primary provider can poison the entire chain, rendering fallback redundancy useless. Technical reliability requires that every candidate in a fallback sequence operates within a fresh, isolated evaluation context to prevent cross-provider state leakage.
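The idea of a fresh, isolated evaluation context per candidate can be sketched as follows. This is a minimal illustration, not OpenClaw's actual code: `EvalContext`, `RateLimited`, `run_with_fallback`, and the stub providers are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 returned by a provider."""


@dataclass
class EvalContext:
    """Per-candidate state. A new instance is created for every attempt."""
    provider: str
    called: bool = False
    error: Optional[Exception] = None


def run_with_fallback(
    candidates: List[Tuple[str, Callable[[], str]]],
) -> Tuple[Optional[str], List[EvalContext]]:
    """Try candidates in order; no candidate sees an earlier candidate's error."""
    history: List[EvalContext] = []
    for provider, call in candidates:
        ctx = EvalContext(provider=provider)  # fresh context, nothing inherited
        history.append(ctx)
        try:
            ctx.called = True
            return call(), history
        except RateLimited as exc:
            ctx.error = exc  # recorded on this candidate only
    return None, history


def primary() -> str:
    # Simulated rate limit from the primary provider.
    raise RateLimited("429 from primary")


def secondary() -> str:
    return "ok from secondary"


result, history = run_with_fallback([("gpt-5.4", primary), ("deepseek", secondary)])
```

In this sketch the primary's error stays attached to its own context, so the secondary is actually called and judged on its own response.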

Key Insights

  • Issue #62672 (2026) shows GPT-5.4 429 errors poisoning DeepSeek attempts via inherited error hashes.
  • Error objects crossing provider boundaries prevent independent failure domains from functioning correctly.
  • Issue #55941 highlights auth cooldowns scoped per-profile instead of per-(profile, model).
  • Issue #62119 documents the candidate_succeeded flag incorrectly triggering on 404 errors.
  • Hash-based deduplication is identified as a dangerous pattern when applied across independent service domains.
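Two of the points above lend themselves to a short sketch: scoping cooldowns per (profile, model) and treating only 2xx responses as success. The source does not show how OpenClaw implements either, so `CooldownRegistry` and this `candidate_succeeded` function are hypothetical illustrations, and the profile and model names are placeholders.

```python
import time
from typing import Dict, Optional, Tuple


class CooldownRegistry:
    """Cooldowns keyed by (profile, model) rather than by profile alone."""

    def __init__(self) -> None:
        self._until: Dict[Tuple[str, str], float] = {}

    def put(self, profile: str, model: str, seconds: float,
            now: Optional[float] = None) -> None:
        # `now` is injectable so the behaviour can be tested deterministically.
        now = time.monotonic() if now is None else now
        self._until[(profile, model)] = now + seconds

    def is_cooling(self, profile: str, model: str,
                   now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return self._until.get((profile, model), 0.0) > now


def candidate_succeeded(status_code: int) -> bool:
    """Only 2xx counts as success; a 404 or 429 must not set the flag."""
    return 200 <= status_code < 300


registry = CooldownRegistry()
registry.put("default", "model-a", seconds=60, now=0.0)

same_model = registry.is_cooling("default", "model-a", now=10.0)
other_model = registry.is_cooling("default", "model-b", now=10.0)
```

With the composite key, a cooldown on one model leaves other models under the same profile available.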

Working Examples

The trace below shows how the error propagates and poisons state in the OpenClaw fallback chain:

Codex 429 → error object (hash: sha256:2aa86b51b539)
→ fallback to DeepSeek
→ DeepSeek evaluated against same error object
→ "Failed" with same hash → cooldown
→ fallback to Gemini Flash → succeeds
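The collision in the trace above can be reproduced in a few lines. This is a sketch under assumptions: it models the bug as a failure ledger keyed only by a hash of the error text, which may differ from the real mechanism. `FailureLedger`, `unscoped_key`, and `scoped_key` are hypothetical names, and the error string is a placeholder.

```python
import hashlib
from typing import Callable, Set

KeyFn = Callable[[str, str], str]


def unscoped_key(provider: str, message: str) -> str:
    """Hash of the error text only, so every provider shares one key space."""
    return hashlib.sha256(message.encode()).hexdigest()[:12]


def scoped_key(provider: str, message: str) -> str:
    """Hash of provider plus error text, so keys never cross providers."""
    return hashlib.sha256(f"{provider}\x00{message}".encode()).hexdigest()[:12]


class FailureLedger:
    """Remembers failures by key; the key function decides the failure domain."""

    def __init__(self, key_fn: KeyFn) -> None:
        self._key_fn = key_fn
        self._seen: Set[str] = set()

    def record(self, provider: str, message: str) -> None:
        self._seen.add(self._key_fn(provider, message))

    def already_failed(self, provider: str, message: str) -> bool:
        return self._key_fn(provider, message) in self._seen


ERROR = "429 Too Many Requests"

# Buggy variant: DeepSeek is checked against the error carried over from Codex.
buggy = FailureLedger(unscoped_key)
buggy.record("codex", ERROR)
poisoned = buggy.already_failed("deepseek", ERROR)

# Scoped variant: the same carried-over error no longer matches DeepSeek.
fixed = FailureLedger(scoped_key)
fixed.record("codex", ERROR)
isolated = not fixed.already_failed("deepseek", ERROR)
```

Scoping the key narrows the damage, but the more direct fix is not to evaluate a new candidate against an old error object at all.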

Practical Applications

  • AI Agent Architecture: Ensure each fallback candidate uses a fresh request with unique credentials and an isolated evaluation context.
  • System Monitoring: Test the second and third providers in a chain explicitly, as success in the third may mask failures in the second.
  • Error Handling: Avoid hash-based deduplication across different API domains to prevent state leakage between providers.
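The monitoring advice can be turned into a small check that every provider ahead of the successful one was actually invoked. The sketch below uses counting stubs; `CountingProvider`, `first_success`, and `skipped_before_last_call` are hypothetical helpers, not part of any real client library.

```python
from typing import List, Optional


class CountingProvider:
    """Stub provider that counts how often it is really called."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        if self.fail:
            raise RuntimeError(f"{self.name}: simulated 429")
        return f"{self.name}: ok"


def first_success(providers: List[CountingProvider], prompt: str) -> Optional[str]:
    """A well-behaved runner: call each provider until one answers."""
    for provider in providers:
        try:
            return provider(prompt)
        except RuntimeError:
            continue
    return None


def skipped_before_last_call(providers: List[CountingProvider]) -> List[str]:
    """Names of providers never called although a later provider was."""
    called = [i for i, p in enumerate(providers) if p.calls > 0]
    last = max(called, default=-1)
    return [p.name for p in providers[:max(last, 0)] if p.calls == 0]


chain = [
    CountingProvider("primary", fail=True),
    CountingProvider("secondary", fail=True),
    CountingProvider("tertiary"),
]
answer = first_success(chain, "ping")
skipped = skipped_before_last_call(chain)
```

A non-empty `skipped` list would flag the situation described above, where a success in the third provider hides the fact that the second was never tried.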
