ACMI Protocol v1.2: Solving AI Fleet Coordination with Shared Memory

These articles are AI-generated summaries. Please check the original sources for full details.

ACMI Protocol v1.2: How We Built a Self-Organizing AI Fleet That Learns From Its Mistakes

Michael Shaw, founder of Mad EZ Media, developed the Agentic Context Management Infrastructure (ACMI) after a coordination failure involving five agents on April 24, 2026. The system uses Upstash Redis as a shared substrate through which agents communicate via append-only timelines. On its first full day of operation, the fleet produced 15 deliverables with zero communication drift.

Why This Matters

Individual AI agents lack persistent memory and shared context, which leads to race conditions in which multiple agents duplicate work or overwrite registry keys. Most multi-agent systems rely on basic chat channels that fail at scale. ACMI instead implements a structured protocol of profiles, signals, and timelines so that every agent can read the fleet’s total state. The same infrastructure supports heterogeneous models: cheap models such as Gemini Flash handle routine wakes, and expensive T4 models are reserved for complex orchestration, which reduces operational costs.

Key Insights

  • The Comms Pattern v1.1 mandates five fields (timestamp, source, kind, correlationId, and summary) so that an automated drift-diff checker can trace every agent action.
  • Lock-Protocol v1.0 prevents race conditions by requiring agents to post a ‘coord-claim’ event before batch tasks; the claim auto-expires after 5 minutes to avoid deadlocks.
  • The fleet uses a reinforcement learning cycle (Execute-Assess-Log-Analyze-Adjust), integrated into AcmiWorkflowManager.mjs, to automate quality scoring and refine future tasks.
  • Semantic memory is implemented using ChromaDB and OpenAI embeddings, enabling agents to retrieve historical solutions for technical issues like Redis ZSET performance degradation.
  • System hygiene is maintained by anti-dead.mjs, which reaps trackers for any agent that has failed to post an event within a 48-hour window.
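
The coord-claim rule from Lock-Protocol v1.0 can be illustrated with a minimal in-memory sketch. It is not ACMI's actual implementation: the function names `tryClaim` and `releaseClaim`, the `Map` that stands in for Redis, and the `correlationId` format are all assumptions made for illustration. Only the 5-minute expiry and the five event fields come from the description above.

```javascript
// Hypothetical stand-in for the shared substrate; ACMI itself uses Upstash Redis.
const CLAIM_TTL_MS = 5 * 60 * 1000; // 5-minute auto-expiry, per Lock-Protocol v1.0

// claims: Map<resource, { owner, expiresAt }>
// Returns { ok: true, event } when the claim is granted,
// or { ok: false, heldBy } when another agent holds an unexpired claim.
function tryClaim(claims, resource, agent, nowMs) {
  const current = claims.get(resource);
  // Another agent's claim blocks us only while it has not yet expired.
  if (current && current.owner !== agent && current.expiresAt > nowMs) {
    return { ok: false, heldBy: current.owner };
  }
  // Granting (or re-granting to the same agent) resets the expiry window.
  claims.set(resource, { owner: agent, expiresAt: nowMs + CLAIM_TTL_MS });
  return {
    ok: true,
    // Event uses the five Comms Pattern fields; the values are illustrative.
    event: {
      timestamp: nowMs,
      source: agent,
      kind: "coord-claim",
      correlationId: `claim:${resource}:${nowMs}`, // assumed ID format
      summary: `${agent} claims ${resource} for up to 5 minutes`,
    },
  };
}

// Only the current owner may release a claim before it expires.
function releaseClaim(claims, resource, agent) {
  const current = claims.get(resource);
  if (current && current.owner === agent) {
    claims.delete(resource);
    return true;
  }
  return false;
}
```

Because an expired claim simply stops blocking other agents, an agent that crashes mid-batch cannot deadlock the fleet. Against a real Redis server, one common idiom for the same effect is `SET` with the `NX` and `PX` options, though the source does not say whether ACMI uses it.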

Working Examples

ACMI maintains a durable identity, a live status, and an event history for every agent using a three-part Redis structure:

acmi:agent:bentley:profile → { name: "Bentley", role: "orchestrator", tier: "T4" }
acmi:agent:bentley:signals → { status: "active", currentTask: "blog-post", health: "ok" }
acmi:agent:bentley:timeline → ZSET of events, newest last
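
To make the three keys concrete, here is a small sketch that models them in memory. It makes several assumptions: a plain `Map` replaces Redis, a sorted array replaces the ZSET (scored by `timestamp`), and the helper names `registerAgent`, `appendEvent`, and `latestEvents` are hypothetical. The required-field list is taken from the Comms Pattern v1.1 description.

```javascript
// Field list as described for Comms Pattern v1.1.
const REQUIRED_FIELDS = ["timestamp", "source", "kind", "correlationId", "summary"];

// Create the three keys for one agent. `store` is a Map standing in for Redis.
function registerAgent(store, id, profile, signals) {
  store.set(`acmi:agent:${id}:profile`, { ...profile });
  store.set(`acmi:agent:${id}:signals`, { ...signals });
  store.set(`acmi:agent:${id}:timeline`, []); // kept sorted by timestamp, newest last
}

// Append-only write: validates the five fields, then inserts in timestamp order.
function appendEvent(store, id, event) {
  const missing = REQUIRED_FIELDS.filter(
    (field) => event[field] === undefined || event[field] === null
  );
  if (missing.length > 0) {
    throw new Error(`event missing fields: ${missing.join(", ")}`);
  }
  const timeline = store.get(`acmi:agent:${id}:timeline`);
  if (!timeline) {
    throw new Error(`unknown agent: ${id}`);
  }
  // Walk back from the end to find the slot that keeps the newest event last.
  let i = timeline.length;
  while (i > 0 && timeline[i - 1].timestamp > event.timestamp) {
    i--;
  }
  // Store a frozen copy so earlier entries cannot be mutated afterwards.
  timeline.splice(i, 0, Object.freeze({ ...event }));
  return timeline.length;
}

// Read the most recent `count` events, oldest of them first.
function latestEvents(store, id, count) {
  const timeline = store.get(`acmi:agent:${id}:timeline`) || [];
  return count > 0 ? timeline.slice(-count) : [];
}
```

Keeping profile, signals, and timeline under separate keys means a status update rewrites only the small signals record, while the timeline only ever grows.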

Practical Applications

  • Use Case: Mad EZ Media uses a staggered wake schedule where Gemini-cli, Claude-engineer, and Antigravity agents process specialized tasks hourly through a central coordination thread.
  • Pitfall: Using snake_case or inconsistent field naming in event logs, which prevents automated auditing tools from tracing task handoffs between agents.
  • Use Case: Implementing a human-in-the-loop (HITL) queue via a Kanban dashboard to route critical brand or legal decisions that score below automated quality thresholds.
  • Pitfall: Allowing agents to run without escalation logic, leading to silent failures when an agent stops posting heartbeats while holding pending tasks.
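
Both pitfalls above lend themselves to simple automated checks. The sketch below is one possible shape for them, not the tooling Mad EZ Media actually runs: the function names, the `escalation` event kind, the `escalation-check` source, and the `maxSilenceMs` threshold are all assumptions chosen for illustration.

```javascript
// Returns any top-level keys that are not camelCase (for example "correlation_id").
function findNonCamelCaseKeys(event) {
  const camelCase = /^[a-z][a-zA-Z0-9]*$/;
  return Object.keys(event).filter((key) => !camelCase.test(key));
}

// agents: [{ id, lastEventAt, pendingTasks }]
// Returns one escalation event per agent that has gone quiet while holding work.
function findSilentAgents(agents, nowMs, maxSilenceMs) {
  return agents
    .filter((a) => a.pendingTasks > 0 && nowMs - a.lastEventAt > maxSilenceMs)
    .map((a) => ({
      timestamp: nowMs,
      source: "escalation-check", // hypothetical source name
      kind: "escalation", // hypothetical event kind
      correlationId: `escalate:${a.id}:${nowMs}`, // assumed ID format
      summary:
        `${a.id} silent for ${Math.round((nowMs - a.lastEventAt) / 60000)} min ` +
        `with ${a.pendingTasks} pending task(s)`,
    }));
}
```

Running the naming check before an event is written, and the silence check on a schedule, would surface both problems as ordinary timeline events instead of silent failures.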
