Securing Autonomous AI Agents: A Three-Tiered Defense Architecture for Untrusted Code

The Three-Tiered Defense Architecture

The Hermes Agent framework (v0.13) implements a three-tiered defense system to contain autonomous AI tool execution. It is designed to guard against high-risk failures, such as an LLM hallucinating a destructive ‘rm -rf /’ command that could wipe a host system in a fraction of a second.
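As a rough illustration of how a hallucinated destructive command might be flagged before it runs, the sketch below checks a shell command against a small list of regular expressions. The pattern list and the `requires_approval` function are hypothetical names chosen for this example, not the framework's actual rules, and the list is deliberately tiny rather than exhaustive.

```python
import re

# Hypothetical patterns for illustration only; a real deny-list would be
# broader and would still miss obfuscated commands.
DESTRUCTIVE_PATTERNS = [
    # rm followed directly by a flag containing r/R, or by --recursive
    re.compile(r"\brm\s+(?:-\w*[rR]\w*|--recursive)\b"),
    # any mkfs variant, e.g. mkfs.ext4
    re.compile(r"\bmkfs(?:\.\w+)?\b"),
    # dd writing to a device node
    re.compile(r"\bdd\b.*\bof=/dev/"),
]


def requires_approval(command: str) -> bool:
    """Return True if the command matches any destructive pattern."""
    return any(pattern.search(command) for pattern in DESTRUCTIVE_PATTERNS)


if __name__ == "__main__":
    for cmd in ["ls -la", "rm notes.txt", "rm -rf /", "mkfs.ext4 /dev/sda1"]:
        verdict = "needs approval" if requires_approval(cmd) else "allowed"
        print(f"{cmd!r}: {verdict}")
```

A pattern check like this is only a first filter: it errs toward asking a human, and it belongs in front of containment rather than in place of it.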

Why This Matters

Traditional software tools are static libraries managed by humans, but autonomous agents treat tools as interfaces to external state machines where every call is a mutation of state. Without architectural ‘control rods’ like sandboxing and guardrails, the feedback loop between perception, cognition, and action can lead to infinite, wallet-draining loops or total system collapse.

Key Insights

Hermes Agent v0.13 utilizes a three-layer security stack: Tool Definition (JSON validation), Tool Execution (dispatching), and Sandboxing (containment).
Temporal Sandboxing uses filesystem checkpointing to allow systems to roll back to the last known good state after a destructive tool call failure.
The ToolCallGuardrailController acts as a stateful observer that halts execution when an agent repeatedly calls the same tool with identical arguments and errors.
Iteration budget refunds are applied specifically when only ‘execute_code’ is used, treating programmatic tasks as cheap RPC-style calls rather than expensive terminal processes.
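The guardrail behavior described above (halting when the same tool is called repeatedly with identical arguments and the same error) can be sketched as a small stateful observer. This is a minimal sketch under assumptions: the class name `RepeatCallGuardrail`, the `observe` method, and the `max_repeats` threshold are hypothetical and do not reproduce the ToolCallGuardrailController's real interface.

```python
import json
from typing import Any, Dict, Optional, Tuple


class RepeatCallGuardrail:
    """Hypothetical observer that flags a streak of identical failing calls."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self._last_signature: Optional[Tuple[str, str, str]] = None
        self._streak = 0

    def observe(self, tool: str, args: Dict[str, Any], error: Optional[str]) -> bool:
        """Record one tool call; return True when execution should halt."""
        if error is None:
            # A successful call breaks any failure streak.
            self._last_signature = None
            self._streak = 0
            return False
        # Serialize arguments with sorted keys so dict ordering does not matter.
        signature = (tool, json.dumps(args, sort_keys=True, default=str), error)
        if signature == self._last_signature:
            self._streak += 1
        else:
            self._last_signature = signature
            self._streak = 1
        return self._streak >= self.max_repeats


if __name__ == "__main__":
    guard = RepeatCallGuardrail(max_repeats=3)
    for attempt in range(1, 4):
        halt = guard.observe("read_file", {"path": "missing.txt"}, "ENOENT")
        print(f"attempt {attempt}: halt={halt}")
```

Keying the streak on tool, arguments, and error together means a retry with changed arguments, or a different failure, starts a new count instead of tripping the guardrail.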

Working Examples

A persistent agent that integrates SessionDB and AIAgent for durable state tracking:

import asyncio
import json
import logging
import os
import sys
import time
from pathlib import Path
from typing import Dict, List, Optional, Any

# Import the core Hermes Agent classes
from hermes_state import SessionDB
from run_agent import AIAgent, IterationBudget

# Import tool definitions and helpers
from model_tools import (
    get_tool_definitions,
    get_toolset_for_tool,
    handle_function_call,
    check_toolset_requirements,
)

# Import memory and skills support
from tools.memory_tool import MemoryStore
from tools.todo_tool import TodoStore

# Import configuration helpers
from hermes_cli.config import load_config, cfg_get
from hermes_constants import get_hermes_home

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


class PersistentAgent:
    """
    A self-improving AI agent with persistent memory and session tracking.
    This class wraps the Hermes AIAgent with session database integration,
    providing durable storage for conversations, token usage tracking,
    and support for the closed learning loop pattern.
    """

    def __init__(
        self,
        model: str = "anthropic/claude-sonnet-4-20250514",
        base_url: Optional[str] = None,
        api_key: Optional[str] = None,
        provider: Optional[str] = None,
        max_iterations: int = 50,
        enabled_toolsets: Optional[List[str]] = None,
        disabled_toolsets: Optional[List[str]] = None,
        session_db_path: Optional[Path] = None,
        load_soul_identity: bool = True,
        skip_context_files: bool = False,
        verbose_logging: bool = False,
        quiet_mode: bool = True,
    ):
        """Initialize the persistent agent with database and AIAgent."""
        # Default to state.db under the Hermes home directory
        self.db_path = session_db_path or (get_hermes_home() / "state.db")
        self.db_path.parent.mkdir(parents=True, exist_ok=True)
        # Durable session storage shared with the wrapped agent
        self.session_db = SessionDB(db_path=self.db_path)
        self.agent = AIAgent(
            model=model,
            base_url=base_url or "",
            api_key=api_key,
            provider=provider,
            max_iterations=max_iterations,
            enabled_toolsets=enabled_toolsets or ["web", "terminal", "memory"],
            disabled_toolsets=disabled_toolsets,
            save_trajectories=False,
            verbose_logging=verbose_logging,
            quiet_mode=quiet_mode,
            load_soul_identity=load_soul_identity,
            skip_context_files=skip_context_files,
            session_db=self.session_db,
        )
        # ... remaining implementation as provided in context ...

Practical Applications

• Use case: Hermes Agent running shell commands via Docker containers to isolate execution from the host OS.
• Pitfall: Using identity-based control instead of policy-based permission control leads to inadequate dynamic evaluation of risky actions.
• Use case: Using command heuristics, regex patterns such as _DESTRUCTIVE_PATTERNS, to force human approval for ‘rm -rf’.
• Pitfall: Relying on trust without temporal checkpoints results in permanent data loss during environment corruption.
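The temporal checkpoint idea in the last pitfall can be illustrated with a directory snapshot that is restored when a tool call fails. This is a minimal sketch, assuming a plain copy-based snapshot of a single working directory; the `checkpoint` context manager is a hypothetical helper, and the framework's own checkpointing mechanism may work quite differently.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def checkpoint(workdir: Path) -> Iterator[None]:
    """Snapshot workdir on entry; restore it if the body raises."""
    snapshot_root = Path(tempfile.mkdtemp(prefix="checkpoint-"))
    snapshot = snapshot_root / "snapshot"
    shutil.copytree(workdir, snapshot)
    try:
        yield
    except Exception:
        # Roll back: discard the damaged directory and restore the copy.
        shutil.rmtree(workdir, ignore_errors=True)
        shutil.copytree(snapshot, workdir)
        raise
    finally:
        # The snapshot is temporary either way.
        shutil.rmtree(snapshot_root, ignore_errors=True)


if __name__ == "__main__":
    work = Path(tempfile.mkdtemp(prefix="agent-work-"))
    (work / "data.txt").write_text("important")
    try:
        with checkpoint(work):
            (work / "data.txt").unlink()  # simulated destructive tool call
            raise RuntimeError("tool call failed")
    except RuntimeError:
        pass
    print((work / "data.txt").read_text())
```

Copying the whole directory is simple but costly for large workspaces; a copy-on-write filesystem or container layer snapshot would be a more realistic choice, at the price of extra setup.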

References:

https://dev.to/programmingcentral/building-a-self-healing-ai-agent-how-to-run-untrusted-code-safely-without-blowing-up-your-server-4859
