Building Hierarchical AI Agents with Qwen2.5 and Python Tool Execution

These articles are AI-generated summaries. Please check the original sources for full details.

A Coding Implementation to Build a Hierarchical Planner AI Agent Using Open-Source LLMs with Tool Execution and Structured Multi-Agent Reasoning

Michal Sutter demonstrates a structured multi-agent architecture utilizing the Qwen2.5-1.5B-Instruct model for complex task decomposition. The system employs a specialized planner agent to break down goals into 3-8 discrete, executable steps.

Why This Matters

While monolithic LLM calls often struggle with complex reasoning and long-tail logic, hierarchical architectures distribute cognitive load across specialized roles. Using a 1.5B parameter model in 4-bit quantization allows for efficient local execution while maintaining the structured JSON output necessary for autonomous tool use and iterative reasoning.
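
To make the "structured JSON output" requirement concrete, here is a minimal sketch of a check that a parsed plan is usable before any step runs. The summary does not show the plan schema, so the field names `steps`, `tool`, and `instruction` and the helper name `validate_plan` are assumptions for illustration; only the 3-8 step range and the 'llm' and 'python' tool names come from the article.

```python
from typing import Any, Dict, List

ALLOWED_TOOLS = {"llm", "python"}  # tool names mentioned in the article


def validate_plan(plan: Any, min_steps: int = 3, max_steps: int = 8) -> List[Dict[str, Any]]:
    """Return the plan's steps if they are well formed, else raise ValueError.

    The 'steps' / 'tool' / 'instruction' keys are an assumed schema,
    not one confirmed by the original tutorial.
    """
    if not isinstance(plan, dict) or not isinstance(plan.get("steps"), list):
        raise ValueError("plan must be an object with a 'steps' list")
    steps = plan["steps"]
    if not min_steps <= len(steps) <= max_steps:
        raise ValueError(f"expected {min_steps}-{max_steps} steps, got {len(steps)}")
    for i, step in enumerate(steps, start=1):
        if not isinstance(step, dict):
            raise ValueError(f"step {i} is not an object")
        if step.get("tool") not in ALLOWED_TOOLS:
            raise ValueError(f"step {i} has unknown tool {step.get('tool')!r}")
        instruction = step.get("instruction")
        if not isinstance(instruction, str) or not instruction.strip():
            raise ValueError(f"step {i} needs a non-empty 'instruction'")
    return steps
```

Rejecting a malformed plan up front gives the planner a chance to retry, which matters more with a small model whose output format can drift.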

Key Insights

  • Fact: The system utilizes 4-bit quantization to run the Qwen2.5-1.5B-Instruct model efficiently on standard GPU hardware as of 2026.
  • Concept: Hierarchical planning decomposes a high-level goal into 3-8 discrete steps, each assigned to a tool such as ‘llm’ or ‘python’.
  • Tool: The Python execution environment uses io.StringIO and contextlib.redirect_stdout to capture the printed output of dynamically generated agent code. Redirecting stdout only captures output; it does not sandbox the code.
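
The output-capture technique from the last bullet can be sketched as follows. This is an illustration under assumptions rather than the tutorial's own code: the function name `run_python` and the shape of the returned dictionary are hypothetical.

```python
import contextlib
import io
import traceback
from typing import Any, Dict


def run_python(code: str) -> Dict[str, Any]:
    """Execute code in a fresh namespace and capture anything it prints.

    Note: exec() runs with the full privileges of the host process.
    Capturing stdout is not a sandbox.
    """
    buffer = io.StringIO()
    namespace: Dict[str, Any] = {"__name__": "__agent__"}
    try:
        # Everything printed inside this block goes to `buffer`.
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
        return {"ok": True, "stdout": buffer.getvalue(), "error": None}
    except Exception:
        # Keep the partial output and the traceback so the agent can react.
        return {"ok": False, "stdout": buffer.getvalue(), "error": traceback.format_exc()}


result = run_python("print(2 + 3)")
# result["stdout"] == "5\n"
```

Returning the traceback as data instead of raising lets a later step see the failure and attempt a repair.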

Working Examples

Loading the Qwen2.5 model with 4-bit quantization for efficient agentic reasoning.

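from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",   # place weights on the available device(s)
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    # 4-bit loading needs the bitsandbytes package; passing the flag via
    # quantization_config replaces the older load_in_4bit=True keyword
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)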

Robust JSON extraction logic to handle imperfect model outputs during the planning phase.

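import json
import re
from typing import Any, Optional

def extract_json_block(text: str) -> Optional[Any]:
    # Prefer a fenced ```json block when the model emits one
    fenced = re.search(r"```json\s*(.*?)\s*```", text, flags=re.DOTALL | re.IGNORECASE)
    if fenced:
        cand = fenced.group(1).strip()
        try:
            return json.loads(cand)
        except json.JSONDecodeError:
            pass  # malformed block; fall through to the fallback
    # ... fallback to scanning for braces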
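
The brace-scanning fallback is elided in the snippet above, and its actual implementation is not shown in this summary. As one possible approach, and not necessarily the author's, the sketch below tries the standard library's `json.JSONDecoder.raw_decode` at each `{` or `[` and returns the first value that decodes. The function name `first_json_value` is hypothetical.

```python
import json
from typing import Any, Optional


def first_json_value(text: str) -> Optional[Any]:
    """Return the first decodable JSON object or array found in text, else None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any trailing text after it.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            continue  # not valid JSON from here; try the next candidate
    return None


first_json_value('Sure! Here is the plan: {"steps": [1, 2]} Hope that helps.')
# returns {'steps': [1, 2]}
```

Because `raw_decode` stops at the end of the first complete value, chatter before or after the JSON does not need to be stripped by hand.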

Practical Applications

  • Logistics Coordination: A multi-agent system where a planner decomposes tasks for routing and inventory agents. Pitfall: Failing to pass enough context between steps leads to execution silos.
  • Automated Data Analysis: Using the Python tool for dynamic simulations and calculations. Pitfall: Unconstrained code execution without safety wrappers can lead to environment crashes.
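
The context-passing pitfall in the first bullet can be illustrated with a small executor loop that hands each step the results of the steps before it. This is a sketch under assumptions: the step keys `tool` and `instruction`, the handler signature, and the name `run_steps` are hypothetical, and the stub handler stands in for a real LLM or Python tool.

```python
from typing import Callable, Dict, List

# A handler receives (instruction, context) and returns the step's result.
Handler = Callable[[str, str], str]


def run_steps(steps: List[dict], handlers: Dict[str, Handler]) -> List[str]:
    """Run steps in order, giving each one the accumulated earlier results."""
    results: List[str] = []
    for step in steps:
        # Summarize everything produced so far so this step is not isolated.
        context = "\n".join(
            f"Step {i} result: {r}" for i, r in enumerate(results, start=1)
        )
        handler = handlers[step["tool"]]
        results.append(handler(step["instruction"], context))
    return results


# Stub handler for demonstration: reports how many prior results it saw.
def echo(instruction: str, context: str) -> str:
    return f"{instruction} (saw {len(context.splitlines())} prior results)"


outputs = run_steps(
    [{"tool": "llm", "instruction": "outline"}, {"tool": "llm", "instruction": "draft"}],
    {"llm": echo},
)
# outputs == ["outline (saw 0 prior results)", "draft (saw 1 prior results)"]
```

With a small model, the accumulated context may need truncating or summarizing so that it fits the prompt; that trade-off is left out of this sketch.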
