NadirClaw: Building Cost-Aware LLM Routing with Local Prompt Classification

How to Build a Cost-Aware LLM Routing System with NadirClaw Using Local Prompt Classification and Gemini Model Switching

NadirClaw implements an intelligent routing layer that classifies prompts locally before sending them to the most suitable model tier. By utilizing centroid vectors and a local encoder, the system avoids unnecessary high-cost model calls for simple tasks. In live tests, this configuration demonstrated significant cost savings compared to an always-Pro model baseline.

Why This Matters

The technical reality of deploying LLMs involves a constant trade-off between reasoning capability and operational cost. Many production systems default to high-parameter models for every request, which leads to significant financial waste on low-complexity tasks like basic formatting or simple arithmetic. NadirClaw addresses this by introducing a local classification step that ensures only high-complexity reasoning tasks consume expensive ‘Pro’ tier tokens.

By moving the routing decision to a local proxy, developers can maintain a single ‘auto’ model endpoint while benefiting from the speed of lightweight models and the depth of larger ones. This architectural pattern is essential for scaling agentic systems where thousands of intermediate steps may not require full reasoning capabilities, thereby optimizing both latency and budget.

Key Insights

Local classification via NadirClaw CLI uses JSON output to return routing tier, score, and confidence without making live LLM calls.
The system utilizes the all-MiniLM-L6-v2 encoder from Sentence-Transformers to generate embeddings for local similarity checks.
Routing decisions are determined by comparing prompt embeddings against simple_centroid.npy and complex_centroid.npy vectors.
A default confidence threshold of 0.06 is applied; prompts falling below this threshold are automatically escalated to the complex tier.
NadirClaw supports modifier-marker scans, identifying ‘agentic’, ‘reasoning’, or ‘vision’ requests based on text markers or request shape.
Live routing through a local proxy server allows for OpenAI-compatible requests to be dynamically mapped to models like gemini-2.5-flash and gemini-2.5-pro.

Working Examples

Function to locally classify prompts into tiers using the NadirClaw CLI.

import subprocess, json
def classify(prompt: str) -> dict:
    r = subprocess.run(
        ["nadirclaw", "classify", "--format", "json", prompt],
        capture_output=True, text=True, timeout=180,
    )
    if r.returncode != 0:
        return {"prompt": prompt, "error": (r.stderr or r.stdout).strip()}
    return json.loads(r.stdout.strip())

prompts = ["What is 2+2?", "Refactor the auth module to use dependency injection"]
results = [classify(p) for p in prompts]

Starting the NadirClaw proxy server to handle live model routing.

import os, subprocess
PORT = 8856
env = os.environ.copy()
env.update({
    "GEMINI_API_KEY": "YOUR_KEY_HERE",
    "NADIRCLAW_SIMPLE_MODEL": "gemini-2.5-flash",
    "NADIRCLAW_COMPLEX_MODEL": "gemini-2.5-pro",
    "NADIRCLAW_PORT": str(PORT),
})
server_proc = subprocess.Popen(
    ["nadirclaw", "serve", "--verbose"],
    env=env
)

Practical Applications

Enterprise Chatbots: Routing basic FAQs to Gemini Flash while reserving Gemini Pro for complex architectural or legal inquiries. Pitfall: Using an overly high confidence threshold may cause complex edge cases to fail on simple models.
Coding Assistants: Detecting ‘agentic’ markers in prompts to ensure code execution tasks are always routed to high-reasoning models. Pitfall: Incorrectly configured environment variables can lead to proxy startup failures, defaulting to a single model.
Cost Monitoring: Utilizing ‘nadirclaw report’ to analyze JSONL request logs and estimate savings against a fixed-model baseline. Pitfall: Ignoring the request shape (e.g., vision/tools) might bypass the intended routing logic.

References:

https://www.marktechpost.com/2026/05/10/how-to-build-a-cost-aware-llm-routing-system-with-nadirclaw-using-local-prompt-classification-and-gemini-model-switching/

On This Page

How to Build a Cost-Aware LLM Routing System with NadirClaw Using Local Prompt Classification and Gemini Model Switching

Why This Matters

Key Insights

Working Examples

Practical Applications

Continue reading

Related Content

Google DeepMind Unveils Gemini-Powered AI Mouse Pointer for Context-Aware Computing

Why Your AGENTS.md Files are Sabotaging AI Coding Performance

Google AI Releases Android Bench: Specialized Evaluation for Mobile LLMs