MCP vs. CLI: Measuring Token Overhead in Agent Search
These articles are AI-generated summaries. Please check the original sources for full details.
I measured MCP vs a CLI for agent search
Engineer Ary Rabelo compared the token costs of SerpApi’s official MCP server against a custom MIT-licensed CLI. The MCP server returned 6,047 tokens per call; the CLI, using field projection, returned 351.
Why This Matters
MCP provides a standardized transport, but it carries significant ‘standing costs’ because tool schemas are injected into the context on every turn. For stateless operations like search, this overhead grows with each added tool and can consume thousands of tokens before the agent performs any work. The lean alternative passes only the necessary data to the LLM.
Key Insights
- Standing cost disparity: MCP injects 771 tokens per turn for tool schema, whereas a binary on PATH (CLI) incurs ~0 standing tokens (Rabelo, 2026).
- Field Projection over Full Payloads: Using --fields title,link reduces response size to 351 tokens versus 6,047 for default MCP output.
- Context Reduction via Code Execution: Anthropic’s research showed a Drive-to-Salesforce workflow token reduction from 150,000 to 2,000 by calling tools as code rather than loading definitions.
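As a rough illustration of the field projection idea above, the following Python sketch keeps only the requested keys from each search result and serializes the rest as minified JSON. It is not Rabelo's CLI: the function names `project_fields` and `to_minified_json` and the sample payload are hypothetical, chosen only to show the technique.

```python
import json


def project_fields(results, fields):
    """Return a copy of each result dict containing only the requested keys."""
    return [{key: item[key] for key in fields if key in item} for item in results]


def to_minified_json(obj):
    """Serialize without the default spaces after ',' and ':'."""
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


# Hypothetical search payload; a real API response would carry many more fields.
sample_results = [
    {
        "position": 1,
        "title": "First result",
        "link": "https://example.com/a",
        "snippet": "A long snippet the agent may not need.",
        "displayed_link": "example.com",
    },
    {
        "position": 2,
        "title": "Second result",
        "link": "https://example.com/b",
        "snippet": "Another long snippet.",
        "displayed_link": "example.com",
    },
]

# Equivalent in spirit to asking for only the title and link fields.
projected = project_fields(sample_results, ["title", "link"])
compact = to_minified_json(projected)
print(compact)
```

The projected output is shorter than the full payload simply because it drops every field the agent did not ask for; the actual token savings depend on the tokenizer and on how large the original response is.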
Practical Applications
- Stateless Search: Use a CLI with minified JSON output and field projection to minimize context bloat in coding loops.
- Governed Connections: Use MCP when requiring OAuth, multi-user auth, or server-side rate limiting across multiple clients.
- Tool Proliferation Pitfall: Loading multiple MCP servers simultaneously creates additive standing costs that can consume several thousand tokens of the context window.
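To make the additive standing cost concrete, here is a back-of-the-envelope sketch using the figures reported above (771 schema tokens per turn and 6,047 response tokens for MCP; roughly 0 standing tokens and 351 response tokens for the CLI). The session shape (10 turns, one search per turn) and the three-server scenario are assumptions for illustration, not measurements from the article, and `session_tokens` is a hypothetical helper.

```python
# Figures reported in the summary above.
MCP_SCHEMA_TOKENS_PER_TURN = 771
MCP_RESPONSE_TOKENS_PER_CALL = 6047
CLI_STANDING_TOKENS_PER_TURN = 0  # reported as "~0"; treated as exactly 0 here
CLI_RESPONSE_TOKENS_PER_CALL = 351


def session_tokens(turns, calls, standing_per_turn, response_per_call):
    """Standing cost is paid on every turn; response cost only on tool calls."""
    return turns * standing_per_turn + calls * response_per_call


# Assumed session: 10 turns with one search call per turn.
mcp_total = session_tokens(
    10, 10, MCP_SCHEMA_TOKENS_PER_TURN, MCP_RESPONSE_TOKENS_PER_CALL
)
cli_total = session_tokens(
    10, 10, CLI_STANDING_TOKENS_PER_TURN, CLI_RESPONSE_TOKENS_PER_CALL
)

# Assumed scenario: three MCP servers, each with a schema of the same size,
# loaded for 10 turns without a single tool call being made.
idle_three_servers = session_tokens(10, 0, 3 * MCP_SCHEMA_TOKENS_PER_TURN, 0)

print(mcp_total)           # 68180
print(cli_total)           # 3510
print(idle_three_servers)  # 23130
```

Under these assumptions the standing cost alone reaches several thousand tokens within a few turns, which is the effect the tool proliferation point describes.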