MCP vs. CLI: Measuring Token Overhead in Agent Search

I measured MCP vs a CLI for agent search

Engineer Ary Rabelo compared the token cost of SerpApi’s official MCP server against a custom MIT-licensed CLI. The MCP server returned 6,047 tokens per call, versus 351 for the CLI with field projection, roughly a 17x difference.

Why This Matters

MCP provides a standardized transport, but it carries a significant ‘standing cost’: tool schemas are injected into the context on every turn. For stateless operations like search, this overhead compounds as more tools are added and can waste thousands of tokens before the agent performs any work. The leaner model is to pass the LLM only the data it needs.

Key Insights

Standing Cost Disparity: MCP injects 771 tokens per turn for the tool schema, whereas a binary on PATH (CLI) incurs ~0 standing tokens (Rabelo, 2026).
Field Projection over Full Payloads: Using --fields title,link reduces response size to 351 tokens, versus 6,047 for the default MCP output.
Context Reduction via Code Execution: Anthropic’s research showed a Drive-to-Salesforce workflow dropping from 150,000 tokens to 2,000 when tools were called as code rather than loaded as definitions.
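
The figures above can be combined into a rough per-session estimate. The sketch below is a back-of-the-envelope model, not Rabelo's measurement method. It assumes the schema cost is paid on every turn and the response cost once per search call; the function name `session_tokens` and the session shape (10 turns, 3 calls) are hypothetical.

```python
# Figures quoted in this summary (Rabelo, 2026).
MCP_STANDING = 771    # tool-schema tokens injected per turn
MCP_RESPONSE = 6047   # tokens per default MCP search response
CLI_STANDING = 0      # a binary on PATH adds roughly nothing per turn
CLI_RESPONSE = 351    # tokens per response with --fields title,link


def session_tokens(standing: int, response: int, turns: int, calls: int) -> int:
    """Estimate tokens spent on tooling over one session.

    Assumption: standing cost is charged every turn, response cost once per call.
    """
    return standing * turns + response * calls


if __name__ == "__main__":
    turns, calls = 10, 3  # illustrative session shape, not a measured one
    mcp = session_tokens(MCP_STANDING, MCP_RESPONSE, turns, calls)
    cli = session_tokens(CLI_STANDING, CLI_RESPONSE, turns, calls)
    print(f"MCP: {mcp} tokens, CLI: {cli} tokens, ratio: {mcp / cli:.1f}x")
```

Under this model the gap widens with session length, because the standing term grows with every turn even when no search is issued.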

Practical Applications

Stateless Search: Use a CLI with minified JSON output and field projection to minimize context bloat in coding loops.
Governed Connections: Use MCP when you need OAuth, multi-user auth, or server-side rate limiting across multiple clients.
Tool Proliferation Pitfall: Loading multiple MCP servers simultaneously creates additive standing costs that can consume several thousand tokens of the context window.
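
Field projection with minified output, as recommended for stateless search, can be sketched in a few lines. This is an illustration of the general idea, not the implementation of the CLI Rabelo measured; the helper `project_fields` and the sample payload are assumptions made up for the example.

```python
import json


def project_fields(results: list[dict], fields: list[str]) -> str:
    """Keep only the requested keys from each result and emit minified JSON."""
    projected = [{k: r[k] for k in fields if k in r} for r in results]
    # separators=(",", ":") drops the default whitespace after commas and colons
    return json.dumps(projected, separators=(",", ":"))


if __name__ == "__main__":
    # Made-up payload for illustration; real search responses carry many more keys.
    sample = [
        {
            "position": 1,
            "title": "Example",
            "link": "https://example.com",
            "snippet": "Long descriptive text...",
            "displayed_link": "example.com",
        },
    ]
    print(project_fields(sample, ["title", "link"]))
```

Dropping unused keys before the text reaches the model is what shrinks the response; minifying the JSON only trims the remaining whitespace.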

References:

https://dev.to/ary_rabelo_7fce97b75d6dbd/i-measured-mcp-vs-a-cli-for-agent-search-the-mcp-used-17x-more-tokens-per-call-43p6
