Optimizing AI Coding Agents: A Case Study in 65% Token Reduction

How I Cut My AI Coding Agent’s Token Usage by 65% (Without Changing Models)

Nicola Alessi reports cutting Claude Code input tokens from 8,200 to 2,100 on a 200-file TypeScript project. The optimization replaced broad grep searches with precise AST-level dependency mapping, which eliminated redundant file reads.

Why This Matters

Poor context management leaves agents spending up to 80% of their token budget on ‘orientation’ rather than actual coding tasks. Without structural context, agents rediscover the same logic every session, so developers face 2-3x higher costs, slower session starts, and usage caps hit prematurely.

Key Insights

Specific documentation in CLAUDE.md focusing on ‘decisions, not descriptions’ yields a 20% token reduction (Nicola Alessi, 2026).
Replacing grep-based searches with AST-level subgraphs reduced relevant file reads from 40 down to 5 (Nicola Alessi, 2026).
Passive observation of tool calls and code changes effectively solves the ‘amnesia’ problem where agents forget discoveries between sessions.
Local dependency mapping with tools like vexp (Rust-based) adds zero network overhead and preserves data privacy while maintaining context.
Stale observation tracking ensures that as code evolves, linked knowledge is automatically invalidated to prevent feeding the agent outdated context.
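
The difference between a keyword search and a dependency subgraph can be sketched in a few lines. This is a minimal illustration, not vexp's implementation: the `ImportGraph` type, the `dependencySubgraph` and `keywordMatches` functions, and every file path other than the three auth files named in the CLAUDE.md example are assumptions made for the example. A real tool would build the graph by parsing the AST rather than taking it as input.

```typescript
// Hypothetical import graph: each file maps to the files it imports directly.
type ImportGraph = Map<string, string[]>;

// Breadth-first walk: collect the seed file plus everything it imports,
// directly or transitively, up to maxDepth hops away.
function dependencySubgraph(graph: ImportGraph, seed: string, maxDepth: number): string[] {
  const seen = new Set<string>([seed]);
  let frontier: string[] = [seed];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const file of frontier) {
      for (const dep of graph.get(file) ?? []) {
        if (!seen.has(dep)) {
          seen.add(dep);
          next.push(dep);
        }
      }
    }
    frontier = next;
  }
  return Array.from(seen);
}

// Grep-style baseline: every path that mentions the keyword, related or not.
function keywordMatches(paths: string[], keyword: string): string[] {
  return paths.filter((path) => path.includes(keyword));
}

// Illustrative data only; src/auth/jwt.ts and src/routes/login.ts are invented.
const exampleGraph: ImportGraph = new Map([
  ["src/routes/login.ts", ["src/auth/middleware.ts"]],
  ["src/auth/middleware.ts", ["src/auth/refresh.ts", "src/auth/jwt.ts"]],
  ["src/auth/refresh.ts", ["src/auth/jwt.ts"]],
  ["src/auth/jwt.ts", []],
  ["src/auth/legacy.ts", []],
]);

console.log(keywordMatches(Array.from(exampleGraph.keys()), "auth"));
console.log(dependencySubgraph(exampleGraph, "src/auth/middleware.ts", 2));
```

In this toy graph the keyword search returns every file under `src/auth/`, including the deprecated `legacy.ts`, while the subgraph walk returns only the middleware and the files it actually imports.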

Working Examples

Example of a high-signal CLAUDE.md focusing on specific architectural decisions.

## Auth
- Auth uses middleware in src/auth/middleware.ts
- JWT tokens, not sessions. Refresh token rotation in src/auth/refresh.ts
- DO NOT touch src/auth/legacy.ts — deprecated, will be removed Q2
## Database
- Prisma ORM, schema in prisma/schema.prisma
- All migrations must be backward-compatible

Installation command for the vexp CLI to enable dependency graph mapping for agents.

npm install -g vexp-cli

Practical Applications

Implementing specific architectural constraints in CLAUDE.md for TypeScript/Express projects to guide Claude Code. Pitfall: using vague descriptions like ‘follow best practices’, which force the agent to read the whole codebase to work out what ‘best’ means.
Integrating the Model Context Protocol (MCP) with tools like Cursor or Windsurf to provide AST graphs. Pitfall: letting agents grep ‘auth’ across the codebase, resulting in 40+ hits and 8,000 wasted tokens.
Using passive memory tools to link observations to a code graph. Pitfall: asking the agent to save notes manually; the notes add nothing to the current context window, so compliance is low.
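
One way to picture observations linked to code, and invalidated when that code changes, is to store a fingerprint of the file alongside each note. The sketch below is an assumption about how such tracking could work, not a description of vexp's data model: the `Observation` shape and the `fingerprint`, `recordObservation`, and `freshObservations` functions are all hypothetical, and the small FNV-1a-style hash stands in for whatever content hash a real tool would use.

```typescript
// Hypothetical record of something the agent learned about a file.
interface Observation {
  note: string;
  file: string;
  fingerprint: number; // fingerprint of the file contents when the note was recorded
}

// Small non-cryptographic FNV-1a-style hash over UTF-16 code units.
// A stand-in for a real content hash; chosen only to keep the sketch dependency-free.
function fingerprint(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Attach a note to a file, remembering what the file looked like at that moment.
function recordObservation(note: string, file: string, contents: string): Observation {
  return { note, file, fingerprint: fingerprint(contents) };
}

// Keep only observations whose file still exists and is unchanged since recording.
function freshObservations(
  observations: Observation[],
  currentFiles: Map<string, string>,
): Observation[] {
  return observations.filter((obs) => {
    const contents = currentFiles.get(obs.file);
    return contents !== undefined && fingerprint(contents) === obs.fingerprint;
  });
}
```

Before a session starts, only the notes returned by `freshObservations` would be fed to the agent; a note whose file has been edited or deleted is dropped rather than presented as current knowledge.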

References:

https://dev.to/nicolalessi/how-i-cut-my-ai-coding-agents-token-usage-by-65-without-changing-models-47m
