Optimizing AI Coding Agents: A Case Study in 65% Token Reduction
These articles are AI-generated summaries. Please check the original sources for full details.
How I Cut My AI Coding Agent’s Token Usage by 65% (Without Changing Models)
Nicola Alessi reduced Claude Code's input tokens from 8,200 to 2,100 on a 200-file TypeScript project. The optimization replaced broad grep searches with precise AST-level dependency mapping, which eliminated redundant file reads.
Why This Matters
Without structural context, an AI agent can spend up to 80% of its token budget on ‘orientation’ rather than on the coding task itself. Because the agent rediscovers the same logic in every session, developers pay 2-3x higher costs, wait longer for sessions to start, and hit usage caps prematurely.
Key Insights
- Specific documentation in CLAUDE.md focusing on ‘decisions, not descriptions’ yields a 20% token reduction (Nicola Alessi, 2026).
- Replacing grep-based searches with AST-level subgraphs reduced relevant file reads from 40 down to 5 (Nicola Alessi, 2026).
- Passive observation of tool calls and code changes effectively solves the ‘amnesia’ problem where agents forget discoveries between sessions.
- Local dependency mapping with tools like vexp (Rust-based) adds no network overhead and preserves data privacy while maintaining context.
- Stale-observation tracking automatically invalidates linked knowledge as the code evolves, so the agent is not fed outdated context.
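The contrast between grep-based search and an AST-level subgraph can be sketched in TypeScript. This is a minimal illustration, not vexp's implementation: the import graph is hard-coded and hypothetical (a real tool would derive it by parsing the code), and `grepFiles` and `dependencySubgraph` are names invented for this example.

```typescript
// Hypothetical import graph: each file maps to the files it imports.
// A real tool would build this from the AST; here it is hard-coded.
type DepGraph = Map<string, string[]>;

const graph: DepGraph = new Map<string, string[]>([
  ["src/routes/login.ts", ["src/auth/middleware.ts", "src/db/users.ts"]],
  ["src/auth/middleware.ts", ["src/auth/refresh.ts"]],
  ["src/auth/refresh.ts", []],
  ["src/db/users.ts", []],
  ["src/auth/legacy.ts", []],
  ["src/utils/oauth-helpers.ts", []],
]);

// Grep-style lookup: every file whose path contains the keyword,
// whether or not it is related to the task at hand.
function grepFiles(g: DepGraph, keyword: string): string[] {
  return Array.from(g.keys()).filter((file) => file.includes(keyword));
}

// Graph-style lookup: breadth-first walk over imports, starting from
// one entry file and stopping after maxDepth hops.
function dependencySubgraph(g: DepGraph, entry: string, maxDepth: number): string[] {
  const seen = new Set<string>([entry]);
  let frontier: string[] = [entry];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const file of frontier) {
      for (const dep of g.get(file) ?? []) {
        if (!seen.has(dep)) {
          seen.add(dep);
          next.push(dep);
        }
      }
    }
    frontier = next;
  }
  return Array.from(seen);
}
```

In this toy graph, a keyword search for `auth` also returns `src/auth/legacy.ts` and `src/utils/oauth-helpers.ts`, while a walk from `src/routes/login.ts` reaches only the files that route imports directly or transitively.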
Working Examples
Example of a high-signal CLAUDE.md focusing on specific architectural decisions.
## Auth
- Auth uses middleware in src/auth/middleware.ts
- JWT tokens, not sessions. Refresh token rotation in src/auth/refresh.ts
- DO NOT touch src/auth/legacy.ts — deprecated, will be removed Q2
## Database
- Prisma ORM, schema in prisma/schema.prisma
- All migrations must be backward-compatible
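One way to keep a file like this high-signal is to check it mechanically. The sketch below is an assumption-laden illustration rather than part of the original workflow: `extractPaths`, `findVagueLines`, and the `VAGUE_PHRASES` list are hypothetical names and values chosen for this example.

```typescript
// Phrases treated as vague in this sketch; the list is a hypothetical starting point.
const VAGUE_PHRASES = ["best practices", "clean code", "be careful"];

// Pull out anything that looks like a relative file path,
// such as src/auth/middleware.ts.
function extractPaths(markdown: string): string[] {
  return markdown.match(/[\w.-]+(?:\/[\w.-]+)+/g) ?? [];
}

// Return the lines that contain a vague phrase and would be better
// replaced by a concrete decision.
function findVagueLines(markdown: string): string[] {
  return markdown
    .split("\n")
    .filter((line) => VAGUE_PHRASES.some((p) => line.toLowerCase().includes(p)));
}

const sample = [
  "## Auth",
  "- Auth uses middleware in src/auth/middleware.ts",
  "- JWT tokens, not sessions. Refresh token rotation in src/auth/refresh.ts",
  "- Follow best practices",
].join("\n");

console.log(extractPaths(sample));
console.log(findVagueLines(sample));
```

The extracted paths could then be compared against the files that actually exist in the repository, so that a renamed or deleted file is noticed before the agent is pointed at it.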
Installation command for the vexp CLI to enable dependency graph mapping for agents.
npm install -g vexp-cli
Practical Applications
- Implementing specific architectural constraints in CLAUDE.md for TypeScript/Express projects to guide Claude Code. Pitfall: vague descriptions like ‘follow best practices’, which force the agent to read the whole codebase to work out what ‘best’ means.
- Integrating the Model Context Protocol (MCP) with tools like Cursor or Windsurf to provide AST graphs. Pitfall: letting agents grep ‘auth’ across the codebase, which produces 40+ hits and about 8,000 wasted tokens.
- Using passive memory tools to link observations to a code graph. Pitfall: asking the agent to save notes manually. Note-taking adds nothing to the agent's current context window, so compliance is low.
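Linking observations to code, and invalidating them when the code changes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the schema or algorithm of any real tool: the `Observation` shape, the function names, and the use of a simple FNV-1a-style string hash for change detection are all hypothetical.

```typescript
// Hypothetical shape of a stored observation.
interface Observation {
  note: string;
  file: string;
  fileHash: string; // hash of the file content when the note was recorded
}

// Small non-cryptographic hash (FNV-1a style over UTF-16 code units).
// It is only meant to detect that content changed in this sketch.
function hashContent(content: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < content.length; i++) {
    h ^= content.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16);
}

// Record a note together with a fingerprint of the file it refers to.
function recordObservation(note: string, file: string, content: string): Observation {
  return { note, file, fileHash: hashContent(content) };
}

// Keep only observations whose file still exists with unchanged content.
// Anything else is treated as stale and dropped before reaching the agent.
function freshObservations(
  observations: Observation[],
  currentFiles: Map<string, string>,
): Observation[] {
  return observations.filter((o) => {
    const content = currentFiles.get(o.file);
    return content !== undefined && hashContent(content) === o.fileHash;
  });
}
```

Because the fingerprint is taken automatically when the observation is recorded, nothing depends on the agent remembering to save or update its own notes.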