Optimizing AI Coding Agents: A Case Study in 65% Token Reduction

These articles are AI-generated summaries. Please check the original sources for full details.

How I Cut My AI Coding Agent’s Token Usage by 65% (Without Changing Models)

Nicola Alessi reduced Claude Code input tokens from 8,200 to 2,100 on a 200-file TypeScript project. The optimization replaced broad grep searches with precise AST-level dependency mapping, which eliminates redundant file reads.
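As a quick check on the two token counts quoted above, the relative reduction can be computed directly. The sketch below uses only those two numbers; the helper name `reduction_pct` is made up for illustration, and how these figures relate to the 65% in the headline is not stated in this summary, so the headline presumably refers to a different aggregate.

```python
def reduction_pct(before: int, after: int) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Token counts quoted in the summary above.
print(f"{reduction_pct(8200, 2100):.1f}%")  # prints "74.4%"
```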

Why This Matters

Without structural context, agents can spend up to 80% of their token budget on ‘orientation’ rather than actual coding tasks. Developers then face 2-3x higher costs and slower session starts, because agents rediscover the same logic every session and hit usage caps prematurely.

Key Insights

  • Specific documentation in CLAUDE.md focusing on ‘decisions, not descriptions’ yields a 20% token reduction (Nicola Alessi, 2026).
  • Replacing grep-based searches with AST-level subgraphs reduced file reads from 40 to 5, leaving only the relevant files (Nicola Alessi, 2026).
  • Passive observation of tool calls and code changes effectively solves the ‘amnesia’ problem where agents forget discoveries between sessions.
  • Local dependency mapping using tools like vexp (Rust-based) ensures zero network overhead and data privacy while maintaining context.
  • Stale observation tracking ensures that as code evolves, linked knowledge is automatically invalidated to prevent feeding the agent outdated context.
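To make the grep-versus-subgraph contrast concrete, here is a minimal sketch. It is not how vexp works internally; the import map is hand-written and hypothetical (a real tool would derive it by parsing the AST), and the "grep" is a simple path match standing in for a text search.

```python
from collections import deque

# Hypothetical import graph: file -> files it imports.
IMPORTS = {
    "src/routes/login.ts": ["src/auth/middleware.ts", "src/db/client.ts"],
    "src/auth/middleware.ts": ["src/auth/refresh.ts"],
    "src/auth/refresh.ts": [],
    "src/db/client.ts": [],
    "src/auth/legacy.ts": ["src/db/client.ts"],
    "src/ui/AuthBanner.tsx": [],
}


def dependency_subgraph(graph, entry, max_depth=2):
    """Files reachable from `entry` within `max_depth` import hops (BFS order)."""
    seen = {entry}
    order = [entry]
    queue = deque([(entry, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append((dep, depth + 1))
    return order


def grep_hits(graph, needle):
    """Stand-in for a broad text search: every path containing `needle`."""
    return sorted(path for path in graph if needle.lower() in path.lower())


print(grep_hits(IMPORTS, "auth"))
print(dependency_subgraph(IMPORTS, "src/routes/login.ts"))
```

The broad search pulls in the deprecated `legacy.ts` and an unrelated UI component, while the subgraph walk returns only the files the login route actually depends on.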

Working Examples

Example of a high-signal CLAUDE.md focusing on specific architectural decisions.

## Auth
- Auth uses middleware in src/auth/middleware.ts
- JWT tokens, not sessions. Refresh token rotation in src/auth/refresh.ts
- DO NOT touch src/auth/legacy.ts — deprecated, will be removed Q2
## Database
- Prisma ORM, schema in prisma/schema.prisma
- All migrations must be backward-compatible

Installation command for the vexp CLI to enable dependency graph mapping for agents.

npm install -g vexp-cli

Practical Applications

  • Writing specific architectural constraints into CLAUDE.md for TypeScript/Express projects to guide Claude Code. Pitfall: vague instructions like ‘follow best practices’ force the agent to read the whole codebase to work out what ‘best’ means.
  • Integrating the Model Context Protocol (MCP) with tools like Cursor or Windsurf to provide AST graphs. Pitfall: letting agents grep ‘auth’ across the codebase, which produces 40+ hits and 8,000 wasted tokens.
  • Using passive memory tools to link observations to a code graph. Pitfall: asking the agent to save notes manually, which adds nothing to the current context window and sees low compliance.
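One way to picture observations linked to code, with stale ones invalidated as files change, is to store a content fingerprint alongside each note. This is a sketch under assumptions, not the mechanism any particular tool uses: `ObservationStore`, `record`, and `fresh_notes` are hypothetical names, and the store is in memory only.

```python
import hashlib
from dataclasses import dataclass


def content_hash(source: str) -> str:
    """Fingerprint file contents so later edits can be detected."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


@dataclass
class Observation:
    note: str             # what the agent learned
    path: str             # file the note is linked to
    hash_at_capture: str  # fingerprint of that file when the note was made


class ObservationStore:
    """Hypothetical in-memory store; a real tool would persist this."""

    def __init__(self) -> None:
        self._items: list[Observation] = []

    def record(self, note: str, path: str, source: str) -> None:
        self._items.append(Observation(note, path, content_hash(source)))

    def fresh_notes(self, current_sources: dict[str, str]) -> list[str]:
        """Return notes whose linked file is unchanged since capture."""
        fresh = []
        for obs in self._items:
            source = current_sources.get(obs.path)
            if source is not None and content_hash(source) == obs.hash_at_capture:
                fresh.append(obs.note)
        return fresh


store = ObservationStore()
store.record("Refresh token rotation lives here", "src/auth/refresh.ts", "v1 of refresh")
store.record("Prisma schema lives here", "prisma/schema.prisma", "v1 of schema")

# Simulate a later session in which only refresh.ts has been edited.
later = {"src/auth/refresh.ts": "v2 of refresh", "prisma/schema.prisma": "v1 of schema"}
print(store.fresh_notes(later))  # prints "['Prisma schema lives here']"
```

Because recording happens inside the store rather than in the agent's prompt, nothing depends on the agent remembering to save notes.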
