Optimizing AI Expenditures with llm-spend: A Python Profiler for LLM Costs
These articles are AI-generated summaries. Please check the original sources for full details.
I Built a Profiler for My LLM Bill (and It Saved Me $30/month)
Developer Lakshmi Sravya Vedantham created llm-spend, a Python-based profiling tool designed to provide visibility into hidden AI API expenses. The tool revealed that a single summarization feature accounted for nearly 88% of a $47 OpenAI bill.
Why This Matters
Unlike CPU or memory, which developers can monitor with tools like htop or psutil, LLM costs remain invisible until the monthly invoice arrives. This observability gap is a technical blind spot: inefficient prompts or high output token counts can add significant cost without any clear attribution to the code functions responsible.
Key Insights
- Output tokens are often the primary cost driver, typically priced 3-5x higher than input tokens: GPT-4o charges $2.50 per 1M input tokens vs $10.00 per 1M output tokens, and Claude Sonnet charges $3.00 vs $15.00 per 1M.
- The OpenAI, Anthropic, and Google Gemini SDKs all report token usage on their response objects (usage for OpenAI and Anthropic, usage_metadata for Gemini), so costs can be tracked by simple attribute inspection, although the field names differ by provider.
- Python's inspect.stack() function lets the profiler programmatically attribute each cost to the exact source file and function name that triggered the request.
- Local SQLite databases offer a zero-config, persistent storage solution for developer tools without the overhead of remote infrastructure.
- Summarization tasks are disproportionately expensive compared to classification due to higher output token volume.
- llm-spend provides terminal-based reports that break down costs by file, model, or function label.
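Because each SDK names its usage fields differently, a profiler needs some way to map every response onto the same (input, output) pair. The sketch below is a minimal illustration under assumptions, not llm-spend's code: normalize_usage is a hypothetical helper, and the field names assume OpenAI Chat Completions (usage.prompt_tokens / usage.completion_tokens), Anthropic Messages (usage.input_tokens / usage.output_tokens) and Gemini (usage_metadata.prompt_token_count / usage_metadata.candidates_token_count). SimpleNamespace objects stand in for real SDK responses.

```python
from types import SimpleNamespace


def normalize_usage(response) -> tuple[int, int]:
    """Hypothetical helper: return (input_tokens, output_tokens) for a provider response."""
    # Gemini responses carry token counts on usage_metadata rather than usage.
    meta = getattr(response, "usage_metadata", None)
    if meta is not None:
        return meta.prompt_token_count, meta.candidates_token_count
    usage = response.usage
    # OpenAI Chat Completions names the fields prompt/completion tokens.
    if hasattr(usage, "prompt_tokens"):
        return usage.prompt_tokens, usage.completion_tokens
    # Anthropic Messages (and the OpenAI Responses API) use input/output tokens.
    return usage.input_tokens, usage.output_tokens


# Stand-ins shaped like each provider's response object (no real API calls).
openai_like = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=120, completion_tokens=40))
anthropic_like = SimpleNamespace(usage=SimpleNamespace(input_tokens=90, output_tokens=15))
gemini_like = SimpleNamespace(
    usage_metadata=SimpleNamespace(prompt_token_count=60, candidates_token_count=25)
)

for resp in (openai_like, anthropic_like, gemini_like):
    print(normalize_usage(resp))
```

Checking usage_metadata first keeps the Gemini branch from colliding with the usage attribute the other two SDKs share.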
Working Examples
A decorator-based approach to automatically log token usage and costs to a local SQLite database.
```python
from openai import OpenAI
from llm_spend import track

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

@track(model="gpt-4o", label="summarize")
def summarize_article(text: str):
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}],
    )
    return response
```
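For intuition about what a decorator like @track has to do, here is a minimal sketch under stated assumptions. It is not llm-spend's implementation: track_calls, LEDGER, fake_summarize and nightly_job are hypothetical names, the ledger is an in-memory list rather than SQLite, and the wrapped function is assumed to return an OpenAI Chat Completions-style object with usage.prompt_tokens and usage.completion_tokens. inspect.stack() supplies the caller's file and function name.

```python
import functools
import inspect
from types import SimpleNamespace

# Hypothetical in-memory ledger; a real profiler would persist rows (e.g. to SQLite).
LEDGER: list[dict] = []


def track_calls(model: str, label: str):
    """Hypothetical decorator: record token usage from the wrapped function's return value."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stack()[0] is this wrapper's frame; [1] is whoever called the decorated function.
            caller = inspect.stack()[1]
            response = func(*args, **kwargs)
            usage = response.usage  # assumes OpenAI Chat Completions field names
            LEDGER.append({
                "model": model,
                "label": label,
                "function": func.__qualname__,
                "caller_file": caller.filename,
                "caller_function": caller.function,
                "input_tokens": usage.prompt_tokens,
                "output_tokens": usage.completion_tokens,
            })
            return response
        return wrapper
    return decorator


@track_calls(model="gpt-4o", label="summarize")
def fake_summarize(text: str):
    # Stand-in for a real API call: returns an object shaped like an OpenAI response.
    return SimpleNamespace(
        usage=SimpleNamespace(prompt_tokens=len(text.split()), completion_tokens=5)
    )


def nightly_job():
    return fake_summarize("one two three")


nightly_job()
print(LEDGER[-1])
```

The recorded caller_function is "nightly_job", which is what lets a report point at the code path that spent the tokens rather than only at the wrapper.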
A context manager for manual token tracking, useful for streaming responses or custom SDKs.
```python
import anthropic
from llm_spend import spending

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with spending("claude-sonnet-4", label="classify") as s:
    response = client.messages.create(...)
    s.input_tokens = response.usage.input_tokens
    s.output_tokens = response.usage.output_tokens
```
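A context manager like spending can be approximated with contextlib.contextmanager and the standard-library sqlite3 module. This sketch is an assumption about the general approach, not the library's code: spend_block, SpendRecord, report_by_label and the calls table schema are hypothetical, prices are the per-1M figures quoted in this article, and the database lives in memory so the example is self-contained. A row is written only if the with block exits without an exception.

```python
import sqlite3
from contextlib import contextmanager
from dataclasses import dataclass

# USD per 1M tokens (input, output), using the prices quoted in this article.
PRICES_PER_1M = {"gpt-4o": (2.50, 10.00), "claude-sonnet-4": (3.00, 15.00)}

# ":memory:" keeps the demo self-contained; a real tool would use a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS calls ("
    "model TEXT, label TEXT, input_tokens INTEGER, output_tokens INTEGER, cost_usd REAL)"
)


@dataclass
class SpendRecord:
    input_tokens: int = 0
    output_tokens: int = 0


@contextmanager
def spend_block(model: str, label: str):
    """Hypothetical: the body fills in token counts; a row is saved when the block exits cleanly."""
    record = SpendRecord()
    yield record  # if the body raises, the code below does not run
    in_price, out_price = PRICES_PER_1M[model]
    cost = (record.input_tokens * in_price + record.output_tokens * out_price) / 1_000_000
    with conn:  # commits the INSERT (or rolls back on error)
        conn.execute(
            "INSERT INTO calls VALUES (?, ?, ?, ?, ?)",
            (model, label, record.input_tokens, record.output_tokens, cost),
        )


def report_by_label():
    """Total cost per label, most expensive first."""
    return conn.execute(
        "SELECT label, SUM(cost_usd) AS total FROM calls GROUP BY label ORDER BY total DESC"
    ).fetchall()


with spend_block("claude-sonnet-4", "classify") as s:
    s.input_tokens, s.output_tokens = 500, 10
with spend_block("gpt-4o", "summarize") as s:
    s.input_tokens, s.output_tokens = 2_000, 800

for label, total in report_by_label():
    print(f"{label:<10} ${total:.5f}")
```

Storing one row per call and aggregating with GROUP BY keeps the write path trivial; the same table could be grouped by model or file instead of label.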
Practical Applications
- Use case: High-granularity cost attribution for Python-based AI agents. Pitfall: Relying on provider dashboards results in a lack of feature-level spend visibility.
- Use case: Benchmarking model efficiency by comparing costs across Gemini, GPT, and Claude models. Pitfall: Ignoring output-to-input price ratios leads to unexpected budget exhaustion in text-generation tasks.
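To make the output-to-input price ratio concrete, the sketch below compares per-call cost for two workloads using the GPT-4o and Claude Sonnet prices quoted above. The token counts in WORKLOADS are made-up illustrations, not measurements, and the model keys are labels chosen for this example. Gemini is left out because the article quotes no price for it.

```python
# USD per 1M tokens (input, output), the two price points quoted in this article.
PRICES_PER_1M = {"gpt-4o": (2.50, 10.00), "claude-sonnet-4": (3.00, 15.00)}

# Hypothetical per-call token profiles (input, output): output-heavy vs output-light.
WORKLOADS = {
    "summarize": (1_000, 500),
    "classify": (1_000, 5),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single call."""
    in_price, out_price = PRICES_PER_1M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def output_share(model: str, input_tokens: int, output_tokens: int) -> float:
    """Fraction of the call's cost that comes from output tokens."""
    _, out_price = PRICES_PER_1M[model]
    return (output_tokens * out_price / 1_000_000) / call_cost(model, input_tokens, output_tokens)


for task, (tin, tout) in WORKLOADS.items():
    for model in PRICES_PER_1M:
        cost = call_cost(model, tin, tout)
        share = output_share(model, tin, tout)
        print(f"{task:<10} {model:<16} ${cost:.6f}/call  output share {share:.0%}")
```

With these assumed counts, output tokens make up about two-thirds of the GPT-4o summarization cost but only about 2% of the classification cost, which is why summarization dominates a bill even at similar input sizes.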
References:
- https://dev.to/lakshmisravyavedantham/i-built-a-profiler-for-my-llm-bill-and-it-saved-me-30month-4mob
- https://github.com/LakshmiSravyaVedantham/llm-spend