Optimizing AI Expenditures with llm-spend: A Python Profiler for LLM Costs

I Built a Profiler for My LLM Bill (and It Saved Me $30/month)

Developer Lakshmi Sravya Vedantham created llm-spend, a Python-based profiling tool designed to provide visibility into hidden AI API expenses. The tool revealed that a single summarization feature accounted for nearly 88% of a $47 OpenAI bill.

Why This Matters

CPU and memory usage can be inspected at any time with tools like htop or psutil, but LLM costs stay invisible until the monthly invoice arrives. Without that observability, inefficient prompts or high output token counts can run up significant costs that are never attributed to the specific functions responsible.

Key Insights

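Output tokens are the primary cost driver, often priced 3-5x higher than input tokens: 4x on GPT-4o ($2.50 input vs $10.00 output per 1M tokens) and 5x on Claude Sonnet ($3.00 input vs $15.00 output per 1M tokens).
Major LLM SDKs from OpenAI, Anthropic, and Google Gemini report token usage in their response objects, enabling cost tracking via simple attribute inspection. The field names differ by SDK (for example, prompt_tokens and completion_tokens in OpenAI chat completions versus input_tokens and output_tokens in Anthropic messages).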
The Python inspect.stack function allows the profiler to programmatically attribute costs to the exact source file and function name that triggered the request.
Local SQLite databases offer a zero-config, persistent storage solution for developer tools without the overhead of remote infrastructure.
Summarization tasks are disproportionately expensive compared to classification due to higher output token volume.
llm-spend provides terminal-based reporting to break down costs by file, model, or function label.
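
The pricing, attribution, and storage points above can be sketched with nothing but the standard library. This is a minimal illustration under stated assumptions, not llm-spend's actual implementation: the table schema, the PRICES table, and the helpers cost_usd, open_db, record_call, and report_by_function are hypothetical names invented for this example, and the token counts are made up. The per-1M-token prices are the ones quoted above.

```python
import inspect
import sqlite3

# USD per 1M tokens as (input, output), using the figures quoted in this article.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet": (3.00, 15.00),
}


def cost_usd(model, input_tokens, output_tokens):
    """Convert token counts to dollars using per-1M-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def open_db(path=":memory:"):
    """Open the log database; pass a file path instead to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS calls ("
        "file TEXT, function TEXT, model TEXT, "
        "input_tokens INTEGER, output_tokens INTEGER, cost REAL)"
    )
    return conn


def record_call(conn, model, input_tokens, output_tokens):
    """Log one LLM call, attributed to the function that called record_call."""
    caller = inspect.stack()[1]  # index 0 is record_call itself
    conn.execute(
        "INSERT INTO calls VALUES (?, ?, ?, ?, ?, ?)",
        (caller.filename, caller.function, model, input_tokens,
         output_tokens, cost_usd(model, input_tokens, output_tokens)),
    )
    conn.commit()


def report_by_function(conn):
    """Return (function, total_cost) rows, most expensive first."""
    return conn.execute(
        "SELECT function, SUM(cost) FROM calls "
        "GROUP BY function ORDER BY SUM(cost) DESC"
    ).fetchall()


# Hypothetical features with made-up token counts.
def summarize_article(conn):
    record_call(conn, "gpt-4o", input_tokens=2_000, output_tokens=800)


def classify_ticket(conn):
    record_call(conn, "gpt-4o", input_tokens=2_000, output_tokens=5)


db = open_db()
summarize_article(db)
classify_ticket(db)
for function, total in report_by_function(db):
    print(f"{function:20s} ${total:.6f}")
```

With the same input size, the summarization call costs more than twice as much as the classification call purely because of its output tokens, which is the pattern the article describes.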

Working Examples

A decorator-based approach to automatically log token usage and costs to a local SQLite database.

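    from openai import OpenAI
    from llm_spend import track

    openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Every call to this function is logged under the "summarize" label.
    @track(model="gpt-4o", label="summarize")
    def summarize_article(text: str):
        response = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": text}],
        )
        return response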
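
How a decorator of this kind might read token counts from the returned response can be sketched as follows. This is an assumption about the general technique, not llm-spend's source: track_sketch, extract_usage, and the in-memory LOG list are hypothetical names, and the decorated function returns a fake response object so the example runs without an API key. The attribute names it checks (usage.prompt_tokens / usage.completion_tokens for OpenAI chat completions, usage.input_tokens / usage.output_tokens for Anthropic messages) are the ones those SDKs use.

```python
import functools
from types import SimpleNamespace

LOG = []  # hypothetical in-memory log; a real tool would persist this


def extract_usage(response):
    """Return (input_tokens, output_tokens) from an SDK-style response."""
    usage = response.usage
    if hasattr(usage, "prompt_tokens"):  # OpenAI chat completions naming
        return usage.prompt_tokens, usage.completion_tokens
    return usage.input_tokens, usage.output_tokens  # Anthropic naming


def track_sketch(model, label):
    """Decorator factory: log token usage of the wrapped function's response."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            response = func(*args, **kwargs)
            input_tokens, output_tokens = extract_usage(response)
            LOG.append({
                "label": label,
                "model": model,
                "function": func.__name__,
                "input_tokens": input_tokens,
                "output_tokens": output_tokens,
            })
            return response  # the caller still gets the untouched response
        return wrapper
    return decorator


@track_sketch(model="gpt-4o", label="summarize")
def summarize_article(text):
    # Stand-in for a real chat completion call; the token counts are made up.
    return SimpleNamespace(
        usage=SimpleNamespace(prompt_tokens=1200, completion_tokens=300)
    )


summarize_article("some long article text")
print(LOG[0])
```

Returning the original response unchanged keeps the decorator transparent to the calling code, so tracking can be added or removed without touching the function body.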

A context manager for manual token tracking, useful for streaming responses or custom SDKs.

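    import anthropic
    from llm_spend import spending

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Token counts are assigned by hand inside the block.
    with spending("claude-sonnet-4", label="classify") as s:
        response = client.messages.create(...)  # arguments elided in the original
        s.input_tokens = response.usage.input_tokens
        s.output_tokens = response.usage.output_tokens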
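
A context manager with this shape could be built on contextlib, roughly as below. Again this is a sketch under assumptions rather than the library's code: spending_sketch, the Spend dataclass, the sink list, and the PRICE_PER_MILLION table are hypothetical, and the example assumes the "claude-sonnet-4" model name maps to the Claude Sonnet prices quoted earlier.

```python
from contextlib import contextmanager
from dataclasses import dataclass

# USD per 1M tokens as (input, output). Assumption: "claude-sonnet-4" uses
# the Claude Sonnet prices quoted in this article.
PRICE_PER_MILLION = {"claude-sonnet-4": (3.00, 15.00)}


@dataclass
class Spend:
    """Mutable record the caller fills in while inside the with-block."""
    model: str
    label: str
    input_tokens: int = 0
    output_tokens: int = 0
    cost: float = 0.0


@contextmanager
def spending_sketch(model, label, sink):
    """Yield a Spend record; price it and append it to sink on exit."""
    record = Spend(model, label)
    try:
        yield record
    finally:
        # Runs even if the block raises, so partial usage is still logged.
        in_price, out_price = PRICE_PER_MILLION[model]
        record.cost = (
            record.input_tokens * in_price + record.output_tokens * out_price
        ) / 1_000_000
        sink.append(record)


records = []
with spending_sketch("claude-sonnet-4", "classify", records) as s:
    # In real code these would come from response.usage; values are made up.
    s.input_tokens = 500
    s.output_tokens = 10

print(records[0])
```

Because the counts are assigned manually, the same pattern works for streaming responses, where usage is typically only known once the stream has finished.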

Practical Applications

Use case: Fine-grained cost attribution for Python-based AI agents. Pitfall: Provider dashboards alone do not show spend at the feature level.
Use case: Benchmarking model efficiency by comparing costs across Gemini, GPT, and Claude models. Pitfall: Ignoring output-to-input price ratios leads to unexpected budget exhaustion in text-generation tasks.
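
The output-to-input ratio pitfall is easy to quantify. The sketch below compares two hypothetical workloads across the two models whose prices are quoted in this article; the token counts and the compare helper are assumptions chosen only for illustration, and Gemini is left out because no Gemini price appears in the source.

```python
# USD per 1M tokens as (input, output), as quoted earlier in this article.
PRICES = {"gpt-4o": (2.50, 10.00), "claude-sonnet": (3.00, 15.00)}

# Hypothetical per-request token counts as (input, output).
WORKLOADS = {"summarize": (2_000, 500), "classify": (2_000, 5)}


def compare(prices, workloads):
    """Return {(model, task): (total_cost, output_share_of_cost)}."""
    results = {}
    for model, (in_price, out_price) in prices.items():
        for task, (in_tokens, out_tokens) in workloads.items():
            in_cost = in_tokens * in_price / 1_000_000
            out_cost = out_tokens * out_price / 1_000_000
            total = in_cost + out_cost
            results[(model, task)] = (total, out_cost / total)
    return results


for (model, task), (total, share) in compare(PRICES, WORKLOADS).items():
    print(f"{model:14s} {task:10s} ${total:.5f}  output share {share:.0%}")
```

In this made-up workload the output is only a quarter the size of the input, yet it accounts for half or more of the summarization cost on both models, while for classification it is about one percent.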

References:

https://dev.to/lakshmisravyavedantham/i-built-a-profiler-for-my-llm-bill-and-it-saved-me-30month-4mob
https://github.com/LakshmiSravyaVedantham/llm-spend
