Understanding LLM API Architecture: Request Patterns, Tokenization, and Cost Optimization

An LLM API call, in 4 GIFs

Jasmin Virdi introduces the ‘Building TinyAgent’ series to demystify raw LLM API calls in Node.js. The system reveals that LLM APIs are stateless, requiring the entire message history to be resent for every turn.

Why This Matters

Developers often rely on SDKs that abstract the raw request, leading to production bugs when ignoring the stop_reason or failing to log usage metrics. Because output tokens are significantly more expensive than input tokens and reasoning models bill internal ‘thinking’ as output, a lack of usage logging can lead to unexpected financial spikes—potentially $600/month for a single small feature making 100k calls daily.

Key Insights

The stop_reason field is critical for branching logic; ignoring it leads to bugs when responses are truncated by max_tokens or interrupted by tool_use (Virdi, 2026).
Tokenization does not follow word boundaries; for example, ‘Unbelievable’ is one word but four tokens (Virdi, 2026).
Non-English languages incur higher costs, with Japanese, Hindi, and Arabic typically running 2–4× the token count of English content (Virdi, 2026).
Pricing asymmetry exists between inputs and outputs; long prompts are cheap while long responses are roughly 5× more expensive (Virdi, 2026).

Practical Applications

Use case: Multi-turn chatbots. Behavior: Maintain a messages array and push every user prompt and model reply back into the next API call.
Pitfall: Bloated tool schemas. Consequence: These eat into the input budget on every single request since they are resent with each call.

References:

https://dev.to/jasmin/an-llm-api-call-in-4-gifs-33b1

On This Page

An LLM API call, in 4 GIFs

Why This Matters

Key Insights

Practical Applications

Continue reading

Related Content

Taming LLM Output Chaos: A 3-Tier Normalisation Pattern

Anthropic Quantifies Expertise Multiplier; Practitioners Build Agent-Side Control Plane

The Bottleneck Was Never Generation: Building Governed Agentic Systems