Inference

2 articles in this category

AI News · Agents · Inference

Open Responses: A New Standard for AI Agent Inference

Open Responses, initiated by OpenAI and built by the open source community, aims to address the limitations of the Chat Completion format for agentic workloads.

Jan 15, 2026

AI News · LLMs · Inference

Continuous batching from first principles

Continuous batching maximizes LLM throughput by combining the prefill and decode phases of different requests in the same batch, achieving up to a 2x speedup in token generation.

Sep 11, 2025