
Inference

2 articles in this category

AI News · Agents · Inference

Open Responses: A New Standard for AI Agent Inference

Open Responses, initiated by OpenAI and built by the open-source community, aims to address the limitations of the Chat Completions format for agentic workloads.

AI News · LLMs · Inference

Continuous batching from first principles

Continuous batching maximizes LLM throughput by combining the prefill and decode phases of different requests in the same batch, achieving up to a 2x speedup in token generation.
