Open Responses: A New Standard for AI Agent Inference
Open Responses: What you need to know
Open Responses is a new, open inference standard initiated by OpenAI, built by the open source AI community, and backed by the Hugging Face ecosystem. It is based on the Responses API and designed for the evolving needs of AI agents, addressing the limitations of the older Chat Completion format.
Autonomous agents need more than turn-based conversation support, yet the Chat Completion format remains prevalent even though it is poorly suited to complex, agentic tasks. This mismatch introduces inefficiencies and creates the need for a standardized interface for advanced AI systems.
Why This Matters
Developers currently rely on workarounds and undocumented extensions to the Chat Completion format, which leads to inconsistent and fragile inference integrations. Replacing these with a formalized, open standard reduces integration costs and improves code maintainability, which matters more as agentic workflows grow in complexity and scale.
Key Insights
- Responses API, launched by OpenAI in 2025: The first effort to address the limitations of earlier APIs, and the basis for Open Responses.
- Stateless design with optional encryption: Allows flexibility for providers, accommodating both performance and privacy requirements.
- Semantic streaming: Provides granular updates during inference, improving responsiveness and observability.
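The semantic streaming point can be illustrated with a small consumer. The sketch below assumes the stream arrives as Server-Sent Events and that event names such as `response.output_text.delta` match those used by OpenAI's Responses API; the article does not confirm the exact event names Open Responses uses, and the sample stream and helper names (`iter_sse_events`, `collect_text`) are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from Server-Sent Events lines."""
    event_name, data_parts = "", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
        elif line == "" and data_parts:
            # A blank line terminates one event.
            yield event_name, json.loads("\n".join(data_parts))
            event_name, data_parts = "", []


def collect_text(events: Iterable[tuple[str, dict]]) -> str:
    """Concatenate text deltas, ignoring every other event type."""
    # Assumption: text arrives in "response.output_text.delta" events
    # carrying a "delta" field, as in OpenAI's Responses API.
    return "".join(
        payload.get("delta", "")
        for name, payload in events
        if name == "response.output_text.delta"
    )


# Hypothetical stream, hand-written for this example.
SAMPLE_STREAM = """\
event: response.created
data: {"type": "response.created"}

event: response.output_text.delta
data: {"type": "response.output_text.delta", "delta": "Hello"}

event: response.output_text.delta
data: {"type": "response.output_text.delta", "delta": ", world"}

event: response.completed
data: {"type": "response.completed"}

"""

events = list(iter_sse_events(SAMPLE_STREAM.splitlines()))
print(collect_text(events))
```

Because each event carries a name, a client can route lifecycle events (created, completed) to logging or tracing and text deltas to the UI, instead of diffing raw text chunks.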
Working Example
curl https://evalstate-openresponses.hf.space/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "OpenResponses-Version: latest" \
  -N \
  -d '{
    "model": "moonshotai/Kimi-K2-Thinking:nebius",
    "input": "explain the theory of life"
  }'
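The same request can be sketched in Python using only the standard library. The endpoint, headers, model name, and body fields are copied from the curl command above; the helper names (`build_request`, `send`) are hypothetical, and the response format is not described in this article, so the sketch only prints whatever lines the server returns.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
ENDPOINT = "https://evalstate-openresponses.hf.space/v1/responses"


def build_request(token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    body = json.dumps({"model": model, "input": prompt}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
            "OpenResponses-Version": "latest",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> None:
    """Send the request and print the response line by line as it arrives."""
    with urllib.request.urlopen(req) as resp:
        for raw_line in resp:
            print(raw_line.decode("utf-8"), end="")
```

To run it, read the token from the environment and call `send(build_request(os.environ["HF_TOKEN"], "moonshotai/Kimi-K2-Thinking:nebius", "explain the theory of life"))` after `import os`. Keeping request construction separate from sending makes the payload easy to inspect without a network call.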
Practical Applications
- Hugging Face Inference Providers: Implementing Open Responses to offer a standardized interface for accessing various models.
- Agent Development: Facilitating the creation of robust and scalable agents with enhanced reasoning and tool usage capabilities.