Inside OpenAI’s in-house data agent

These articles are AI-generated summaries. Please check the original sources for full details.

OpenAI has developed a bespoke, internal AI data agent powered by GPT-5 and Codex, designed to explore and reason over its massive data platform, containing over 600 petabytes of data across 70,000 datasets. This agent dramatically reduces the time to insight for employees, moving it from days to minutes.

Why This Matters

Ideal data analysis assumes clean, well-documented data and analysts with deep contextual knowledge. In reality, data is often messy and poorly documented, and understanding its relationships and potential pitfalls takes significant effort. At OpenAI’s scale, with 3.5k+ internal users and 70k datasets, the cost of inefficient data access and analysis quickly becomes substantial and hinders data-driven decision-making.

Key Insights

  • 600 petabytes: The total volume of data managed by OpenAI’s data platform.
  • Context is King: The agent relies on multiple layers of context (metadata, query inference, curated descriptions, code-level definitions, Slack/Google Docs integration, and a learning memory system) to ensure accurate results.
  • Evals API for Quality Control: OpenAI uses its Evals API to systematically evaluate the agent’s performance with curated question-answer pairs and automated SQL comparison, preventing regressions and ensuring reliability.
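The evaluation idea in the last bullet, curated question-answer pairs checked by automated SQL comparison, can be illustrated with a minimal sketch. This is not OpenAI’s Evals API or the agent’s actual harness: it assumes an in-memory SQLite database, a made-up `signups` table, and a hypothetical `generate_sql` callable standing in for the agent. It compares the rows returned by a curated "expected" query with the rows returned by the generated query, ignoring row order and column aliases.

```python
import sqlite3


def run_query(conn, sql):
    """Run a query and return its rows sorted, so row order does not matter."""
    rows = conn.execute(sql).fetchall()
    return sorted(rows, key=repr)


def results_match(conn, expected_sql, generated_sql):
    """True if both queries return the same set of rows on this database."""
    try:
        generated = run_query(conn, generated_sql)
    except sqlite3.Error:
        # A generated query that fails to execute counts as a miss.
        return False
    return run_query(conn, expected_sql) == generated


def build_demo_db():
    """Create a tiny illustrative table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE signups (day TEXT, plan TEXT, users INTEGER)")
    conn.executemany(
        "INSERT INTO signups VALUES (?, ?, ?)",
        [
            ("2025-01-01", "free", 10),
            ("2025-01-01", "pro", 4),
            ("2025-01-02", "free", 7),
        ],
    )
    return conn


# Curated question/expected-SQL pairs (illustrative only).
EVAL_CASES = [
    {
        "question": "How many users signed up per plan?",
        "expected_sql": "SELECT plan, SUM(users) FROM signups GROUP BY plan",
    },
]


def evaluate(conn, cases, generate_sql):
    """Return the fraction of cases where the generated SQL matches.

    `generate_sql` is a hypothetical stand-in for the agent: it takes a
    question string and returns a SQL string.
    """
    passed = sum(
        results_match(conn, case["expected_sql"], generate_sql(case["question"]))
        for case in cases
    )
    return passed / len(cases)
```

Comparing result sets rather than SQL text means two differently written but equivalent queries both pass, while a query that runs but returns different numbers is flagged. Running `evaluate` on every change to the agent is one way such a check could catch regressions.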

Practical Applications

  • OpenAI Internal Teams: Engineering, Data Science, Go-To-Market, Finance, and Research teams use the agent for high-impact data questions, such as evaluating product launches and understanding business health.
  • Pitfall: Overly prescriptive prompting can hinder the agent’s ability to reason effectively; allowing GPT-5 to choose the execution path leads to more robust results.
