How Tolan builds voice-first AI with GPT-5.1
These articles are AI-generated summaries. Please check the original sources for full details.
How Tolan builds voice-first AI with GPT-5.1
Tolan is a voice-first AI companion utilizing GPT-5.1 to deliver personalized, ongoing conversations with users. The application, built by Portola, has already amassed over 200,000 monthly active users since its launch in February 2025.
Voice AI presents unique challenges compared to text-based models, demanding low latency and robust context management to maintain natural, flowing interactions. Traditional approaches to context caching often fail in dynamic voice conversations, leading to disjointed experiences and user frustration, potentially impacting retention rates.
Key Insights
- 0.7-second latency reduction: Implementing OpenAI’s GPT-5.1 and Responses API decreased speech initiation time by 0.7 seconds.
- Context Reconstruction: Tolan rebuilds its context window each turn, incorporating summaries, persona cards, memories, and real-time signals.
- Turbopuffer: Tolan uses Turbopuffer, a high-speed vector database, for sub-50ms memory lookup times.
Practical Applications
- Personalized Companions: Tolan provides a continuously learning AI companion, improving user engagement through consistent personality and memory.
- Pitfall: Relying on cached prompts in voice applications leads to inconsistencies and a disjointed user experience when the conversation topic shifts.
References:
Continue reading
Next article
Introducing ChatGPT Health
Related Content
OpenAI Launches GPT-Realtime-2 and Specialized Audio Models in General Availability
OpenAI moves the Realtime API to general availability, introducing GPT-Realtime-2 with GPT-5-class reasoning and a 128K context window.
Supertonic v3: On-Device TTS with 31-Language Support and Expressive Tags
Supertone releases Supertonic v3, an on-device TTS model supporting 31 languages and expressive tags with a compact 404 MB disk footprint.
Salesforce AI Research Releases VoiceAgentRAG: A Dual-Agent Memory Router that Cuts Voice RAG Retrieval Latency by 316x
Salesforce AI Research released VoiceAgentRAG, an open-source architecture that reduces retrieval latency by 316x using a dual-agent system to meet the 200ms voice response budget.