
Audio Language Model

9 articles in this category

AI News, Artificial Intelligence, Audio Language Model

Inworld AI Realtime TTS-2: A Closed-Loop Voice Model for Context-Aware Conversations

Inworld AI launches Realtime TTS-2, a closed-loop voice model achieving sub-200ms latency and context-aware emotional delivery.

AI News, Audio Language Model, Open Source

smol-audio: A Colab-Friendly Notebook Collection for Fine-Tuning Advanced Audio Models

Deep-unlearning team releases smol-audio, a repository for fine-tuning Whisper, Voxtral, and Audio Flamingo 3 using standard 16 GB Colab runtimes.

AI News, Audio Language Model, New Releases

Mistral AI Unveils Voxtral TTS: A 4B Parameter Open-Weight Model for 70ms Low-Latency Speech

Mistral AI releases Voxtral TTS, a 4B parameter open-weight model achieving 70ms latency and 9.7x real-time factor across 9 languages.

AI News, Audio Language Model, New Releases

IBM Granite 4.0 1B Speech: A High-Efficiency Multilingual Model for Edge AI

IBM's Granite 4.0 1B Speech model reduces parameter count by 50% while achieving an average WER of 5.52, and is optimized for edge-style multilingual ASR and AST.

AI News, Audio Language Model, Artificial Intelligence

Inworld AI Releases TTS-1.5 For Realtime, Production Grade Voice Agents

Inworld AI’s TTS-1.5 achieves sub-250ms P90 latency for voice agents, significantly improving responsiveness.

AI News, Artificial Intelligence, Audio Language Model

How to Design a Fully Streaming Voice Agent with End-to-End Latency Budgets

This tutorial shows how to design a fully streaming voice agent with low-latency responsiveness, focusing on quantifiable metrics such as time to first audio, which can potentially fall under 1 second.

AI News, Audio Language Model, Artificial Intelligence

Meta AI Releases SAM Audio: A Unified Model for Intuitive Audio Separation

Meta AI’s SAM Audio achieves state-of-the-art performance in audio separation, scoring up to 4.49 in subjective evaluations for professional instrument isolation.

AI News, Audio Language Model, Language Model

StepFun AI Releases Step-Audio-R1: A New Audio LLM that Finally Benefits from Test Time Compute Scaling

StepFun AI’s Step-Audio-R1 achieves 83.6% accuracy on audio benchmarks by addressing limitations in training rather than flaws in the audio modality.

AI News, Audio Language Model, Open Source

Maya1: A New Open Source 3B Voice Model For Expressive Text To Speech On A Single GPU

Maya1, a 3B parameter open-source TTS model, enables expressive speech generation on a single GPU.
