Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks

As of Nov 21, 2025, the Hugging Face Hub hosts roughly 27K ASR models, and Hugging Face’s Open ASR Leaderboard now adds multilingual and long-form transcription tracks to help compare them. Conformer + LLM decoder models top the English accuracy rankings, but closed-source systems still lead on long-form tasks.

Why This Matters

Most ASR benchmarks focus on short-form English transcription (clips under 30 s) and ignore real-world needs such as multilingual support and efficient long-form transcription. Current models struggle to balance accuracy, speed, and language coverage, which hurts applications like global call centers and podcast transcription. For example, Conformer + LLM decoder models achieve state-of-the-art accuracy but lag in throughput behind CTC/TDT decoders, which trade some accuracy for speed.
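The accuracy and throughput trade-off described above is typically reported as word error rate (WER) and inverse real-time factor (RTFx), the two headline metrics on the leaderboard. The sketch below computes both from first principles. The text normalization (lowercasing and stripping punctuation) is a simplified assumption rather than the leaderboard's actual normalizer, and the function names are illustrative.

```python
import re


def normalize(text: str) -> list[str]:
    # Simplified normalization (assumption): lowercase, replace punctuation
    # with spaces, split on whitespace. Real leaderboards use richer normalizers.
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the word-level edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # one deletion out of six words
print(rtfx(3600.0, 36.0))  # an hour of audio in 36 s of compute
```

A model with RTFx of 100 transcribes an hour of audio in about 36 seconds; comparing that figure with WER is how the "faster but less accurate" trade-off becomes concrete.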

Key Insights

“Conformer + LLM decoders lead in English accuracy (models from NVIDIA, IBM, and Microsoft)”
“CTC/TDT decoders offer 10–100× faster throughput in exchange for higher error rates”
“Whisper Large v3 supports 99 languages, but fine-tuned variants outperform it on English”
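One way to act on these trade-offs is to keep only models that are Pareto-optimal on WER and RTFx, then pick the most accurate one that still meets a throughput floor. This is a minimal sketch: the `Result` type, the helper names, and every number in `results` are hypothetical placeholders, not leaderboard values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    model: str
    wer: float   # average word error rate (%), lower is better
    rtfx: float  # inverse real-time factor, higher is faster


def pareto_front(results: list[Result]) -> list[Result]:
    """Keep models that no other model matches or beats on both WER and RTFx."""
    front = []
    for r in results:
        dominated = any(
            o.wer <= r.wer and o.rtfx >= r.rtfx and (o.wer < r.wer or o.rtfx > r.rtfx)
            for o in results
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.wer)


def best_under_budget(results: list[Result], min_rtfx: float) -> Optional[Result]:
    """Most accurate model whose throughput meets min_rtfx, or None if none qualifies."""
    eligible = [r for r in results if r.rtfx >= min_rtfx]
    return min(eligible, key=lambda r: r.wer, default=None)


# Hypothetical numbers for illustration only.
results = [
    Result("llm_decoder_model", wer=5.0, rtfx=100.0),
    Result("tdt_model", wer=6.0, rtfx=2000.0),
    Result("ctc_model", wer=6.5, rtfx=3000.0),
    Result("dominated_model", wer=8.0, rtfx=80.0),
]

print([r.model for r in pareto_front(results)])
print(best_under_budget(results, min_rtfx=1000.0))
```

Raising `min_rtfx` pushes the choice toward CTC/TDT-style models; lowering it lets slower, more accurate LLM-decoder models win.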

Practical Applications

Use Case: “OpenAI’s Whisper Large v3 for multilingual transcription in global call centers”
Pitfall: “Over-reliance on English-only models like Parakeet CTC 1.1B may exclude non-English speakers”
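A simple way to avoid this pitfall is to route each request by its detected language: use a fast English-only model when it covers the language and fall back to a multilingual model otherwise. This is a minimal sketch under stated assumptions: the `AsrModel` type, the `route` helper, and the language sets are hypothetical, the model names are plain labels rather than API identifiers, and language detection is assumed to happen upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsrModel:
    name: str                   # hypothetical label, not a Hub or API identifier
    languages: frozenset[str]   # ISO 639-1 codes the model is assumed to handle


def route(language: str, fast_model: AsrModel, fallback: AsrModel) -> AsrModel:
    """Prefer the fast model when it covers the language, else the multilingual fallback."""
    if language in fast_model.languages:
        return fast_model
    if language in fallback.languages:
        return fallback
    raise ValueError(f"no configured model covers language {language!r}")


english_only = AsrModel("parakeet-ctc-1.1b", frozenset({"en"}))
# Illustrative subset only; the real multilingual model covers far more languages.
multilingual = AsrModel("whisper-large-v3", frozenset({"en", "de", "es", "fr"}))

print(route("en", english_only, multilingual).name)
print(route("de", english_only, multilingual).name)
```

Failing loudly on an uncovered language, rather than silently sending it to the English-only model, makes coverage gaps visible instead of producing poor transcripts for non-English speakers.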
