Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks


These articles are AI-generated summaries. Please check the original sources for full details.

Hugging Face’s Open ASR Leaderboard now tracks 27K models, with new multilingual and long-form transcription benchmarks. As of Nov 21, 2025, the leaderboard shows Conformer + LLM decoders as the top performers, although closed-source systems still lead in long-form tasks.

Why This Matters

Most ASR benchmarks focus on short-form English transcription (clips under 30 seconds) and overlook real-world needs such as multilingual support and long-form efficiency. Current models struggle to balance accuracy, speed, and language coverage, which hurts applications like global call centers or podcast transcription. For example, Conformer + LLM decoders achieve state-of-the-art accuracy but lag in throughput behind CTC/TDT decoders, which trade some precision for speed.

Key Insights

  • Conformer + LLM decoders lead in English accuracy as of 2025 (NVIDIA, IBM, Microsoft).
  • CTC/TDT decoders offer 10–100× faster throughput but have higher error rates.
  • Whisper Large v3 supports 99 languages, but fine-tuned variants outperform it in English.

Practical Applications

  • Use Case: OpenAI’s Whisper Large v3 for multilingual transcription in global call centers.
  • Pitfall: Over-reliance on English-only models like Parakeet CTC 1.1B may exclude non-English speakers.
