Skip to main content

On This Page

Google Health AI Releases MedASR: A Conformer-Based Medical Speech-to-Text Model

2 min read
Share

These articles are AI-generated summaries. Please check the original sources for full details.

Google Health AI Releases MedASR: A Conformer-Based Medical Speech-to-Text Model

Google Health AI has released MedASR, an open-weights speech-to-text model based on the Conformer architecture, specifically designed for clinical dictation and physician-patient conversations. The model contains 105 million parameters and is positioned as a starting point for healthcare voice applications.

Why This Matters

General-purpose speech-to-text models often struggle with the nuanced vocabulary and phrasing common in clinical settings, leading to inaccurate transcriptions and hindering downstream AI workflows. This can increase the workload for medical professionals and potentially compromise patient care. MedASR addresses this gap with domain-specific training, aiming for significantly lower word error rates in medical contexts.

Key Insights

  • Conformer Architecture: Combines convolutional and self-attention layers for capturing both local and long-range dependencies in speech.
  • Training Data: Trained on approximately 5000 hours of de-identified medical speech data, covering radiology, internal medicine, and family medicine.
  • Performance Gains: MedASR achieves competitive or superior word error rates compared to models like Gemini 2.5 Pro and Whisper v3 Large, particularly when combined with a six-gram language model.

Working Example

from transformers import pipeline
import huggingface_hub
audio = huggingface_hub.hf_hub_download("google/medasr", "test_audio.wav")
pipe = pipeline("automatic-speech-recognition", model="google/medasr")
result = pipe(audio, chunk_length_s=20, stride_length_s=2)
print(result)

Practical Applications

  • Radiology Dictation: Integrating MedASR into radiology workflows for automated transcription of image interpretations.
  • Pitfall: Relying on greedy decoding alone; combining MedASR with a language model yields significant performance improvements.

References:

Continue reading

Next article

InstaDeep Introduces Nucleotide Transformer v3 (NTv3): A New Multi-Species Genomics Foundation Model

Related Content