Meta AI Releases SAM Audio: A Unified Model for Intuitive Audio Separation

Meta AI has released SAM Audio, a prompt-driven audio separation model designed to streamline audio editing workflows and remove the need for a custom model per sound class. The model comes in three sizes (sam-audio-small, sam-audio-base, and sam-audio-large). It is available for download and can be tried in the Segment Anything Playground.

SAM Audio aims to bridge the gap between ideal audio separation and the complexities of real-world recordings. Isolating a specific sound today is often a manual, time-consuming process that relies on specialized tools, which costs content creators and audio engineers significant time and resources.

Key Insights

Diffusion Transformer Architecture: SAM Audio uses a diffusion transformer that applies self-attention and cross-attention over time-aligned features to improve separation quality.
Multimodal Prompting: The model supports text prompts, visual prompts (selecting an object in video), and span prompts (marking a time segment), offering flexible control over separation.
Target/Residual Output: SAM Audio outputs both a target waveform (isolated sound) and a residual waveform (everything else), directly supporting common editing operations.
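
The target/residual split maps onto simple editing operations. The following is a minimal sketch, not part of SAM Audio itself: it uses synthetic NumPy arrays as stand-ins for the two output waveforms, the remix helper is a hypothetical name introduced here, and it assumes both waveforms share the same shape and sample rate.

```python
import numpy as np


def remix(target: np.ndarray, residual: np.ndarray, target_gain_db: float) -> np.ndarray:
    """Recombine target and residual with a gain (in dB) applied to the target.

    Hypothetical helper for illustration; assumes both arrays have the
    same shape and sample rate.
    """
    if target.shape != residual.shape:
        raise ValueError("target and residual must have the same shape")
    gain = 10.0 ** (target_gain_db / 20.0)  # convert dB to linear amplitude
    return residual + gain * target


# Synthetic stand-ins for the model outputs: one second at 16 kHz.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
target = 0.5 * np.sin(2 * np.pi * 440.0 * t)    # plays the role of the isolated sound
residual = 0.2 * np.sin(2 * np.pi * 110.0 * t)  # plays the role of everything else

removed = residual                         # remove the prompted sound entirely
isolated = target                          # keep only the prompted sound
ducked = remix(target, residual, -12.0)    # keep the sound, 12 dB quieter
unchanged = remix(target, residual, 0.0)   # 0 dB simply sums the two signals
```

Removing a sound means keeping only the residual, isolating it means keeping only the target, and anything in between is a weighted sum of the two.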

Working Example

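# Conceptual sketch based on the article, not the confirmed sam_audio API:
# the SAMAudioProcessor constructor arguments, separate(), and the result
# fields below are illustrative and should be checked against the release.
import soundfile as sf  # assumed helper library for reading WAV files
from sam_audio import SAMAudioProcessor

processor = SAMAudioProcessor(model_name="sam-audio-base")

# Load the mixture as a NumPy array along with its sample rate.
mixture_audio, sample_rate = sf.read("audio_with_multiple_sounds.wav")

# Text prompt describing the sound to isolate.
prompt = "dog barking"
result = processor.separate(mixture_audio, prompt)

target_audio = result.target      # isolated sound (the dog bark)
residual_audio = result.residual  # everything else in the recording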

Practical Applications

Podcast Editing: Automatically remove unwanted sounds (e.g., coughs, background noise) from podcast recordings.
Music Production: Isolate instrument tracks (e.g., guitar, vocals) from a mixed audio file for remixing or mastering.
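
For the podcast case, one plausible workflow is to swap in the residual only where the unwanted sound occurs and leave the rest of the recording untouched. The sketch below illustrates that idea with placeholder arrays. The replace_spans helper and the (start, end) span format in seconds are assumptions made for this example, not part of SAM Audio, and a real edit would likely add short crossfades at the boundaries to avoid clicks.

```python
import numpy as np


def replace_spans(mixture, residual, spans, sample_rate):
    """Return a copy of mixture with residual substituted inside each span.

    spans is a list of (start_seconds, end_seconds) pairs. Hypothetical
    helper for illustration; uses hard cuts with no crossfade.
    """
    if mixture.shape != residual.shape:
        raise ValueError("mixture and residual must have the same shape")
    edited = mixture.copy()
    n_samples = mixture.shape[-1]
    for start_s, end_s in spans:
        # Convert seconds to sample indices and clamp to the signal length.
        start = max(0, int(round(start_s * sample_rate)))
        end = min(n_samples, int(round(end_s * sample_rate)))
        if start < end:
            edited[..., start:end] = residual[..., start:end]
    return edited


# Placeholder signals: 4 seconds at 8 kHz.
sample_rate = 8_000
mixture = np.ones(4 * sample_rate)     # stands in for the original recording
residual = np.zeros(4 * sample_rate)   # stands in for the recording minus the cough

# Replace only the half second where the cough was marked.
edited = replace_spans(mixture, residual, [(1.0, 1.5)], sample_rate)
```

Limiting the substitution to the marked span keeps the original audio everywhere else, so any separation artifacts are confined to the edited region.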

References:

https://www.marktechpost.com/2025/12/17/meta-ai-releases-sam-audio-a-state-of-the-art-unified-model-that-uses-intuitive-and-multimodal-prompts-for-audio-separation/
