Meta AI Releases SAM Audio: A Unified Model for Intuitive Audio Separation
These articles are AI-generated summaries. Please check the original sources for full details.
Meta AI has released SAM Audio, a prompt-driven audio separation model designed to streamline audio editing workflows and remove the need for a custom model per sound class. The model comes in three sizes (sam-audio-small, sam-audio-base, and sam-audio-large). It is available for download and can be tried in the Segment Anything Playground.
SAM Audio aims to narrow the gap between idealized audio separation and real-world recordings, where isolating a specific sound typically requires specialized tools or extensive manual work and costs content creators and audio engineers significant time.
Key Insights
- Diffusion Transformer Architecture: SAM Audio uses a diffusion transformer that applies self- and cross-attention over time-aligned features to improve separation quality.
- Multimodal Prompting: The model supports text, visual (object selection in video), and span (time segment marking) prompting, offering flexible control over separation.
- Target/Residual Output: SAM Audio outputs both a target waveform (the isolated sound) and a residual waveform (everything else), directly supporting common editing operations.
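To make span prompting more concrete, here is a minimal sketch of how a marked time segment could be mapped onto sample positions in a recording. The `Span` type and `span_to_samples` helper are hypothetical names introduced only for illustration. They are not part of SAM Audio, and the model's actual prompt format is not described in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A marked time segment in seconds (hypothetical type, for illustration)."""
    start_s: float
    end_s: float


def span_to_samples(span: Span, sample_rate: int, num_samples: int) -> tuple[int, int]:
    """Convert a span in seconds to a [start, end) sample range clamped to the clip."""
    if span.end_s <= span.start_s:
        raise ValueError("span must end after it starts")
    start = min(max(0, round(span.start_s * sample_rate)), num_samples)
    end = min(max(0, round(span.end_s * sample_rate)), num_samples)
    return start, end


# A 10-second clip at 16 kHz with a span marked from 2.5 s to 4.0 s.
start, end = span_to_samples(Span(2.5, 4.0), sample_rate=16_000, num_samples=160_000)
print(start, end)  # prints: 40000 64000
```

Clamping to the clip length keeps a span that runs past the end of the recording from producing out-of-range indices.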
Working Example
# Conceptual example: the SAMAudioProcessor interface shown here is illustrative
# and may not match the released sam_audio API.
import torchaudio
from sam_audio import SAMAudioProcessor
# Pick one of the three released model sizes.
processor = SAMAudioProcessor(model_name="sam-audio-base")
# torchaudio.load returns the waveform tensor and its sample rate.
mixture_audio, sample_rate = torchaudio.load("audio_with_multiple_sounds.wav")
# Text prompt naming the sound to isolate.
prompt = "dog barking"
result = processor.separate(mixture_audio, prompt)
target_audio = result.target      # isolated dog bark
residual_audio = result.residual  # everything else in the mixture
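Once a target and a residual waveform are available, common edits reduce to simple arithmetic on the two stems. The sketch below assumes both stems are sample-aligned NumPy float arrays of the same shape in the range [-1.0, 1.0]. The `remix` helper is a hypothetical name, and the sine waves are synthetic stand-ins, not model output.

```python
import numpy as np


def remix(target: np.ndarray, residual: np.ndarray, target_gain: float = 1.0) -> np.ndarray:
    """Recombine two sample-aligned stems, scaling the target by target_gain.

    target_gain=0.0 removes the target sound (only the residual remains),
    target_gain=1.0 sums the stems unchanged, values in between attenuate it.
    """
    if target.shape != residual.shape:
        raise ValueError("target and residual must have the same shape")
    mixed = target_gain * target + residual
    # Keep the result inside the usual [-1.0, 1.0] float audio range.
    return np.clip(mixed, -1.0, 1.0)


# Synthetic stand-ins for the two stems (one second at 16 kHz).
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
target = 0.3 * np.sin(2 * np.pi * 440.0 * t)    # stands in for the isolated sound
residual = 0.3 * np.sin(2 * np.pi * 110.0 * t)  # stands in for everything else

removed = remix(target, residual, target_gain=0.0)  # target sound removed
ducked = remix(target, residual, target_gain=0.25)  # target sound attenuated
```

Removing a cough from a podcast corresponds to `target_gain=0.0`, while lowering a guitar in a mix corresponds to a gain between 0 and 1.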
Practical Applications
- Podcast Editing: Automatically remove unwanted sounds (e.g., coughs, background noise) from podcast recordings.
- Music Production: Isolate instrument tracks (e.g., guitar, vocals) from a mixed audio file for remixing or mastering.