Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family
These articles are AI-generated summaries. Please check the original sources for full details.
Baidu has released ERNIE-4.5-VL-28B-A3B-Thinking, a vision-language model built on a Mixture-of-Experts (MoE) architecture with about 30B total parameters, of which only about 3B are activated per token. On document, chart, and video reasoning tasks, it is reported to be competitive with models in the 7B–32B parameter range.
Why This Matters
Large multimodal models typically require large parameter budgets. ERNIE-4.5-VL-28B-A3B-Thinking uses a Mixture-of-Experts (MoE) design that activates only about 3B of its 30B parameters per token, so roughly 90% of the weights are unused on any given token and per-token compute falls accordingly. The full set of expert weights must still be loaded, so the memory footprint does not shrink in the same proportion. The design targets deployment on more constrained hardware while retaining reasoning capability on complex tasks such as STEM problems and document analysis.
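The active-versus-total arithmetic can be checked with a few lines of Python. This is a rough sketch that uses the 3B and 30B figures quoted in this summary; the 2-bytes-per-parameter storage size is an assumption (16-bit weights), not something the source states.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's parameters that take part in one token's forward pass."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return active_params / total_params


# Figures quoted in the summary, in billions of parameters.
TOTAL_B = 30.0
ACTIVE_B = 3.0

fraction = active_fraction(ACTIVE_B, TOTAL_B)
print(f"active per token: {fraction:.0%}")
print(f"unused per token: {1 - fraction:.0%}")

# Assumption: weights stored at 2 bytes per parameter (16-bit precision).
# Every expert must be resident, so weight memory scales with the TOTAL count.
BYTES_PER_PARAM = 2
weights_gb = TOTAL_B * 1e9 * BYTES_PER_PARAM / 1e9
print(f"approximate weight memory: {weights_gb:.0f} GB")
```

The point of the last lines is that sparsity reduces work per token, while storage is still driven by the total parameter count.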
Key Insights
- “3B active parameters per token, 30B total parameters”: The A3B suffix in the model name refers to the roughly 3B parameters active per token; the MoE router selects only a subset of experts for each input token.
- “Thinking with Images for document and chart reasoning”: The model iteratively zooms into image regions and integrates local observations into final answers.
- “Apache License 2.0 enables commercial deployment”: Open-sourcing under permissive terms supports enterprise adoption.
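To illustrate how activating a subset of experts per token can work, here is a minimal sketch of generic top-k gating in plain Python. It is not Baidu's router: the expert count, the value of `k`, the toy experts, and the names `route_top_k` and `moe_forward` are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum over the selected experts only; unselected experts are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]  # hypothetical router scores for one token

selected = route_top_k(logits, k=2)
y = moe_forward(1.0, experts, logits, k=2)
print(selected, y)
```

In a real MoE layer the router logits come from a learned projection of the token's hidden state, and the experts are feed-forward networks rather than scalar functions.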
Practical Applications
- Use Case: Document analysis in analytics workflows using “Thinking with Images” for dense text and chart interpretation.
- Pitfall: Overlooking the need for mid-training on visual-language reasoning corpora, which risks poor semantic alignment between modalities.
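The zoom step behind “Thinking with Images” can be pictured as cropping a region and enlarging it for a closer look. The sketch below shows only that image operation on a toy grid of numbers; it does not reproduce how the model chooses regions or integrates observations, and the helper names `crop`, `upscale`, and `zoom` are hypothetical.

```python
def crop(image, box):
    """Return the sub-grid inside box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def upscale(image, factor):
    """Nearest-neighbour enlargement: repeat each pixel `factor` times on both axes."""
    out = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def zoom(image, box, factor=2):
    """Crop a region of interest, then enlarge it."""
    return upscale(crop(image, box), factor)


# Toy 4x4 "page" whose pixel values are simply 0..15.
page = [[r * 4 + c for c in range(4)] for r in range(4)]
detail = zoom(page, (1, 1, 3, 3), factor=2)
for row in detail:
    print(row)
```

In a document-analysis workflow, the enlarged region would be passed back to the model as an additional image so that small text or chart labels become legible.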