Liquid AI LFM2.5-350M: High-Density Edge Intelligence via 28T Token Training
These articles are AI-generated summaries. Please check the original sources for full details.
Liquid AI Released LFM2.5-350M: A Compact 350M Parameter Model Trained on 28T Tokens with Scaled Reinforcement Learning
Liquid AI has launched LFM2.5-350M, a compact model that challenges traditional scaling practice through extreme intelligence density. The model was pre-trained on 28 trillion tokens, an 80,000:1 token-to-parameter ratio (28T tokens / 350M parameters).
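The ratio is simple arithmetic. The short Python sketch below reproduces it. As a point of comparison that does not come from the article, it also sets the result against the commonly cited Chinchilla heuristic of roughly 20 training tokens per parameter.

```python
# Back-of-envelope check of the token-to-parameter ratio quoted above.
TRAIN_TOKENS = 28e12  # 28 trillion pre-training tokens (from the article)
PARAMS = 350e6        # 350 million parameters (nominal model size)

# Assumed reference point, not from the article: the widely cited
# Chinchilla compute-optimal heuristic of ~20 training tokens per parameter.
CHINCHILLA_TOKENS_PER_PARAM = 20


def tokens_per_param(tokens: float, params: float) -> float:
    """Return how many training tokens the model saw per parameter."""
    return tokens / params


ratio = tokens_per_param(TRAIN_TOKENS, PARAMS)
print(f"token-to-parameter ratio: {ratio:,.0f}:1")
print(f"multiple of the ~20:1 heuristic: {ratio / CHINCHILLA_TOKENS_PER_PARAM:,.0f}x")
```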
Why This Matters
Frontier models typically gain capability by adding parameters. LFM2.5-350M instead targets the “memory wall” bottleneck, optimizing for edge devices with limited compute and memory. Its hybrid backbone combines Linear Input-Varying Systems (LIVs) with Grouped Query Attention (GQA), supporting a 32k-token context window with a memory footprint as low as 81MB on mobile GPUs. This suggests that parameter count is not the sole determinant of performance.
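To see why mixing convolution and attention blocks trims memory, here is a minimal Python sketch of key-value (KV) cache sizing. The layer counts (6 GQA blocks out of 16 total) come from the Key Insights list; the KV head count, head dimension, and 16-bit cache precision are hypothetical values chosen only for illustration. The LIV convolution blocks are assumed to hold only a small fixed-size state rather than a per-token cache, so they are left out of the sum.

```python
from dataclasses import dataclass


@dataclass
class KVConfig:
    attention_layers: int     # layers that keep a per-token KV cache
    kv_heads: int             # hypothetical number of K/V heads per GQA layer
    head_dim: int             # hypothetical per-head dimension
    bytes_per_value: int = 2  # assumed fp16/bf16 cache entries


def kv_cache_bytes(cfg: KVConfig, seq_len: int) -> int:
    """Bytes needed to cache keys and values for seq_len tokens (batch size 1)."""
    # Factor 2: one tensor for keys plus one for values in every attention layer.
    return 2 * cfg.attention_layers * cfg.kv_heads * cfg.head_dim * seq_len * cfg.bytes_per_value


CONTEXT = 32_768            # 32k-token context window
KV_HEADS, HEAD_DIM = 8, 64  # hypothetical values, for illustration only

# Hybrid layout: only the 6 GQA blocks cache K/V; the 10 LIV blocks are assumed not to.
hybrid = KVConfig(attention_layers=6, kv_heads=KV_HEADS, head_dim=HEAD_DIM)
# Hypothetical baseline: the same 16 blocks, all of them attention.
all_attention = KVConfig(attention_layers=16, kv_heads=KV_HEADS, head_dim=HEAD_DIM)

h = kv_cache_bytes(hybrid, CONTEXT)
a = kv_cache_bytes(all_attention, CONTEXT)
print(f"hybrid KV cache:        {h / 2**20:.0f} MiB")
print(f"all-attention KV cache: {a / 2**20:.0f} MiB")
print(f"hybrid / all-attention: {h / a:.3f}")
```

Only the ratio (6/16, or 0.375) carries over to the real model; the absolute sizes follow entirely from the made-up head settings.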
Key Insights
- Hybrid LIV/GQA Architecture: The model stacks 10 Double-Gated LIV Convolution Blocks for sequence processing and 6 GQA blocks for high-precision retrieval. Because only the 6 attention blocks keep a KV cache, KV cache overhead is lower than in an all-attention model of the same depth (2026).
- Extreme Intelligence Density: Trained on 28T tokens, this 350M-parameter model outperforms competitors twice its size on benchmarks such as IFEval, where it scored 76.96 (2026).
- High-Speed Inference: On a single NVIDIA H100, the model reaches a throughput of 40.4K output tokens per second, making it well suited to real-time agentic tasks (2026).
- Edge-Specific Optimization: Low-memory inference is achieved via RunAnywhere Q4, requiring only 169MB on Snapdragon 8 Elite NPUs and 81MB on Snapdragon GPUs (2026).
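- Instruction Following Specialist: Its high IFEval score indicates strong instruction following, while a GPQA Diamond score of 30.64 reflects modest graduate-level reasoning; the model is tuned for tool use and structured data extraction rather than deep reasoning (2026).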
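A rough sense of how bit width drives the footprints above comes from weight storage alone. The Python sketch below assumes "Q4" means roughly 4 bits per weight and uses the nominal 350M parameter count. It ignores activations, the KV cache, and runtime overhead. Real 4-bit formats usually also store per-block scale factors, so the true size is somewhat larger. The reported runtime figures depend on what each runtime measures, so treat this only as an order-of-magnitude check.

```python
PARAMS = 350e6  # nominal parameter count (from the article)


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone at a given bit width."""
    return params * bits_per_weight / 8


# Compare common precisions; "4-bit" stands in for Q4 under the stated assumption.
for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    mb = weight_bytes(PARAMS, bits) / 1e6
    print(f"{label:>5}: ~{mb:,.0f} MB of weights")
```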
Practical Applications
- Use Case: High-volume data extraction and real-time classification on a Raspberry Pi 5 using Cactus Engine int8, with a 300MB memory footprint. Pitfall: Complex mathematics or creative writing, which the model documentation explicitly advises against.
- Use Case: Local agentic tasks and low-latency function calling on Snapdragon 8 Elite mobile devices using RunAnywhere Q4. Pitfall: Complex coding tasks, where larger reasoning models remain necessary.
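For the function-calling use case, a common safeguard with small models is to validate every emitted tool call before executing it. The Python sketch below assumes a hypothetical JSON format (`{"name": ..., "arguments": {...}}`) and a hypothetical two-tool registry; it is not LFM2.5's documented tool-call template or the API of any runtime named above.

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
TOOLS = {
    "get_weather": {"city": str},
    "set_timer": {"minutes": int},
}


def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Parse and validate a JSON tool call emitted by a small model.

    Assumes the hypothetical format {"name": ..., "arguments": {...}}.
    Raises ValueError on any problem so the caller can retry or fall back.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    name, args = call.get("name"), call.get("arguments")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    # Require exactly the declared argument names.
    schema = TOOLS[name]
    missing = schema.keys() - args.keys()
    extra = args.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"argument mismatch: missing={sorted(missing)}, extra={sorted(extra)}")

    # Check types; bool is a subclass of int in Python, so reject it explicitly.
    for key, expected in schema.items():
        value = args[key]
        if not isinstance(value, expected) or isinstance(value, bool):
            raise ValueError(f"{key!r} should be {expected.__name__}")
    return name, args


print(parse_tool_call('{"name": "set_timer", "arguments": {"minutes": 5}}'))
# prints ('set_timer', {'minutes': 5})
```

Raising a single exception type keeps the retry logic simple: the caller can re-prompt the model with the error message or fall back to a default action.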