Liquid AI LFM2.5-350M: High-Density Edge Intelligence via 28T Token Training

Liquid AI Released LFM2.5-350M: A Compact 350M Parameter Model Trained on 28T Tokens with Scaled Reinforcement Learning

Liquid AI has launched LFM2.5-350M, a compact model that challenges conventional scaling assumptions by training a small network on far more data than usual. The 350M-parameter model was pre-trained on 28 trillion tokens, a token-to-parameter ratio of 80,000:1.
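
The ratio follows directly from the two figures quoted above. The short sketch below reproduces the arithmetic and, purely as a reference point, compares it with the commonly cited "Chinchilla" rule of thumb of roughly 20 training tokens per parameter. That comparison value is an assumption added here for illustration and does not come from the source article.

```python
# Token-to-parameter ratio from the figures quoted in the article.
TRAINING_TOKENS = 28e12   # 28 trillion pre-training tokens (from the article)
PARAMETERS = 350e6        # 350 million parameters (from the article)

# Assumption (not from the article): the widely quoted rule of thumb of
# about 20 training tokens per parameter, used only as a reference point.
RULE_OF_THUMB_TOKENS_PER_PARAM = 20


def tokens_per_parameter(tokens: float, params: float) -> float:
    """Return how many training tokens were seen per model parameter."""
    return tokens / params


ratio = tokens_per_parameter(TRAINING_TOKENS, PARAMETERS)
multiple = ratio / RULE_OF_THUMB_TOKENS_PER_PARAM

print(f"tokens per parameter: {ratio:,.0f}")
print(f"multiple of the ~20:1 rule of thumb: {multiple:,.0f}x")
```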

Why This Matters

While frontier models pursue capability by increasing parameter counts, LFM2.5-350M targets the “memory wall” bottleneck on edge devices with limited memory and compute. Its hybrid backbone combines Linear Input-Varying Systems (LIVs) with Grouped Query Attention (GQA), giving it a 32k context window with a memory footprint as low as 81MB on mobile GPUs. The results suggest that parameter count is not the sole determinant of performance.

Key Insights

Hybrid LIV/GQA Architecture: The model uses 10 Double-Gated LIV Convolution Blocks for sequence processing and 6 GQA blocks for high-precision retrieval, reducing KV cache overhead (2026).
Extreme Intelligence Density: Training on 28T tokens allows this 350M parameter model to outperform competitors twice its size on benchmarks like IFEval, where it scored 76.96 (2026).
High-Speed Inference: On a single NVIDIA H100, the architecture sustains a throughput of 40.4K output tokens per second, which suits real-time agentic tasks (2026).
Edge-Specific Optimization: Low-memory inference is achieved via RunAnywhere Q4, requiring only 169MB on Snapdragon 8 Elite NPUs and 81MB on Snapdragon GPUs (2026).
Instruction Following Specialist: The model is tuned for tool use and structured data extraction, reporting a GPQA Diamond score of 30.64 alongside its IFEval result of 76.96 (2026).
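
To illustrate why mixing LIV and GQA blocks reduces KV cache overhead, here is a minimal sketch that compares the cache size of the hybrid layout (6 attention blocks out of 16) with a hypothetical model that uses attention in all 16 blocks. It assumes the LIV convolution blocks hold no cache that grows with context length. The KV head count, head dimension, and 16-bit cache precision are hypothetical placeholders, not published values, so only the ratio between the two results is meaningful.

```python
def kv_cache_bytes(attn_layers: int, context_len: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Size of the key/value cache: one K and one V tensor per attention layer."""
    return 2 * attn_layers * context_len * kv_heads * head_dim * bytes_per_value


LIV_BLOCKS = 10            # from the article
GQA_BLOCKS = 6             # from the article
TOTAL_BLOCKS = LIV_BLOCKS + GQA_BLOCKS
CONTEXT_LEN = 32 * 1024    # "32k" context, taken here as 32,768 tokens

# Hypothetical values chosen only for illustration; they cancel in the ratio.
KV_HEADS = 8
HEAD_DIM = 64

# Assumption: only the GQA blocks keep a cache that grows with context.
hybrid = kv_cache_bytes(GQA_BLOCKS, CONTEXT_LEN, KV_HEADS, HEAD_DIM)
all_attention = kv_cache_bytes(TOTAL_BLOCKS, CONTEXT_LEN, KV_HEADS, HEAD_DIM)
fraction = hybrid / all_attention

print(f"hybrid cache is {fraction:.1%} of an all-attention layout")
```

Under these assumptions the saving depends only on the share of blocks that use attention, which is why the head configuration can be left as a placeholder.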

Practical Applications

Use Case: High-volume data extraction and real-time classification on Raspberry Pi 5 using Cactus Engine int8 with a 300MB memory footprint. Pitfall: Attempting complex mathematics or creative writing, which the model documentation explicitly advises against.
Use Case: Local agentic tasks and function calling on Snapdragon 8 Elite mobile devices using RunAnywhere Q4 for low-latency tool use. Pitfall: Using the model for complex coding tasks, where larger reasoning models remain necessary.
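
As a rough sanity check on the memory footprints quoted in this summary, the sketch below estimates weight storage from parameter count and bit width alone. It is a back-of-the-envelope calculation, not a description of how RunAnywhere or Cactus Engine account for memory: it ignores quantization scales, activations, KV cache, and runtime buffers, and it assumes exactly 350 million parameters and 1 MB = 1,000,000 bytes.

```python
PARAMETERS = 350_000_000  # nominal parameter count (from the article)


def weight_megabytes(params: int, bits_per_weight: int) -> float:
    """Naive weight storage in MB (1 MB = 1,000,000 bytes), weights only."""
    return params * bits_per_weight / 8 / 1_000_000


q4_mb = weight_megabytes(PARAMETERS, 4)
int8_mb = weight_megabytes(PARAMETERS, 8)

# Figures reported in the article, listed for side-by-side comparison.
reported_mb = {
    "RunAnywhere Q4, Snapdragon 8 Elite NPU": 169,
    "RunAnywhere Q4, Snapdragon GPU": 81,
    "Cactus Engine int8, Raspberry Pi 5": 300,
}

print(f"naive 4-bit estimate: {q4_mb:.0f} MB")
print(f"naive 8-bit estimate: {int8_mb:.0f} MB")
for setup, mb in reported_mb.items():
    print(f"reported, {setup}: {mb} MB")
```

The naive 4-bit estimate lands close to the 169MB NPU figure, while the other reported numbers sit below the naive estimates. This sketch does not explain that gap; the reported figures presumably depend on what each runtime measures.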

References:

https://www.marktechpost.com/2026/03/31/liquid-ai-released-lfm2-5-350m-a-compact-350m-parameter-model-trained-on-28t-tokens-with-scaled-reinforcement-learning/
