Liquid AI LFM2.5-VL-450M: Sub-250ms Edge Inference and Bounding Box Prediction

Liquid AI Releases LFM2.5-VL-450M: a 450M-Parameter Vision-Language Model with Bounding Box Prediction, Multilingual Support, and Sub-250ms Edge Inference

Liquid AI has launched LFM2.5-VL-450M, an optimized vision-language model designed for direct edge hardware deployment. The model achieves a latency of 242ms for 512x512 images on NVIDIA Jetson Orin, enabling real-time visual reasoning without cloud dependency.

Why This Matters

Traditional vision-language models (VLMs) typically require massive GPU clusters and cloud infrastructure, creating significant barriers for real-time edge applications like robotics or wearables where latency and privacy are paramount. LFM2.5-VL-450M addresses these constraints by fitting a sophisticated multimodal architecture into a 450M-parameter footprint, providing a viable alternative to cloud-reliant models. While many small models sacrifice spatial reasoning, this release introduces bounding box prediction with a RefCOCO-M score of 81.28. This allows engineers to move beyond simple image captioning toward structured, grounded scene understanding in compute-constrained environments.
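To make "grounded" output concrete, here is a minimal sketch that maps a predicted box onto image pixels. It assumes the box arrives as normalized [x1, y1, x2, y2] values between 0 and 1. That format and the helper name `to_pixel_box` are assumptions for illustration, not details taken from the release.

```python
from typing import Sequence, Tuple


def to_pixel_box(norm_box: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized [x1, y1, x2, y2] box to integer pixel coordinates.

    The normalized 0..1 format is an assumption, not a documented model output.
    """
    if len(norm_box) != 4:
        raise ValueError("expected four coordinates")
    # Clamp to [0, 1] so slightly out-of-range predictions stay inside the image.
    x1, y1, x2, y2 = (min(max(float(v), 0.0), 1.0) for v in norm_box)
    # Reorder so (x1, y1) is always the top-left corner.
    x1, x2 = sorted((x1, x2))
    y1, y2 = sorted((y1, y2))
    return (round(x1 * width), round(y1 * height), round(x2 * width), round(y2 * height))
```

A downstream tracker or overlay renderer would consume the pixel tuple directly, which is the practical difference between a caption and a grounded prediction.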

Key Insights

Sub-250ms inference on NVIDIA Jetson Orin (2026) enables 4 FPS video stream processing for full vision-language understanding.
Bounding box prediction reaches an 81.28 RefCOCO-M score, up from zero in the previous LFM2-VL-450M version.
A SigLIP2 NaFlex shape-optimized 86M-parameter vision encoder, combined with a tiling strategy, allows native-resolution processing up to 512x512 without distortion.
Multilingual understanding improved to 68.09 on MMMB (2026), supporting eight languages including Arabic, Chinese, and Japanese for global edge deployments.
Pre-training data was scaled from 10T to 28T tokens, followed by reinforcement learning to enhance instruction following (MM-IFEval score of 45.00).
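The 4 FPS figure follows from the reported latency. The sketch below works through that arithmetic and shows how many camera frames would be skipped between inferences. The 30 FPS camera rate and both function names are assumptions for illustration, and the calculation assumes frames are processed one at a time with no batching or pipelining.

```python
import math


def max_fps(latency_ms: float) -> float:
    """Upper bound on frames per second when frames are processed one at a time."""
    return 1000.0 / latency_ms


def frame_stride(camera_fps: float, latency_ms: float) -> int:
    """Process every Nth camera frame so inference keeps up with the stream."""
    return max(1, math.ceil(camera_fps * latency_ms / 1000.0))


print(round(max_fps(242), 2))  # 4.13
print(frame_stride(30, 242))   # 8
```

With a stride of 8 on a 30 FPS camera, the effective rate is 3.75 FPS, slightly under the theoretical 4.13 FPS ceiling because the stride has to be a whole number of frames.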

Practical Applications

Industrial Automation: Use LFM2.5-VL-450M on Jetson Orin for real-time tracking of inventory flow and worker actions. Pitfall: Using the model for fine-grained OCR tasks, where it is noted to be less effective.
Wearable Devices: Deploy on Snapdragon 8 Elite for smart glasses providing local semantic scene understanding. Pitfall: Over-relying on the model for knowledge-intensive queries better suited for larger LLMs.
Retail Compliance: Implement on mini-PC APUs for automated shelf monitoring and visual search. Pitfall: Disabling thumbnail encoding during tiling, which removes global scene context for the model.
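The thumbnail pitfall is easier to see with a tiling plan written out. The following is a hypothetical sketch, not Liquid AI's actual preprocessing: it splits an image into tiles of at most 512x512 and appends a whole-image thumbnail region that a caller would downscale to fit one tile. The function name, the grid layout, and the choice to skip the thumbnail for single-tile images are all assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def plan_tiles(width: int, height: int, tile: int = 512,
               include_thumbnail: bool = True) -> List[Tuple[str, Box]]:
    """Return labelled crop regions: a grid of tiles plus an optional thumbnail.

    Hypothetical layout for illustration; the real preprocessing may differ.
    """
    plan: List[Tuple[str, Box]] = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            # Edge tiles are clipped to the image bounds instead of being stretched.
            box = (left, top, min(left + tile, width), min(top + tile, height))
            plan.append(("tile", box))
    # An image that fits in one tile needs no separate thumbnail.
    if include_thumbnail and len(plan) > 1:
        # The thumbnail spans the whole image and carries the global context.
        plan.append(("thumbnail", (0, 0, width, height)))
    return plan
```

For a 1024x768 shelf photo this yields four tiles plus one thumbnail. With `include_thumbnail=False`, each tile shows only a local patch and no input covers the full scene, which is the loss of global context described above.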

References:

https://www.marktechpost.com/2026/04/11/liquid-ai-releases-lfm2-5-vl-450m-a-450m-parameter-vision-language-model-with-bounding-box-prediction-multilingual-support-and-sub-250ms-edge-inference/
