Liquid AI LFM2.5-VL-450M: Sub-250ms Edge Inference and Bounding Box Prediction

These articles are AI-generated summaries. Please check the original sources for full details.

Liquid AI Releases LFM2.5-VL-450M: a 450M-Parameter Vision-Language Model with Bounding Box Prediction, Multilingual Support, and Sub-250ms Edge Inference

Liquid AI has launched LFM2.5-VL-450M, a compact vision-language model designed to run directly on edge hardware. The model processes a 512x512 image in 242ms on NVIDIA Jetson Orin, enabling real-time visual reasoning without relying on the cloud.
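To see how a 242ms per-image latency translates into video throughput, here is a small arithmetic sketch. It assumes frames are processed strictly one at a time with constant latency (no batching or pipelining) and that capture and pre-processing costs are negligible; the 30 FPS camera rate is a hypothetical example, not a figure from the release.

```python
import math


def max_fps(latency_ms: float) -> float:
    """Frames per second achievable when frames are processed back to back."""
    return 1000.0 / latency_ms


def frame_stride(camera_fps: float, latency_ms: float) -> int:
    """Smallest N such that running inference on every N-th camera frame keeps up.

    Each processed frame must finish before the next sampled frame arrives:
    N / camera_fps >= latency_ms / 1000.
    """
    return math.ceil(camera_fps * latency_ms / 1000.0)


if __name__ == "__main__":
    latency = 242.0  # ms per 512x512 image on Jetson Orin, as reported
    print(f"{max_fps(latency):.2f} FPS")  # 4.13 FPS
    print(frame_stride(30.0, latency))    # 8 (hypothetical 30 FPS camera)
```

With a 30 FPS camera, a stride of 8 means about 3.75 inferences per second, leaving roughly 267ms per inference, which is just above the reported 242ms.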

Why This Matters

Traditional vision-language models (VLMs) typically require massive GPU clusters and cloud infrastructure, creating significant barriers for real-time edge applications like robotics or wearables where latency and privacy are paramount. LFM2.5-VL-450M addresses these constraints by fitting a sophisticated multimodal architecture into a 450M-parameter footprint, providing a viable alternative to cloud-reliant models. While many small models sacrifice spatial reasoning, this release introduces bounding box prediction with a RefCOCO-M score of 81.28. This allows engineers to move beyond simple image captioning toward structured, grounded scene understanding in compute-constrained environments.
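Grounded outputs like these are usually checked by comparing each predicted box with a reference box. A common convention for RefCOCO-style referring-expression benchmarks is to count a prediction as correct when its intersection-over-union (IoU) with the ground truth is at least 0.5. The article does not describe how RefCOCO-M is scored, so the 0.5 threshold and the (x_min, y_min, x_max, y_max) box layout below are assumptions for illustration.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(preds: Sequence[Box], gts: Sequence[Box], threshold: float = 0.5) -> float:
    """Percentage of predictions whose IoU with the reference box meets the (assumed) threshold."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not gts:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
```

Under this convention, a score such as 81.28 would mean that roughly 81% of referring expressions were localized with sufficient overlap.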

Key Insights

  • Sub-250ms inference (242ms per 512x512 image) on NVIDIA Jetson Orin supports roughly 4 FPS video stream processing with full vision-language understanding.
  • Bounding box prediction reaches a RefCOCO-M score of 81.28, up from zero for the previous LFM2-VL-450M version.
  • An 86M-parameter SigLIP2 NaFlex shape-optimized vision encoder, combined with a tiling strategy, processes images at native resolution up to 512x512 without distortion.
  • Multilingual understanding improved to 68.09 on MMMB, with support for eight languages, including Arabic, Chinese, and Japanese, for global edge deployments.
  • Pre-training data was scaled from 10T to 28T tokens, and reinforcement learning was then applied to improve instruction following (MM-IFEval score of 45.00).
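The native-resolution tiling idea can be sketched as follows. This is not Liquid AI's implementation: the 512-pixel tile size comes from the 512x512 limit mentioned above, while the grid layout, the clipping of partial edge tiles, and the thumbnail size are assumptions chosen for illustration. Images that fit within 512x512 are encoded once at native size; larger ones are split into native-resolution tiles, optionally with a downscaled thumbnail of the whole image to retain global scene context.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed tile side; matches the 512x512 native-resolution limit in the article.
TILE = 512


@dataclass
class TilePlan:
    # Crop rectangles (left, top, right, bottom) in source-image pixels.
    tiles: List[Tuple[int, int, int, int]]
    # (width, height) of a downscaled whole-image view, or None if not used.
    thumbnail_size: Optional[Tuple[int, int]]


def plan_tiles(width: int, height: int, use_thumbnail: bool = True) -> TilePlan:
    """Hypothetical tiling plan: native-resolution tiles plus an optional thumbnail."""
    if width <= TILE and height <= TILE:
        # Fits in one pass: encode at native resolution, no resizing or tiling.
        return TilePlan(tiles=[(0, 0, width, height)], thumbnail_size=None)

    cols = math.ceil(width / TILE)
    rows = math.ceil(height / TILE)
    tiles = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * TILE, r * TILE
            # Edge tiles are clipped to the image, so they may be smaller than TILE.
            tiles.append((left, top, min(left + TILE, width), min(top + TILE, height)))

    thumbnail = None
    if use_thumbnail:
        # Aspect-preserving downscale so the longer side equals TILE.
        scale = TILE / max(width, height)
        thumbnail = (max(1, round(width * scale)), max(1, round(height * scale)))
    return TilePlan(tiles=tiles, thumbnail_size=thumbnail)


if __name__ == "__main__":
    plan = plan_tiles(1024, 768)
    print(len(plan.tiles), plan.thumbnail_size)  # 4 (512, 384)
```

For a 1024x768 frame this yields a 2x2 grid of four tiles (the bottom row is 256 pixels tall) plus a 512x384 thumbnail. Each tile sees local detail at full resolution, while the thumbnail is the only input that shows the whole scene at once.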

Practical Applications

  • Industrial Automation: Use LFM2.5-VL-450M on Jetson Orin for real-time tracking of inventory flow and worker actions. Pitfall: Relying on the model for fine-grained OCR, where it is noted to be less effective.
  • Wearable Devices: Deploy on Snapdragon 8 Elite for smart glasses that provide local semantic scene understanding. Pitfall: Over-relying on the model for knowledge-intensive queries better suited to larger LLMs.
  • Retail Compliance: Implement on mini-PC APUs for automated shelf monitoring and visual search. Pitfall: Disabling thumbnail encoding during tiling, which removes global scene context for the model.
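For shelf monitoring or tracking, box predictions eventually have to be mapped back onto the camera frame. The sketch below assumes, purely hypothetically, that the model is prompted to return a JSON list of objects with a "label" string and a "bbox" of four coordinates normalized to [0, 1]; the actual output format of LFM2.5-VL-450M is not described here and should be checked against its documentation. Malformed entries are skipped rather than raising, since generated text is not guaranteed to be valid JSON.

```python
import json
import math
from typing import List, Tuple

# (label, (left, top, right, bottom)) in pixel coordinates.
Detection = Tuple[str, Tuple[int, int, int, int]]


def parse_detections(text: str, width: int, height: int) -> List[Detection]:
    """Convert hypothetical normalized-box JSON into pixel-space detections.

    Assumed (not documented) format:
        [{"label": "cereal box", "bbox": [x1, y1, x2, y2]}, ...] with values in [0, 1].
    """
    try:
        items = json.loads(text)
    except json.JSONDecodeError:
        return []  # not valid JSON: treat as "no detections"
    if not isinstance(items, list):
        return []

    detections: List[Detection] = []
    for item in items:
        if not isinstance(item, dict):
            continue
        label, bbox = item.get("label"), item.get("bbox")
        if not isinstance(label, str) or not isinstance(bbox, list) or len(bbox) != 4:
            continue
        try:
            coords = [float(v) for v in bbox]
        except (TypeError, ValueError):
            continue  # non-numeric coordinate
        if not all(math.isfinite(v) for v in coords):
            continue  # NaN or infinity
        # Clamp each coordinate into [0, 1] before scaling to pixels.
        x1, y1, x2, y2 = (min(max(v, 0.0), 1.0) for v in coords)
        if x2 <= x1 or y2 <= y1:
            continue  # empty or inverted box
        detections.append(
            (label, (round(x1 * width), round(y1 * height), round(x2 * width), round(y2 * height)))
        )
    return detections
```

Returning an empty list on bad output keeps a monitoring loop running; a production system might instead log the raw text for later review.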
