Computer Vision

32 articles in this category (Page 2 of 2)

AI News, Computer Vision, Model Optimization

FLUX.2: Black Forest Labs' Next-Gen Image Generator Demands 80GB VRAM for Inference

FLUX.2, Black Forest Labs' new image model, requires 80GB of VRAM for inference and introduces architectural changes such as a single text encoder and fused transformer blocks.

Nov 25, 2025

AI News, Computer Vision, Open Source

Black Forest Labs Releases FLUX.2: A 32B Flow Matching Transformer for Production Image Pipelines

Black Forest Labs launches FLUX.2, a 32B-parameter model that supports 4MP image generation and editing with multi-reference support.

Nov 25, 2025

AI News, Computer Vision, AI Models

Fara-7B: An Efficient Agentic Small Language Model for Computer Use

Microsoft's Fara-7B achieves a 38.4% success rate on WebTailBench, outperforming larger models on agentic computer-use tasks.

Nov 24, 2025

AI News, Computer Vision, AI Paper Summary

Meta AI Releases Segment Anything Model 3 (SAM 3) for Promptable Concept Segmentation in Images and Videos

Meta AI’s SAM 3 achieves 75-80% of human performance on the SA-Co benchmark, outperforming existing models in promptable concept segmentation.

Nov 20, 2025

AI News, Generative AI, Computer Vision

Learn-to-Steer: NVIDIA’s 2025 Spatial Fix for Text-to-Image Diffusion

NVIDIA’s Learn-to-Steer framework improves spatial reasoning in text-to-image models, achieving gains on GenEval and T2I-CompBench.

Nov 19, 2025

AI News, Computer Vision, NLP

Brand Tagging with VLMs

A two-stage pipeline using SigLIP-2 and LLaVA-OneVision-1.5 achieves 95% confidence in logo verification on 44s video clips.

Nov 15, 2025

AI News, Applications, Computer Vision

Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family

Baidu’s ERNIE-4.5-VL-28B-A3B-Thinking activates 3B parameters per token out of 30B total parameters, outperforming larger models on multimodal benchmarks.

Nov 11, 2025

AI News, Artificial Intelligence, Computer Vision

Spatial Supersensing as the Core Capability for Multimodal AI Systems

This article explores how spatial supersensing is emerging as a critical capability for multimodal AI systems, focusing on the Cambrian-S model and the VSI-SUPER benchmark for evaluating long-video spatial reasoning.

Nov 7, 2025