Computer Vision

32 articles in this category (Page 2 of 2)

AI News, Computer Vision, Model Optimization

FLUX.2: Black Forest Labs' Next-Gen Image Generator Demands 80GB VRAM for Inference

FLUX.2, Black Forest Labs' new image model, requires 80GB of VRAM for inference and introduces architectural changes such as a single text encoder and fused transformer blocks.

AI News, Computer Vision, Open Source

Black Forest Labs Releases FLUX.2: A 32B Flow Matching Transformer for Production Image Pipelines

Black Forest Labs launches FLUX.2, a 32B-parameter model that supports 4MP image generation and editing with multi-reference support.

AI News, Computer Vision, AI Models

Fara-7B: An Efficient Agentic Small Language Model for Computer Use

Microsoft's Fara-7B achieves a 38.4% success rate on WebTailBench, outperforming larger models on agentic computer-use tasks.

AI News, Computer Vision, AI Paper Summary

Meta AI Releases Segment Anything Model 3 (SAM 3) for Promptable Concept Segmentation in Images and Videos

Meta AI’s SAM 3 achieves 75-80% of human performance on the SA-Co benchmark, outperforming existing models in promptable concept segmentation.

AI News, Generative AI, Computer Vision

Learn-to-Steer: NVIDIA’s 2025 Spatial Fix for Text-to-Image Diffusion

NVIDIA’s Learn-to-Steer framework improves spatial reasoning in text-to-image models, achieving gains on GenEval and T2I-CompBench.

AI News, Computer Vision, NLP

Brand Tagging with VLMs

A two-stage pipeline using SigLIP-2 and LLaVA-OneVision-1.5 achieves 95% confidence in logo verification on 44-second video clips.

AI News, Applications, Computer Vision

Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family

Baidu’s ERNIE-4.5-VL-28B-A3B-Thinking activates 3B parameters per token out of 30B total parameters, outperforming larger models on multimodal benchmarks.

AI News, Artificial Intelligence, Computer Vision

Spatial Supersensing as the Core Capability for Multimodal AI Systems

This article explores how spatial supersensing is emerging as a critical capability for multimodal AI systems, focusing on the Cambrian-S model and the VSI Super benchmark for evaluating long-video spatial reasoning.
