Gelato-30B-A3B: A State-of-the-Art Grounding Model for GUI Computer-Use Tasks, Surpassing Computer Grounding Models like GTA1-32B
These articles are AI-generated summaries. Please check the original sources for full details.
Researchers at ML Foundations introduced Gelato-30B-A3B, a 31B-parameter model that converts natural language instructions into precise click coordinates for GUI tasks. It achieves 63.88% accuracy on ScreenSpot Pro, surpassing GTA1-32B and even larger models like Qwen3-VL-235B-A22B-Instruct.
Why This Matters
Grounding models bridge natural language and GUI interactions, but prior systems often failed to align instructions with the right screen elements, leading to costly errors in automation. Gelato-30B-A3B’s 63.88% accuracy on ScreenSpot Pro, against GTA1-32B’s 56.97% on the same benchmark, is a gain of 6.91 percentage points in reliability, which reduces the need for manual corrections in agent workflows.
Key Insights
- “63.88% accuracy on ScreenSpot Pro, 2025”: Achieved through GRPO training on the Click 100k dataset.
- “Click 100k dataset”: Merges 85+ professional app tutorials and filtered public datasets, ensuring precise bounding box annotations.
- “GRPO reinforcement learning”: The sparse reward fires only when a predicted click lands inside the ground-truth box, boosting accuracy by 9 pp over unfiltered training.
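The sparse reward described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' training code: the `Box` type, the function names, and the standard-deviation normalization in `group_advantages` are assumptions made here, following the common GRPO formulation in which each sampled click is scored relative to the other samples for the same prompt.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned ground-truth box in pixel coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def click_reward(click: Tuple[float, float], box: Box) -> float:
    """Sparse reward: 1.0 if the click lands inside the box, otherwise 0.0."""
    x, y = click
    return 1.0 if box.contains(x, y) else 0.0


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: center each reward on the group mean and
    scale by the group standard deviation (assumed normalization)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled clicks for one instruction whose target is a 100x40 button.
target = Box(left=200, top=300, right=300, bottom=340)
samples = [(250, 320), (10, 10), (299, 301), (400, 320)]
rewards = [click_reward(c, target) for c in samples]
advantages = group_advantages(rewards)
```

Here two of the four samples hit the button, so they receive positive advantages and the two misses receive negative ones. If every sample in a group hits or every sample misses, all advantages are zero and that group contributes no learning signal, which is one reason sparse-reward setups are sensitive to the quality and difficulty of the training examples.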
Practical Applications
- Use Case: Agent frameworks (e.g., GTA1.5) that use GPT-5 as a planner and Gelato-30B-A3B for grounding achieve a 58.71% automated success rate on OSWorld tasks.
- Pitfall: Grounding may fail on UIs with dynamic layouts that are not represented in Click 100k, so agents that rely on it without any checks can click the wrong element.
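The planner and grounder split in the use case above, together with a basic guard against the pitfall, might look roughly like the sketch below. Everything here is hypothetical: `plan_next` and `ground` stand in for calls to a planner model and a grounding model, and neither reflects a real GTA1.5, GPT-5, or Gelato API. The bounds check only catches coordinates that fall off the screen; it cannot detect a click on the wrong on-screen element.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]
# Hypothetical interfaces: the planner returns the next instruction (or None
# when the task is done); the grounder maps an instruction to click coordinates.
Planner = Callable[[str, bytes], Optional[str]]
Grounder = Callable[[bytes, str], Point]


def run_step(
    plan_next: Planner,
    ground: Grounder,
    screenshot: bytes,
    screen_size: Tuple[int, int],
    task: str,
) -> Optional[Tuple[str, Point]]:
    """Run one planner -> grounder step and return (instruction, click)."""
    instruction = plan_next(task, screenshot)
    if instruction is None:
        return None  # planner signals that the task is complete

    x, y = ground(screenshot, instruction)
    width, height = screen_size
    if not (0 <= x < width and 0 <= y < height):
        # Reject off-screen predictions instead of clicking blindly.
        raise ValueError(f"click {(x, y)} is outside the {width}x{height} screen")
    return instruction, (x, y)


# Stub models standing in for the real planner and grounder.
def fake_planner(task: str, screenshot: bytes) -> Optional[str]:
    return "Click the Save button"


def fake_grounder(screenshot: bytes, instruction: str) -> Point:
    return (640, 360)


step = run_step(fake_planner, fake_grounder, b"", (1280, 720), "Save the file")
```

Keeping the two roles behind plain callables makes it straightforward to swap in a different grounding model or to add stronger validation, such as re-capturing the screen after the click and asking the planner to confirm the expected change.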