Alibaba Tongyi Lab Releases MAI-UI: A Foundation GUI Agent Family that Surpasses Gemini 2.5 Pro, Seed1.8 and UI-Tars-2 on AndroidWorld
These articles are AI-generated summaries. Please check the original sources for full details.
Alibaba Tongyi Lab has introduced MAI-UI, a family of foundation GUI agents built on the Qwen3-VL model and ranging in size from 2B to 235B parameters. The family reports state-of-the-art results in GUI grounding and mobile navigation, surpassing Gemini 2.5 Pro, Seed1.8, and UI-Tars-2 on the AndroidWorld benchmark.
Why This Matters
Current GUI agents often struggle with real-world complexity: they lack native user interaction, tool integration, and privacy safeguards. Idealized setups assume perfect data and consistent environments, but practical applications must handle ambiguous instructions, dynamic app interfaces, and sensitive user data. Failures in these areas can make an application unusable and force significant development rework.
Key Insights
- 76.7% success on AndroidWorld: MAI-UI’s largest variant achieved this score, exceeding Gemini 2.5 Pro, Seed1.8, and UI-Tars-2.
- Self-Evolving Data Pipeline: Improves navigation robustness by perturbing task parameters and filtering low-quality trajectories.
- Device-Cloud Collaboration: Enables privacy-sensitive operations to remain on-device while leveraging cloud-based models for complex tasks.
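The device-cloud split in the last bullet can be pictured as a small routing decision made per step. The following is only a minimal sketch under stated assumptions: the `Step` type, the `SENSITIVE_KEYWORDS` list, the `difficulty` score, and the `cloud_threshold` value are all hypothetical and are not taken from MAI-UI, whose actual routing logic is not described here.

```python
from dataclasses import dataclass

# Hypothetical markers of privacy-sensitive content (an assumption, not MAI-UI's list).
SENSITIVE_KEYWORDS = ("password", "credit card", "ssn", "otp")


@dataclass
class Step:
    instruction: str   # natural-language instruction for this step
    difficulty: float  # hypothetical 0..1 estimate of how hard the step is


def route(step: Step, cloud_threshold: float = 0.7) -> str:
    """Return "device" or "cloud" for a single step."""
    text = step.instruction.lower()
    # Privacy check comes first: sensitive steps stay on-device regardless of difficulty.
    if any(keyword in text for keyword in SENSITIVE_KEYWORDS):
        return "device"
    # Only non-sensitive, hard steps are escalated to the larger cloud model.
    if step.difficulty >= cloud_threshold:
        return "cloud"
    return "device"


if __name__ == "__main__":
    for step in (
        Step("Enter password for the bank app", 0.9),
        Step("Plan a multi-app trip booking", 0.9),
        Step("Open settings", 0.1),
    ):
        print(f"{step.instruction!r} -> {route(step)}")
```

Checking privacy before difficulty is the design choice that matters in this sketch: a hard but sensitive step is still kept local, which matches the bullet's claim that privacy-sensitive operations remain on-device.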
Working Example
# Example of a simplified, illustrative action output from MAI-UI
action = {
    "type": "click",
    "element_id": "com.example.app:id/submit_button",
    "coordinates": (540, 1800),
}

# Illustrative code for executing the action (simplified)
def execute_action(action):
    if action["type"] == "click":
        # Simulate clicking the element
        print(f"Clicking element with ID: {action['element_id']} at {action['coordinates']}")
    elif action["type"] == "text_input":
        # Simulate entering text
        print(f"Entering text: {action['text']} into element: {action['element_id']}")
    else:
        # Fail loudly on action types this sketch does not handle
        raise ValueError(f"Unsupported action type: {action['type']}")

execute_action(action)
Practical Applications
- Automated Customer Support: A mobile app using MAI-UI could automatically resolve customer issues by navigating the app interface and performing actions on behalf of the user.
- Pitfall: Relying solely on static datasets for training can lead to brittle agents that fail when app interfaces change or new app versions are released.
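One way to move beyond a static dataset is the perturb-and-filter idea listed under Key Insights. The sketch below only illustrates that idea and is not MAI-UI's pipeline: the `Trajectory` type, the task template, the `max_steps` cutoff, and the quality rule (keep successful, reasonably short runs) are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Trajectory:
    task: str        # the instruction the agent attempted
    steps: int       # number of actions the agent took
    succeeded: bool  # whether a (hypothetical) verifier accepted the result


def perturb_task(template: str, options: Dict[str, Sequence[str]], rng: random.Random) -> str:
    """Create a task variant by sampling one value for each template parameter."""
    values = {name: rng.choice(choices) for name, choices in options.items()}
    return template.format(**values)


def filter_trajectories(trajectories: List[Trajectory], max_steps: int = 30) -> List[Trajectory]:
    """Keep only trajectories that succeeded within the step budget."""
    return [t for t in trajectories if t.succeeded and t.steps <= max_steps]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the sampled variants are reproducible
    options = {"time": ["7:00", "8:30"], "label": ["gym", "work"]}
    for _ in range(3):
        print(perturb_task("Set an alarm for {time} labeled {label}", options, rng))

    collected = [
        Trajectory("Set an alarm for 7:00 labeled gym", steps=6, succeeded=True),
        Trajectory("Set an alarm for 8:30 labeled work", steps=45, succeeded=True),
        Trajectory("Set an alarm for 7:00 labeled work", steps=5, succeeded=False),
    ]
    print(len(filter_trajectories(collected)), "trajectory kept")
```

In a real pipeline the agent would be rolled out on each perturbed task and the surviving trajectories fed back into training; both of those stages are omitted here.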