Categories: Artificial Intelligence / GUI Agents

MiningLamp Mano Unveils New Era of GUI Agents with Global SOTA on Mind2Web and OSWorld Benchmarks

Introduction: A New Era for GUI Agents

In 2025, the AI community is buzzing about agents that can learn to operate mobile devices and desktop GUI environments the way humans do. MiningLamp Technology, a Chinese company specializing in enterprise-scale large models and data intelligence, has announced a major milestone: its specialized GUI large model, Mano, has achieved state-of-the-art (SOTA) performance on two respected benchmarks, Mind2Web and OSWorld. The company asserts that through innovative online reinforcement learning and automated data acquisition, Mano inaugurates a scalable, self-evolving paradigm for GUI agent development.

Mind2Web Benchmark: Precision in Complex Web Environments

Mind2Web tests an agent’s ability to locate target elements within 137 websites and execute multi-step action sequences across more than 2,350 real-world tasks. Mano’s core strengths on this benchmark are:

  • High Element Accuracy (Ele.Acc) and robust Step Success Rate (Step SR) — demonstrating superior capability to identify and interact with interface elements across dynamic DOM structures.
  • Operational F1 Score (Op.F1) that matches or slightly exceeds leading models, reflecting precision in turning observations into actionable operation sequences.

The technical report indicates Mano not only outperforms peers but also validates its ability to maintain performance as web interfaces evolve, a key requirement for truly useful GUI agents.

OSWorld-Verified Benchmark: Desktop-Scale Mastery

OSWorld-Verified presents a tougher challenge on the desktop side, spanning 369 cross-application tasks across 10 categories, including browsers and office suites. In the Foundation E2E GUI & Specialized Model evaluation, Mano achieved a 41.6% ± 0.7% success rate, surpassing models such as Qwen, GUI-Owl, and OpenCUA. This result underscores Mano’s versatility in translating high-level instructions into reliable desktop actions.

Key Innovations Driving Mano’s Success

Highlight One: First Proposal of Online Reinforcement Learning

Mano introduces an online reinforcement learning (RL) paradigm for GUI interaction — a departure from the prevalent offline RL approach that relies on pre-collected datasets. By launching an automated explorer for real-time data acquisition, Mano continuously learns from current system interactions. The result is a dynamic balance between exploring new actions to gain insights and exploiting established strategies for reliable execution.
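To make the exploration/exploitation balance concrete, here is a minimal, hypothetical sketch of an online RL loop in the GUI setting. The report does not disclose Mano’s actual algorithm; this toy example uses epsilon-greedy action selection over a stand-in environment (`ToyGUIEnv` is invented for illustration) to show how online learning updates from live feedback rather than a fixed dataset.

```python
import random

class ToyGUIEnv:
    """Hypothetical stand-in for a live GUI: a few clickable elements,
    only one of which advances the task (reward 1.0)."""
    def __init__(self, target=2, n_elements=3):
        self.target = target
        self.n_elements = n_elements

    def step(self, action):
        # Real-time feedback from the current system interaction.
        return 1.0 if action == self.target else 0.0

def online_rl_loop(env, episodes=500, epsilon=0.2, lr=0.1, seed=0):
    """Minimal online RL sketch: act in the live environment, receive
    fresh feedback, and update value estimates immediately -- unlike
    offline RL, which trains on a pre-collected dataset."""
    rng = random.Random(seed)
    q = [0.0] * env.n_elements  # value estimate per UI element
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: try an action to gain new insight.
            action = rng.randrange(env.n_elements)
        else:
            # Exploit: use the established best strategy.
            action = max(range(env.n_elements), key=lambda a: q[a])
        reward = env.step(action)
        q[action] += lr * (reward - q[action])  # incremental online update
    return q

values = online_rl_loop(ToyGUIEnv())
```

After enough episodes the value estimate for the correct element dominates, so exploitation becomes reliable while the epsilon term keeps probing for interface changes.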

MiningLamp also built a simulation-environment pool with Browser Use Agent (BUA) and Computer Use Agent (CUA) to expose the model to diverse real-world Web GUI scenarios. An online sampling plus offline filtering workflow helps manage trajectory quality, filtering noisy data and gradually adjusting task difficulty to keep learning efficient and robust.
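The online-sampling-plus-offline-filtering idea can be sketched as two small functions: one that discards noisy or degenerate trajectories after collection, and one that nudges task difficulty toward a target success rate. All names, fields, and thresholds here are illustrative assumptions, not details from the report.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    """Hypothetical record of one online-sampled GUI episode."""
    task_id: str
    steps: list
    success: bool
    avg_confidence: float  # mean model confidence over the episode

def filter_trajectories(batch, min_confidence=0.6, max_steps=20):
    """Offline filtering pass: keep only successful episodes that are
    neither noisy (low confidence) nor degenerate (overly long)."""
    return [t for t in batch
            if t.success
            and t.avg_confidence >= min_confidence
            and len(t.steps) <= max_steps]

def adjust_difficulty(level, success_rate, target=0.5, step=1):
    """Curriculum knob: raise difficulty when the agent succeeds too
    often, lower it when it struggles, to keep learning efficient."""
    if success_rate > target + 0.1:
        return level + step
    if success_rate < target - 0.1:
        return max(1, level - step)
    return level
```

In a workflow like this, sampling runs continuously against the environment pool while filtering and difficulty adjustment happen periodically over the accumulated batch.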

Highlight Two: Intelligent Exploration and Real-World Trajectory Capture

Large models excel at understanding broad instructions but often falter when breaking down complex, multi-step tasks into executable actions. Mano addresses this with an automated data-collection framework that scales with the complexity of GUI tasks. A scalable virtual environment cluster simulates varied interactive scenarios, allowing Mano to automatically generate target lists, prune rarely used functions, and provide structured context for exploration.

On the data side, a Chrome plugin named Mano-C extracts web-interactive elements together with their spatial coordinates and semantic attributes. For desktop environments, A11y Tree parsing and OmniParseV2 filtering extend coverage. Semantic labels, functional descriptions, and interaction categories are generated by large models to form richly annotated supervision data. A prompt-based exploration module and explicit constraints prevent path loops, while a DFS-based exploration strategy captures screenshots and interaction sequences. The loop continues until exploration depth limits are reached, yielding broad coverage of each interface.
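The DFS-based exploration with loop prevention and a depth limit can be sketched abstractly. This is not Mano’s implementation: `get_actions` and `apply_action` are hypothetical hooks into the environment (in a real system each visited state would also be screenshotted and annotated), and states are assumed hashable.

```python
def dfs_explore(start, get_actions, apply_action, max_depth=3):
    """Depth-first GUI exploration sketch: from each screen state, try the
    available actions, recurse into unseen states, skip already-visited
    states (path-loop prevention), and stop at the depth limit. Returns
    the recorded interaction trajectories as (state, action) sequences."""
    visited = set()
    trajectories = []

    def recurse(state, path, depth):
        visited.add(state)
        if depth >= max_depth:
            trajectories.append(list(path))   # depth limit reached
            return
        extended = False
        for action in get_actions(state):
            nxt = apply_action(state, action)
            if nxt in visited:                # explicit constraint: no loops
                continue
            extended = True
            recurse(nxt, path + [(state, action)], depth + 1)
        if not extended:
            trajectories.append(list(path))   # dead end: record trajectory

    recurse(start, [], 0)
    return trajectories

# Toy site graph standing in for a live web app (illustrative only).
site = {
    "home":    {"click_login": "login", "click_search": "search"},
    "login":   {"click_back": "home"},
    "search":  {"type_query": "results"},
    "results": {},
}
runs = dfs_explore(
    "home",
    get_actions=lambda s: sorted(site.get(s, {})),
    apply_action=lambda s, a: site[s][a],
)
```

On this toy graph the explorer records one trajectory ending at the `login` dead end (the back-link is pruned as a loop) and one reaching `results` via the search flow.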

Implications for the AI Industry

Mano’s achievements suggest a viable path toward generalizable GUI agents capable of robust operation across web and desktop interfaces. By combining online RL with automated data acquisition and a scalable environment, Mano demonstrates how GUI intelligence can evolve with real-world feedback rather than solely offline pretraining. This has implications for enterprise automation, software testing, accessibility, and the broader push toward autonomous software agents that can understand and manipulate user interfaces with human-like competence.

What’s Next

As Mano continues to mature, expect further enhancements in sample efficiency, broader OS coverage, and deeper integration with user workflows. The industry will be watching to see whether Mano’s online RL paradigm becomes a standard practice for GUI-driven AI systems, enabling more reliable, adaptable, and autonomous agents for day-to-day computing tasks.