Overview: Mano Sets a New Benchmark in GUI Intelligence
In 2025, the AI community has its eye on agents that can operate graphical user interfaces with human-like finesse. MiningLamp Technology, a leader in enterprise-scale large models and data intelligence, has announced a major leap forward with Mano, its specialized GUI Large Model. Mano has achieved state-of-the-art (SOTA) performance on two renowned benchmarks, Mind2Web and OSWorld, signaling a new paradigm for how GUI agents learn, adapt, and execute complex multi-step tasks in real-world environments.
Mind2Web Milestone: Seeing Clearly and Delivering Results
Mind2Web challenges an agent to locate target elements within dynamically changing DOM structures across 137 websites and more than 2,350 real-world tasks. Mano’s performance underscores its core strength: accurate perception coupled with reliable action. The technical report shows Mano leading in Element Accuracy (Ele.Acc) and Step Success Rate (Step SR), along with an Operation F1 (Op.F1) score that rivals the best models in the space. This combination demonstrates Mano’s capacity to convert perception into precise, multi-step operations across diverse web interfaces.
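For readers unfamiliar with these metrics, the minimal sketch below shows how such step-level scores are typically computed from predicted versus ground-truth actions; the Step fields and the token-level F1 used here are illustrative assumptions rather than the benchmark’s official evaluation code.

```python
from dataclasses import dataclass

@dataclass
class Step:
    pred_element: str   # element ID the agent selected
    gold_element: str   # ground-truth element ID
    pred_op: str        # predicted operation, e.g. "CLICK" or "TYPE new york"
    gold_op: str        # ground-truth operation

def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between predicted and gold operation strings."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum(min(p.count(t), g.count(t)) for t in set(p))
    if not common:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

def evaluate(steps: list[Step]) -> dict:
    ele_acc = sum(s.pred_element == s.gold_element for s in steps) / len(steps)
    op_f1 = sum(token_f1(s.pred_op, s.gold_op) for s in steps) / len(steps)
    # A step counts as successful only if element and operation are both correct.
    step_sr = sum(
        s.pred_element == s.gold_element and s.pred_op == s.gold_op for s in steps
    ) / len(steps)
    return {"Ele.Acc": ele_acc, "Op.F1": op_f1, "Step SR": step_sr}

print(evaluate([
    Step("btn_42", "btn_42", "CLICK", "CLICK"),
    Step("input_7", "input_7", "TYPE new york", "TYPE new york city"),
]))
```

In this framing, Element Accuracy isolates perception (did the agent pick the right element?), while Step Success Rate couples it with the action itself, which is why leading on both is the stronger claim.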
What the Benchmarks Reveal
Mano’s ability to pinpoint interface elements and execute action sequences with high precision translates into more reliable GUI automation. The OSWorld-Verified results further validate Mano’s prowess on desktop-scale tasks: in that evaluation, Mano’s Foundation End-to-End (E2E) GUI score outperformed those of several competitors, signaling robust generalization from browser to desktop environments.
OSWorld-Verified: Desktop Realism at Scale
The OSWorld-Verified benchmark spans 369 cross-app tasks across ten application categories, including browsers and office software. Mano reached a 41.6% (±0.7%) success rate in the Foundation E2E GUI & Specialized Model evaluation, placing it ahead of models such as Qwen, GUI-Owl, and OpenCUA. These results matter because they demonstrate Mano’s readiness for practical desktop automation, not just controlled lab settings.
Technical Innovations Driving Mano’s Success
Highlight One: Online Reinforcement Learning — A First for GUI Interaction
Mano pioneers the “online reinforcement learning” paradigm in GUI interaction. Unlike traditional offline RL that relies on pre-collected data, Mano integrates an automated data explorer and a dynamic learning loop. This setup continuously gathers fresh interaction trajectories from real interfaces, balancing exploration with exploitation to continually refine decision policies. A simulation-environment pool with Browser Use Agent (BUA) and Computer Use Agent (CUA) enables Mano to experience diverse, realistic scenarios beyond offline datasets. The result is greater robustness across a wide range of Web GUI tasks.
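As a rough illustration of what such a gather-and-refine loop can look like, the sketch below alternates epsilon-greedy exploration over a pool of simulated environments with value updates from the freshly collected trajectories; SimEnvPool, OnlinePolicy, and every parameter here are hypothetical stand-ins, not Mano’s actual training code.

```python
import random
from collections import defaultdict

class SimEnvPool:
    """Stand-in for a pool of simulated GUI environments (browser and desktop)."""
    def __init__(self, n_states=5, n_actions=4, seed=0):
        self.n_states, self.n_actions = n_states, n_actions
        self.rng = random.Random(seed)

    def rollout(self, policy, max_steps=10):
        """Collect one fresh interaction trajectory using the current policy."""
        state, trajectory = self.rng.randrange(self.n_states), []
        for _ in range(max_steps):
            action = policy(state)
            # Toy dynamics: each state has one "correct" action.
            reward = 1.0 if action == state % self.n_actions else 0.0
            trajectory.append((state, action, reward))
            state = self.rng.randrange(self.n_states)
        return trajectory

class OnlinePolicy:
    """Epsilon-greedy policy refined continuously from incoming trajectories."""
    def __init__(self, n_actions, epsilon=0.2, lr=0.1):
        self.q = defaultdict(float)   # (state, action) -> value estimate
        self.n_actions, self.epsilon, self.lr = n_actions, epsilon, lr

    def __call__(self, state):
        if random.random() < self.epsilon:            # exploration
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[(state, a)])

    def update(self, trajectory):                     # exploitation of new data
        for state, action, reward in trajectory:
            self.q[(state, action)] += self.lr * (reward - self.q[(state, action)])

pool, policy = SimEnvPool(), OnlinePolicy(n_actions=4)
for _ in range(200):   # the online loop: gather fresh data, then refine the policy
    policy.update(pool.rollout(policy))
```

The point of the toy is structural: no pre-collected dataset appears anywhere; the policy only ever learns from trajectories it just generated.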
Highlight Two: Intelligent Exploration — Real-World Trajectories, Automated Data
To scale GUI intelligence without prohibitive manual labeling, MiningLamp built an automated data-collection approach. A scalable virtual environment cluster simulates varied interactive scenarios; for each target application, Mano generates a prioritized target list, filters out low-usage functions, and provides clear contextual guidance. On the element side, a Chrome plugin named “Mano-C” extracts interactive elements, along with their spatial and semantic data, from web pages, while desktop coverage combines A11y Tree parsing with OmniParseV2 filtering.
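The sketch below shows one plausible shape for the extracted element records and a simple rule for merging accessibility-tree entries with parser detections; the UIElement fields, the IoU-based filter, and the function names are assumptions made for illustration, not the format Mano-C or the desktop pipeline actually emits.

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    element_id: str
    bbox: tuple[float, float, float, float]   # (x, y, width, height) in pixels
    role: str                                  # e.g. "button", "textbox"
    label: str                                 # visible text or accessible name
    source: str                                # "web-plugin", "a11y-tree", or "parser"

def merge_desktop_elements(a11y_elems, parser_elems, iou_threshold=0.5):
    """Keep accessibility-tree elements, then add parser detections that do not
    overlap any kept element (a simple stand-in for the filtering step)."""
    def iou(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
        iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
        inter = ix * iy
        union = aw * ah + bw * bh - inter
        return inter / union if union else 0.0

    merged = list(a11y_elems)
    for cand in parser_elems:
        if all(iou(cand.bbox, kept.bbox) < iou_threshold for kept in merged):
            merged.append(cand)
    return merged

a11y_btn = UIElement("e1", (10, 10, 80, 24), "button", "Submit", "a11y-tree")
dup_btn = UIElement("e2", (12, 11, 78, 22), "button", "Submit", "parser")
print(len(merge_desktop_elements([a11y_btn], [dup_btn])))   # 1: the duplicate is dropped
```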
Data annotation assigns semantic labels, functional descriptions, and interaction categories to each element, forming semantically structured supervision data. A prompt-driven exploration module guides element selection, with explicit constraints to avoid path loops, while a depth-first search (DFS) strategy drives the traversal, capturing screenshots and interaction data at each step. A trajectory-evaluation mechanism then filters out lower-quality sequences, and the loop continues until the target depth and coverage are reached.
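The toy sketch below compresses that loop: a depth-first traversal over a small screen graph with a visited set to prevent path loops, a depth cap, and a simple quality gate over finished trajectories; the screen graph, the length-based score, and the function names are all hypothetical.

```python
# Toy app graph: each screen maps to the screens reachable from it.
APP_GRAPH = {
    "home":     ["settings", "search"],
    "settings": ["profile", "home"],
    "search":   ["results"],
    "results":  [],
    "profile":  [],
}

def dfs_explore(screen, path=None, visited=None, max_depth=4):
    """Depth-first exploration with loop avoidance and a depth cap, returning
    every interaction path it finds as a candidate trajectory."""
    path = (path or []) + [screen]
    visited = (visited or set()) | {screen}
    if len(path) >= max_depth or not APP_GRAPH[screen]:
        return [path]
    trajectories = []
    for nxt in APP_GRAPH[screen]:
        if nxt in visited:            # explicit constraint against path loops
            continue
        trajectories.extend(dfs_explore(nxt, path, visited, max_depth))
    return trajectories or [path]

def filter_trajectories(trajectories, min_len=2):
    """Toy quality gate: drop trivially short sequences and rank the rest by
    length, standing in for a learned trajectory-evaluation score."""
    kept = [t for t in trajectories if len(t) >= min_len]
    return sorted(kept, key=len, reverse=True)

print(filter_trajectories(dfs_explore("home")))
```

In a real pipeline each node visit would also capture a screenshot and the element that was acted on; here the screen names stand in for that recorded interaction data.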
Why This Matters for the Future of GUI Agents
Mano’s dual-benchmark success and the online RL paradigm collectively push GUI agents toward practical, scalable operation across web and desktop environments. By learning from real-time interactions and generating high-quality training data automatically, Mano reduces the need for costly manual labeling and accelerates the deployment of robust GUI automation. For enterprises seeking reliable, adaptable agents capable of navigating complex interfaces, Mano marks a meaningful turning point in how intelligent GUI operation is developed and evaluated.
Looking Ahead
As Mind2Web and OSWorld benchmarks continue to evolve, Mano’s framework—combining online learning, automated data acquisition, and sophisticated element extraction—helps set the standard for next-generation GUI agents. The ongoing research and published technical reports invite developers and organizations to explore Mano’s capabilities, test in real-world scenarios, and contribute to a more capable, autonomous GUI ecosystem.