OS-MAP: How Far Can Computer-Using Agents Go in Breadth and Depth? Paper • 2507.19132 • Published Jul 25, 2025
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents Paper • 2510.24702 • Published Oct 28, 2025 • 31
OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents Paper • 2510.24563 • Published Oct 28, 2025 • 23
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System Paper • 2602.02488 • Published Feb 2 • 36
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents Paper • 2605.25624 • Published 3 days ago • 20
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant Paper • 2410.18603 • Published Oct 24, 2024 • 32
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published May 26, 2025 • 104
xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations Paper • 2506.13651 • Published Jun 16, 2025 • 8
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published Jul 25, 2025 • 33
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis Paper • 2505.13227 • Published May 19, 2025 • 46
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 71
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? Paper • 2407.10956 • Published Jul 15, 2024 • 7
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11, 2024 • 52
OpenAgents: An Open Platform for Language Agents in the Wild Paper • 2310.10634 • Published Oct 16, 2023 • 9
A Survey on Spoken Language Understanding: Recent Advances and New Frontiers Paper • 2103.03095 • Published Mar 4, 2021
In-Context Learning for Few-Shot Dialogue State Tracking Paper • 2203.08568 • Published Mar 16, 2022 • 1
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation Paper • 2112.02721 • Published Dec 6, 2021