iOSWorld: A Benchmark for Personally Intelligent Phone Agents Paper • 2606.09764 • Published 27 days ago
MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents Paper • 2606.16748 • Published 20 days ago
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks Paper • 2410.19100 • Published Oct 24, 2024