VIOLA: Towards Video In-Context Learning with Minimal Annotations • Paper • 2601.15549 • Published 4 days ago • 4
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning • Paper • 2601.16163 • Published 3 days ago • 11
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models • Paper • 2601.15224 • Published 4 days ago • 11
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces • Paper • 2601.11868 • Published 9 days ago • 23
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience • Paper • 2601.15876 • Published 3 days ago • 75
SOP: A Scalable Online Post-Training System for Vision-Language-Action Models • Paper • 2601.03044 • Published 19 days ago • 28
Rethinking Video Generation Model for the Embodied World • Paper • 2601.15282 • Published 4 days ago • 41
ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands • Paper • 2512.24965 • Published 25 days ago • 41
NitroGen: An Open Foundation Model for Generalist Gaming Agents • Paper • 2601.02427 • Published 21 days ago • 42
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation • Paper • 2512.24271 • Published 26 days ago • 62
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields • Paper • 2601.03252 • Published 19 days ago • 99
Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow • Paper • 2512.24766 • Published 25 days ago • 8
MAI-UI Technical Report: Real-World Centric Foundation GUI Agents • Paper • 2512.22047 • Published 30 days ago • 28
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition • Paper • 2512.15603 • Published Dec 17, 2025 • 63
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence • Paper • 2512.10863 • Published Dec 11, 2025 • 22
Evaluating Gemini Robotics Policies in a Veo World Simulator • Paper • 2512.10675 • Published Dec 11, 2025 • 18