ClawBench: Can AI Agents Complete Everyday Online Tasks? Paper • 2604.08523 • Published 3 days ago • 98
Watch Before You Answer: Learning from Visually Grounded Post-Training Paper • 2604.05117 • Published 6 days ago • 31
ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks Paper • 2603.27862 • Published 13 days ago • 30
EvolveCoder: Evolving Test Cases via Adversarial Verification for Code Reinforcement Learning Paper • 2603.12698 • Published 29 days ago • 1
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation Paper • 2603.19220 • Published 23 days ago • 66
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis Paper • 2603.20278 • Published 25 days ago • 94
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 Text Generation • 124B • Updated about 16 hours ago • 435k • 324
VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction Paper • 2602.13294 • Published Feb 9 • 13