LightThinker++: From Reasoning Compression to Memory Management • Paper • 2604.03679 • Published 8 days ago • 30
SkillX: Automatically Constructing Skill Knowledge Bases for Agents • Paper • 2604.04804 • Published 6 days ago • 26
How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities • Paper • 2603.02578 • Published Mar 3 • 25
InnoGym: Benchmarking the Innovation Potential of AI Agents • Paper • 2512.01822 • Published Dec 1, 2025 • 36
OceanGym: A Benchmark Environment for Underwater Embodied Agents • Paper • 2509.26536 • Published Sep 30, 2025 • 36
Towards Personalized Deep Research: Benchmarks and Evaluations • Paper • 2509.25106 • Published Sep 29, 2025 • 30