RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation Paper • 2603.25804 • Published 7 days ago • 25
SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published Feb 2 • 60
MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era Paper • 2601.07526 • Published Jan 12 • 23
Dynamic Scaling of Unit Tests for Code Reward Modeling Paper • 2501.01054 • Published Jan 2, 2025 • 16
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style Paper • 2410.16184 • Published Oct 21, 2024 • 26