Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning Paper • 2603.15611 • Published 3 days ago • 10
Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models Paper • 2603.13985 • Published 5 days ago • 9
Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty Paper • 2603.15500 • Published 3 days ago • 11
TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning Paper • 2603.12529 • Published 7 days ago • 18
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published 3 days ago • 127
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings Paper • 2603.13594 • Published 6 days ago • 138
Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context Paper • 2603.15653 • Published 3 days ago • 4
RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback Paper • 2603.08561 • Published 10 days ago • 12
Lost in Backpropagation: The LM Head is a Gradient Bottleneck Paper • 2603.10145 • Published 9 days ago • 11
ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning Paper • 2603.10160 • Published 9 days ago • 25
Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning Paper • 2603.04597 • Published 15 days ago • 204
Automatic Generation of High-Performance RL Environments Paper • 2603.12145 • Published 7 days ago • 6
The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training Paper • 2603.10444 • Published 9 days ago • 10
CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges Paper • 2603.11863 • Published 8 days ago • 6
Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining Paper • 2603.11103 • Published 9 days ago • 8
Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training Paper • 2603.12246 • Published 7 days ago • 4