Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction Paper • 2512.04987 • Published 22 days ago • 75
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6 • 210
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping Paper • 2510.18927 • Published Oct 21 • 83
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published Jul 14 • 89
Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published Jul 7 • 39
Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache Paper • 2506.11886 • Published Jun 13 • 20
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs Paper • 2506.14429 • Published Jun 17 • 44
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published May 1 • 36
CritiQ: Mining Data Quality Criteria from Human Preferences Paper • 2502.19279 • Published Feb 26 • 10