-
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 181 -
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
Paper • 2510.21583 • Published • 31 -
Sparser Block-Sparse Attention via Token Permutation
Paper • 2510.21270 • Published • 25
Collections
Discover the best community collections!
Collections including paper arxiv:2510.11696
-
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 181 -
Does Your Reasoning Model Implicitly Know When to Stop Thinking?
Paper • 2602.08354 • Published • 257 -
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
Paper • 2603.03205 • Published • 5
-
Demystifying Reinforcement Learning in Agentic Reasoning
Paper • 2510.11701 • Published • 33 -
Self-Improving LLM Agents at Test-Time
Paper • 2510.07841 • Published • 10 -
Making Mathematical Reasoning Adaptive
Paper • 2510.04617 • Published • 23 -
DocReward: A Document Reward Model for Structuring and Stylizing
Paper • 2510.11391 • Published • 27
-
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 181 -
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
Paper • 2510.21583 • Published • 31 -
Sparser Block-Sparse Attention via Token Permutation
Paper • 2510.21270 • Published • 25
-
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 181 -
Does Your Reasoning Model Implicitly Know When to Stop Thinking?
Paper • 2602.08354 • Published • 257 -
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
Paper • 2603.03205 • Published • 5
-
Demystifying Reinforcement Learning in Agentic Reasoning
Paper • 2510.11701 • Published • 33 -
Self-Improving LLM Agents at Test-Time
Paper • 2510.07841 • Published • 10 -
Making Mathematical Reasoning Adaptive
Paper • 2510.04617 • Published • 23 -
DocReward: A Document Reward Model for Structuring and Stylizing
Paper • 2510.11391 • Published • 27