wangbing1416's Collections: Reasoning Papers
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
Paper
• 2508.07629
• Published • 43
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
Paper
• 2508.07101
• Published • 14
Compressing Chain-of-Thought in LLMs via Step Entropy
Paper
• 2508.03346
• Published • 8
Train Long, Think Short: Curriculum Learning for Efficient Reasoning
Paper
• 2508.08940
• Published • 27
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
Paper
• 2508.09726
• Published • 15
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
Paper
• 2508.10751
• Published • 29
Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information
Paper
• 2508.11252
• Published • 3
Deep Think with Confidence
Paper
• 2508.15260
• Published • 90
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
Paper
• 2508.14029
• Published • 119
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning
Paper
• 2508.15868
• Published • 3
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
Paper
• 2508.16949
• Published • 24
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
Paper
• 2508.17445
• Published • 80
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models
Paper
• 2508.18773
• Published • 16
StepWiser: Stepwise Generative Judges for Wiser Reasoning
Paper
• 2508.19229
• Published • 20
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic
Paper
• 2509.01363
• Published • 61
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
Paper
• 2509.02522
• Published • 25
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Paper
• 2509.03059
• Published • 25
Reverse-Engineered Reasoning for Open-Ended Generation
Paper
• 2509.06160
• Published • 151
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
• 2509.07980
• Published • 105
Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding
Paper
• 2509.06923
• Published • 22
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Paper
• 2509.03646
• Published • 33
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
• Published • 193
The Majority is not always right: RL training for solution aggregation
Paper
• 2509.06870
• Published • 15
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
Paper
• 2509.07430
• Published • 3
Reasoning-Aware GRPO using Process Mining
Paper
• 2510.25065
• Published • 42
Scaling Latent Reasoning via Looped Language Models
Paper
• 2510.25741
• Published • 229
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
Paper
• 2510.22543
• Published • 14
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Paper
• 2510.25992
• Published • 48
SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens
Paper
• 2510.24940
• Published • 18
MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models
Paper
• 2510.24794
• Published • 32
Data-Efficient RLVR via Off-Policy Influence Guidance
Paper
• 2510.26491
• Published • 11
Black-Box On-Policy Distillation of Large Language Models
Paper
• 2511.10643
• Published • 52
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
Paper
• 2511.08577
• Published • 110
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper
• 2511.22570
• Published • 93
REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance
Paper
• 2511.20233
• Published • 3
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation
Paper
• 2512.05033
• Published • 17
LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning
Paper
• 2512.05325
• Published • 5
Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision
Paper
• 2512.15489
• Published • 12
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper
• 2512.23988
• Published • 19
RelayLLM: Efficient Reasoning via Collaborative Decoding
Paper
• 2601.05167
• Published • 31
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs
Paper
• 2601.03559
• Published • 14
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
Paper
• 2601.06002
• Published • 58
Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation
Paper
• 2512.20908
• Published • 29
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
Paper
• 2601.09088
• Published • 63
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
Paper
• 2601.14249
• Published • 13
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper
• 2601.18778
• Published • 41
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
Paper
• 2601.20614
• Published • 120
DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
Paper
• 2601.20218
• Published • 16
Memorization Dynamics in Knowledge Distillation for Language Models
Paper
• 2601.15394
• Published • 3
CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning
Paper
• 2603.00889
• Published • 55
On-Policy Self-Distillation for Reasoning Compression
Paper
• 2603.05433
• Published • 7
Reasoning Models Struggle to Control their Chains of Thought
Paper
• 2603.05706
• Published • 36
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
Paper
• 2603.09906
• Published • 75
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards
Paper
• 2603.09117
• Published • 9