Reasoning
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs (arXiv:2501.18585)
Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization (arXiv:2412.18279)
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback (arXiv:2501.10799)
Reward-Guided Speculative Decoding for Efficient LLM Reasoning (arXiv:2501.19324)
Dynamic Scaling of Unit Tests for Code Reward Modeling (arXiv:2501.01054)
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding (arXiv:2411.04282)
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights (arXiv:2410.09008)
Subtle Errors Matter: Preference Learning via Error-injected Self-editing (arXiv:2410.06638)
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models (arXiv:2502.04404)
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning (arXiv:2502.06781)
SIFT: Grounding LLM Reasoning in Contexts via Stickers (arXiv:2502.14922)
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation (arXiv:2404.13026)
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning (arXiv:2503.07365)
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning (arXiv:2503.09516)
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization (arXiv:2503.12937)
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders (arXiv:2503.18878)
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models (arXiv:2504.04718)
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning (arXiv:2504.08837)
Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents (arXiv:2505.02156)
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers (arXiv:2504.20752)
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining (arXiv:2505.07608)
Absolute Zero: Reinforced Self-play Reasoning with Zero Data (arXiv:2505.03335)
Think Only When You Need with Large Hybrid-Reasoning Models (arXiv:2505.14631)
Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation (arXiv:2505.13215)
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision (arXiv:2505.19706)
rStar2-Agent: Agentic Reasoning Technical Report (arXiv:2508.20722)
Reverse-Engineered Reasoning for Open-Ended Generation (arXiv:2509.06160)