DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning
Paper • 2602.19895 • Published • 13
SetPO: Set-Level Policy Optimization for Diversity-Preserving LLM Reasoning
Paper • 2602.01062 • Published
F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare
Paper • 2602.06717 • Published • 74
MC-GRPO: Median-Centered Group Relative Policy Optimization for Small-Rollout Reinforcement Learning
Paper • 2601.22582 • Published
Lei Xia
cszdwxm
AI & ML interests
None yet
Recent Activity
updated a collection 14 days ago
XXPO/XXRL
updated a collection 14 days ago
XXPO/XXRL
updated a collection 14 days ago
XXPO/XXRL
Organizations
XXPO/XXRL
todo
models 0
None public yet
datasets 0
None public yet