- DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
  Paper • 2508.14460 • Published • 84
- MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement
  Paper • 2508.09670 • Published
- URPO: A Unified Reward & Policy Optimization Framework for Large Language Models
  Paper • 2507.17515 • Published • 2
Emmanuel Sugutt
Sugutt
AI & ML interests
Reinforcement learning
Transformer models
Recent Activity
Updated a model about 1 hour ago: Sugutt/whisper-kalenjin-small-revised
Published a model 1 day ago: Sugutt/whisper-kalenjin-small-revised
Updated a model 4 months ago: Sugutt/whisper-kalenjin-large