oguzhanercan's Collections
Finetuning Strategies
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
Paper • 2507.21183 • Published • 14
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Paper • 2507.21802 • Published • 17
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity
Paper • 2507.21848 • Published • 8
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 158
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper • 2508.16153 • Published • 160
DCPO: Dynamic Clipping Policy Optimization
Paper • 2509.02333 • Published • 21
Towards a Unified View of Large Language Model Post-Training
Paper • 2509.04419 • Published • 75
Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting
Paper • 2509.11452 • Published • 13
Reinforcement Learning on Pre-Training Data
Paper • 2509.19249 • Published • 67
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Paper • 2510.07242 • Published • 30
Reinforcing Diffusion Models by Direct Group Preference Optimization
Paper • 2510.08425 • Published • 11
Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs
Paper • 2509.25771 • Published • 10
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
Paper • 2511.07419 • Published • 26
Video Generation Models Are Good Latent Reward Models
Paper • 2511.21541 • Published • 45