Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents Paper • 2509.09265 • Published Sep 11, 2025 • 47
Skywork/Skywork-Reward-V2-Llama-3.1-8B-40M Text Classification • 8B • Updated Jul 6, 2025 • 2.81k • 20