Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Paper • 2512.15687 • Published Dec 17, 2025 • 20
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Paper • 2512.15687 • Published Dec 17, 2025 • 20
yujunzhou/SFT_Advanced_Risk_Reward_Tampering_Qwen3-4B Text Generation • 4B • Updated Dec 17, 2025 • 4
yujunzhou/SFT_Advanced_Risk_Reward_Tampering_Qwen3-4B Text Generation • 4B • Updated Dec 17, 2025 • 4
yujunzhou/SFT_Advanced_Risk_Reward_Tampering_Qwen3-4B-Base Text Generation • 4B • Updated Dec 16, 2025 • 1
yujunzhou/SFT_Advanced_Risk_Reward_Tampering_Qwen3-4B-Base Text Generation • 4B • Updated Dec 16, 2025 • 1
yujunzhou/SFT_Advanced_Risk_Situation_Aware_Qwen3-4B-Base Text Generation • 4B • Updated Dec 16, 2025 • 6
yujunzhou/SFT_Advanced_Risk_Situation_Aware_Qwen3-4B-Base Text Generation • 4B • Updated Dec 16, 2025 • 6
yujunzhou/SFT_Advanced_Risk_Summarization_Qwen3-4B-Base Text Generation • 4B • Updated Dec 14, 2025 • 2
yujunzhou/SFT_Advanced_Risk_Summarization_Qwen3-4B-Base Text Generation • 4B • Updated Dec 14, 2025 • 2