DRIFT: Learning from Abundant User Dissatisfaction in Real-World Preference Learning • Paper • 2510.02341 • Published Sep 27 • 2
AmberYifan/Qwen2.5-7B-Instruct-wildfeedback-iterDPO-iter2 • Text Generation • 8B • Updated Jun 25 • 10 • 1
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-iterdpo-iter2-RPO • Text Generation • 841k • Updated Aug 7 • 7 • 1
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-DRIFT-iter2-RPO • Text Generation • 841k • Updated Aug 7 • 7
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-spin-iter2-RPO • Text Generation • 841k • Updated Aug 7 • 6
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-DRIFT-iter2 • Text Generation • 841k • Updated Jul 30 • 7
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-iterDPO-iter2 • Text Generation • 841k • Updated Jul 30 • 7
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-SPIN-iter2 • Text Generation • 841k • Updated Jul 30 • 4
AmberYifan/Qwen2.5-7B-Instruct-ultrafeedback-iterDPO-iter2 • Text Generation • 8B • Updated Jun 30 • 11 • 1
AmberYifan/Qwen2.5-7B-Instruct-ultrafeedback-DRIFT-iter2 • Text Generation • 8B • Updated Jun 29 • 9 • 1
AmberYifan/Qwen2.5-7B-Instruct-ultrafeedback-SPIN-iter2 • Text Generation • 8B • Updated Jun 30 • 8 • 1