SelectiveDPO

A collection of five released models trained with SelectiveDPO.
This model is fine-tuned from HuggingFaceH4/mistral-7b-sft-beta with SelectiveDPO on the Ultrafeedback_binarized dataset.
For the recipe to reproduce this model, please visit our GitHub page.
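A minimal usage sketch. This assumes the model inherits the Zephyr-style chat format of its SFT base, mistral-7b-sft-beta; check the tokenizer's own chat template for the authoritative format, and see the collection for the actual checkpoint repo ids (the loading step below is commented out because it requires downloading the weights):

```python
def build_chat_prompt(user_message: str) -> str:
    # Zephyr-style single-turn prompt, as used by mistral-7b-sft-beta
    # (an assumption -- verify with tokenizer.apply_chat_template).
    return f"<|user|>\n{user_message}</s>\n<|assistant|>\n"

prompt = build_chat_prompt("Explain DPO in one sentence.")
print(prompt)

# To generate with the released checkpoint (placeholder repo id):
# from transformers import pipeline
# generator = pipeline("text-generation", model="<selective-dpo-checkpoint>")
# print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```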
Base model: mistralai/Mistral-7B-v0.1