Update README.md
README.md
CHANGED
@@ -54,8 +54,7 @@ Fine-tuning datasets for this model are based on [Stack Exchange Paired](https:/
 **DPO Training:** [https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/rl](https://huggingface.co/datasets/lvwerra/stack-exchange-paired/tree/main/data/rl)
 
 ### Training Procedure
-The model was first fine-tuned on the Stack Exchange question and answer pairs and then fine-tuned via the DPO training procedure using the SFT model as the reference model.
-It is trained to respond to prompts with the following template:
+The model was first fine-tuned on the Stack Exchange question and answer pairs and then fine-tuned via the DPO training procedure using the SFT model as the reference model. It is trained to respond to prompts with the following prompt template:
 
 ```
 Question: <Query>