---
license: cc-by-nc-nd-4.0
base_model:
- GAIR/Abel-7B-002
language:
- en
tags:
- math
- reasoning
- dpo
- open-llm-leaderboard
repo_url: https://github.com/dmilstein-match/DylanDeep-Core-8B-DPO
model-index:
- name: DylanDeep-Core-8B-DPO
  results:
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: gsm8k
      type: gsm8k
    metrics:
    - name: GSM8K (8-Shot Majority Vote)
      type: accuracy
      value: 84.84
datasets:
- openai/gsm8k
metrics:
- accuracy
---
# DylanDeep-Core-8B-DPO
A math reasoning model achieving **84.84% on GSM8K** through preference optimization.
## Model Details
- **Base**: Abel-7B-002 (LLaMA-2 architecture)
- **Method**: SFT + DPO with counterfactual reasoning
- **Evaluation**: 8-shot majority voting
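The card does not spell out how the votes are collected or counted. As a minimal sketch, assuming several completions are sampled per problem and the most frequent final number is taken as the answer, majority voting could look like the following. The helper names `extract_final_number` and `majority_vote` are hypothetical and are not taken from the training repository.

```python
import re
from collections import Counter


def extract_final_number(completion):
    """Return the last number found in a completion as a float, or None.

    Assumption: the final answer is the last number in the generated text.
    """
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    if not matches:
        return None
    # Drop thousands separators so "1,234" and "1234" count as the same answer
    return float(matches[-1].replace(",", ""))


def majority_vote(completions):
    """Return the most frequent extracted answer, or None if none was found."""
    answers = [extract_final_number(c) for c in completions]
    answers = [a for a in answers if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hypothetical sampled completions for a single problem
samples = [
    "16 - 3 - 4 = 9, 9 * 2 = 18.0",
    "She makes 18 dollars every day.",
    "The answer is 20",
]
voted = majority_vote(samples)
```

Converting answers to `float` before counting keeps `18` and `18.0` in the same bucket; a stricter evaluator might instead parse a dedicated answer marker.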
## Performance
| Model | GSM8K Accuracy |
|-------|----------------|
| Abel-7B-002 (base) | 79.08% |
| + SFT | 84.46% |
| + DPO | **84.84%** |
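
To make the per-stage contribution explicit, the short sketch below computes the percentage-point gain of each stage from the figures in the table. The function name `stage_gains` is illustrative only.

```python
# GSM8K accuracies (%) copied from the table above, in training order
scores = {"Abel-7B-002 (base)": 79.08, "+ SFT": 84.46, "+ DPO": 84.84}


def stage_gains(scores):
    """Percentage-point gain of each stage over the stage before it."""
    names = list(scores)
    return {
        names[i]: round(scores[names[i]] - scores[names[i - 1]], 2)
        for i in range(1, len(names))
    }


gains = stage_gains(scores)
total_gain = round(scores["+ DPO"] - scores["Abel-7B-002 (base)"], 2)
```

By these numbers, SFT accounts for 5.38 points and DPO for a further 0.38 points, for a total of 5.76 points over the base model.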
## Training
Fine-tuned with LoRA adapters using a two-stage approach:
1. Supervised fine-tuning on GSM8K training set
2. DPO on 3,334 preference pairs with counterfactual probing
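
For readers unfamiliar with the second stage, the sketch below shows the standard DPO objective for a single preference pair. It is a generic illustration and not the training code from the repository: the field names in `example_pair`, the log-probability values, and `beta=0.1` are all assumptions, and the counterfactual probing step is not modeled here.

```python
import math

# Hypothetical shape of one preference pair; the field names are assumptions
example_pair = {
    "prompt": "Natalia sold 48 clips in April and half as many in May. How many in total?",
    "chosen": "48 + 24 = 72. The answer is 72.",
    "rejected": "48 + 48 = 96. The answer is 96.",
}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair, given summed sequence log-probabilities.

    beta=0.1 is an assumed value, not one reported in the model card.
    """
    # How much more the policy favors each response than the frozen reference does
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)), computed in a numerically stable form
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Made-up log-probabilities: the policy has shifted toward the chosen response
loss = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

When the policy and reference agree exactly, the loss equals `log(2)`; it falls below that value as the policy raises the chosen response relative to the rejected one.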
## Training Code
[DylanDeep-Core-8B-DPO](https://github.com/dmilstein-match/DylanDeep-Core-8B-DPO)
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
tokenizer = AutoTokenizer.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
```
## License
This model is released under CC BY-NC-ND 4.0 with the following conditions:
- Non-commercial use only
- No derivatives without permission
- Attribution required

Additionally, this model inherits the LLaMA 2 Community License from its base model. Users must comply with both licenses.