metadata
license: cc-by-nc-nd-4.0
base_model:
- GAIR/Abel-7B-002
language:
- en
tags:
- math
- reasoning
- dpo
- open-llm-leaderboard
repo_url: https://github.com/dmilstein-match/DylanDeep-Core-8B-DPO
model-index:
- name: DylanDeep-Core-8B-DPO
  results:
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: gsm8k
      type: gsm8k
    metrics:
    - name: GSM8K (8-Shot Majority Vote)
      type: accuracy
      value: 84.84
datasets:
- openai/gsm8k
metrics:
- accuracy
DylanDeep-Core-8B-DPO
A math reasoning model that reaches 84.84% accuracy on GSM8K (8-shot majority voting) after supervised fine-tuning and DPO preference optimization.
Model Details
- Base: Abel-7B-002 (LLaMA-2 architecture)
- Method: SFT + DPO with counterfactual reasoning
- Evaluation: 8-shot majority voting
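As a rough illustration of the majority-voting step, the sketch below picks the most frequent final answer among several sampled completions for one problem. The card does not say how many samples are drawn or how answers are parsed, so the `majority_vote` helper and its numeric normalisation are assumptions made for this example.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent numeric answer, or None if none can be parsed."""
    votes = []
    for answer in answers:
        if answer is None:
            continue
        # Normalise so that "18", " 18 " and "18.0" count as the same vote.
        text = str(answer).strip().replace(",", "")
        try:
            votes.append(float(text))
        except ValueError:
            # Skip completions whose final answer is not a number.
            continue
    if not votes:
        return None
    # On ties, Counter.most_common keeps the answer that was seen first.
    return Counter(votes).most_common(1)[0][0]


# Hypothetical final answers extracted from eight sampled completions.
samples = ["18", "18.0", "20", "18", "17", "20", "18", "n/a"]
print(majority_vote(samples))
```

The vote is only as good as the answer extraction that feeds it, so unparseable completions are dropped here instead of being counted as a separate answer.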
Performance
| Model | GSM8K Accuracy |
|---|---|
| Abel-7B-002 (base) | 79.08% |
| + SFT | 84.46% |
| + DPO | 84.84% |
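To make the contribution of each stage explicit, the short sketch below computes the gain in percentage points from the accuracies in the table. Only the three reported numbers are used; the variable names are illustrative.

```python
# Accuracies (in percent) copied from the table above, in training order.
scores = {
    "Abel-7B-002 (base)": 79.08,
    "+ SFT": 84.46,
    "+ DPO": 84.84,
}

stages = list(scores.items())

# Gain of each stage over the previous one, in percentage points.
gains = {}
for (_, previous), (name, current) in zip(stages, stages[1:]):
    gains[name] = round(current - previous, 2)

# Overall gain of the final model over the base model.
total_gain = round(stages[-1][1] - stages[0][1], 2)

print(gains)
print(total_gain)
```

Read this way, SFT accounts for 5.38 of the 5.76 points gained over the base model, and DPO adds the remaining 0.38.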
Training
Fine-tuned with LoRA adapters using a two-stage approach:
- Supervised fine-tuning on GSM8K training set
- DPO on 3,334 preference pairs with counterfactual probing
Training Code
Repository: https://github.com/dmilstein-match/DylanDeep-Core-8B-DPO
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
tokenizer = AutoTokenizer.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
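Once the model and tokenizer are loaded, a single problem could be solved along the lines of the sketch below. The prompt template, the `solve` and `extract_final_number` helpers, and the generation settings are assumptions for illustration; the card does not document a prompt format, so check it against the base model's documentation before relying on it.

```python
import re

# Assumed prompt format; not specified in this model card.
PROMPT_TEMPLATE = "Question:\n{question}\nAnswer:\nLet's think step by step.\n"


def extract_final_number(text):
    """Return the last number in the text as a float, or None if there is none."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    # Drop thousands separators (and any trailing comma) before converting.
    return float(matches[-1].replace(",", ""))


def solve(question, model, tokenizer, max_new_tokens=256):
    """Greedy-decode one completion and parse its final numeric answer."""
    prompt = PROMPT_TEMPLATE.format(question=question)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Keep only the newly generated tokens, not the echoed prompt.
    prompt_length = inputs["input_ids"].shape[1]
    completion = tokenizer.decode(
        output_ids[0][prompt_length:], skip_special_tokens=True
    )
    return extract_final_number(completion)
```

With `model` and `tokenizer` loaded as above, `solve("Tom has 3 apples and buys 4 more. How many apples does he have?", model, tokenizer)` would return the parsed answer as a float, or `None` if the completion contains no number.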
License
This model is released under CC BY-NC-ND 4.0 with the following conditions:
- Non-commercial use only
- No derivatives without permission
- Attribution required
Additionally, this model inherits the LLaMA 2 Community License from its base model. Users must comply with both licenses.