---
license: cc-by-nc-nd-4.0
base_model:
- GAIR/Abel-7B-002
language:
- en
tags:
- math
- reasoning
- dpo
- open-llm-leaderboard
repo_url: https://github.com/dmilstein-match/DylanDeep-Core-8B-DPO
model-index:
- name: DylanDeep-Core-8B-DPO
  results:
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: gsm8k
      type: gsm8k
    metrics:
    - name: GSM8K (8-Shot Majority Vote)
      type: accuracy
      value: 84.84
datasets:
- openai/gsm8k
metrics:
- accuracy
---

# DylanDeep-Core-8B-DPO

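A math reasoning model that reaches **84.84% on GSM8K** (8-shot majority voting), up from 79.08% for the base model, after supervised fine-tuning followed by preference optimization (DPO).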

## Model Details

- **Base**: Abel-7B-002 (LLaMA-2 architecture)
- **Method**: SFT + DPO with counterfactual reasoning
- **Evaluation**: 8-shot majority voting
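
The majority-voting step can be sketched as follows. This is a minimal illustration, not the evaluation code from the repository: it assumes that several completions are sampled per problem and that the final answer is the last number in each completion. The function names `extract_answer` and `majority_vote` are hypothetical.

```python
import re
from collections import Counter


def extract_answer(completion):
    """Return the last number in a completion as a string, or None.

    Treating the last number as the final answer is an assumption.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return numbers[-1] if numbers else None


def majority_vote(completions):
    """Pick the most frequent extracted answer across sampled completions."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Made-up completions for illustration only.
samples = [
    "She has 3 + 4 = 7 apples. The answer is 7",
    "3 + 4 = 7. The answer is 7",
    "3 * 4 = 12. The answer is 12",
]
print(majority_vote(samples))
```

Voting over several samples lets occasional arithmetic slips in a single completion be outvoted by the more common answer.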

## Performance

| Model | GSM8K Accuracy |
|-------|----------------|
| Abel-7B-002 (base) | 79.08% |
| + SFT | 84.46% |
| + DPO | **84.84%** |

## Training

Fine-tuned with LoRA adapters using a two-stage approach:
1. Supervised fine-tuning on the GSM8K training set
2. DPO on 3,334 preference pairs with counterfactual probing
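
For orientation, the standard DPO objective for a single preference pair can be sketched in plain Python. This is the generic DPO loss, not code from this repository; the `beta` value and the example log-probabilities are assumptions chosen for illustration, and `dpo_pair_loss` is a hypothetical name.

```python
import math


def dpo_pair_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given summed sequence log-probabilities.

    beta=0.1 is an assumed value, not one reported for this model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # answer over the rejected one, relative to the frozen reference model.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)), written as log(1 + exp(-beta * margin))
    return math.log1p(math.exp(-beta * margin))


# Illustrative log-probabilities (made up, not measured).
print(dpo_pair_loss(-12.0, -15.0, -13.0, -14.0))
```

When the policy and the reference agree on both answers the margin is zero and the loss equals log 2; the loss falls as the policy shifts probability toward the chosen answer.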
   

## Training Code

[DylanDeep-Core-8B-DPO](https://github.com/dmilstein-match/DylanDeep-Core-8B-DPO)

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
tokenizer = AutoTokenizer.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
```

## License

This model is released under CC BY-NC-ND 4.0 with the following conditions:

- Non-commercial use only
- No derivatives without permission
- Attribution required

Additionally, this model inherits the LLaMA 2 Community License from its base model. Users must comply with both licenses.