metadata
license: cc-by-nc-nd-4.0
base_model:
  - GAIR/Abel-7B-002
language:
  - en
tags:
  - math
  - reasoning
  - dpo
  - open-llm-leaderboard
repo_url: https://github.com/dmilstein-match/DylanDeep-Core-8B-DPO
model-index:
  - name: DylanDeep-Core-8B-DPO
    results:
      - task:
          type: text-generation
          name: Math Reasoning
        dataset:
          name: gsm8k
          type: gsm8k
        metrics:
          - name: GSM8K (8-Shot Majority Vote)
            type: accuracy
            value: 84.84
datasets:
  - openai/gsm8k
metrics:
  - accuracy


license: other
license_name: cc-by-nc-nd-4.0-with-llama2
license_link: LICENSE
base_model: GAIR/Abel-7B-002
tags:
  - math
  - reasoning
  - gsm8k
  - dpo
  - rlhf
datasets:
  - gsm8k
metrics:
  - accuracy

DylanDeep-Core-8B-DPO

A math reasoning model that reaches 84.84% accuracy on GSM8K (8-shot majority voting) through preference optimization.

Model Details

  • Base: Abel-7B-002 (LLaMA-2 architecture)
  • Method: SFT + DPO with counterfactual reasoning
  • Evaluation: 8-shot majority voting
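
As a rough illustration of the majority-voting step, the sketch below picks the most frequent final answer among several sampled completions. It assumes that "8-shot majority voting" means eight sampled answers per question, which the card does not spell out. The function name `majority_vote` and the example answers are hypothetical and are not taken from the evaluation code.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most common non-empty answer, or None if there is none."""
    valid = [a.strip() for a in answers if a is not None and a.strip()]
    if not valid:
        return None
    # Ties are resolved in favour of the answer that appeared first.
    return Counter(valid).most_common(1)[0][0]


# Hypothetical final answers extracted from eight sampled completions;
# None stands for a completion with no parseable answer.
samples = ["18", "18", "20", "18", None, "18", "20", "18"]
voted = majority_vote(samples)
print(voted)
```

Unparseable completions are dropped before counting, so they cannot win the vote; whether the actual evaluation treats them this way is an assumption.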

Performance

| Model | GSM8K Accuracy |
|---|---|
| Abel-7B-002 (base) | 79.08% |
| + SFT | 84.46% |
| + DPO | 84.84% |
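
To make the size of each step explicit, the short sketch below computes the gain in percentage points from the three accuracies reported above. The variable names are illustrative only.

```python
# Accuracies (percent) copied from the figures reported above.
scores = {"Abel-7B-002 (base)": 79.08, "+ SFT": 84.46, "+ DPO": 84.84}
base, sft, dpo = scores.values()

# Gains in percentage points, rounded to avoid floating-point noise.
sft_gain = round(sft - base, 2)    # SFT over the base model
dpo_gain = round(dpo - sft, 2)     # DPO over the SFT stage
total_gain = round(dpo - base, 2)  # final model over the base model

print(f"SFT: +{sft_gain} pts, DPO: +{dpo_gain} pts, total: +{total_gain} pts")
```

On these numbers, most of the improvement comes from the SFT stage, with DPO adding a smaller increment on top.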

Training

Fine-tuned with LoRA adapters using a two-stage approach:

  1. Supervised fine-tuning on GSM8K training set
  2. DPO on 3,334 preference pairs with counterfactual probing
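
For readers unfamiliar with the second stage, here is a minimal sketch of the standard DPO objective for a single preference pair. It is not the training code for this model: the function name `dpo_loss`, the example log-probabilities, and `beta=0.1` are assumptions chosen for illustration, and the counterfactual probing used to build the pairs is not shown.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference model.
    """
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities: the policy favours the chosen answer
# more strongly than the reference does, so the loss falls below log(2).
loss = dpo_loss(-12.0, -20.0, -14.0, -18.0)
print(round(loss, 4))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log(2); training pushes the margin up so that the chosen response becomes relatively more likely.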

Training Code

DylanDeep-Core-8B-DPO: https://github.com/dmilstein-match/DylanDeep-Core-8B-DPO

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
tokenizer = AutoTokenizer.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
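
The loading snippet above stops before generation. The sketch below shows one way to ask a question and pull out the final number. The prompt template, the greedy decoding settings, and the helper names `extract_final_number` and `solve` are assumptions for illustration; the card does not document the prompt format the model expects.

```python
import re
from typing import Optional

MODEL_ID = "dylxnmyl/DylanDeep-Core-8B-DPO"


def extract_final_number(text: str) -> Optional[str]:
    """Return the last number in the text with thousands separators removed."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def solve(question: str, max_new_tokens: int = 512) -> Optional[str]:
    """Generate one greedy completion and extract its final number."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Assumed prompt template; adjust it to match the training format.
    prompt = f"Question:\n{question}\nAnswer:\nLet's think step by step.\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    completion = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return extract_final_number(completion)
```

Calling `solve("Natalia sold 48 clips in April and half as many in May. How many clips did she sell in total?")` downloads the weights on first use, so it is best run on a machine with enough memory for a 7B-parameter model.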

License

This model is released under CC BY-NC-ND 4.0 with the following conditions:

  • Non-commercial use only
  • No derivatives without permission
  • Attribution required

Additionally, this model inherits the LLaMA 2 Community License from its base model. Users must comply with both licenses.