license: cc-by-nc-nd-4.0
base_model:
  - GAIR/Abel-7B-002
language:
  - en
tags:
  - math
  - reasoning
  - dpo
  - open-llm-leaderboard
repo_url: https://github.com/dmilstein-match/DylanDeep-Core-8B-DPO
model-index:
  - name: DylanDeep-Core-8B-DPO
    results:
      - task:
          type: text-generation
          name: Math Reasoning
        dataset:
          name: gsm8k
          type: gsm8k
        metrics:
          - name: GSM8K (8-Shot Majority Vote)
            type: accuracy
            value: 84.84
datasets:
  - openai/gsm8k
metrics:
  - accuracy


license: other
license_name: cc-by-nc-nd-4.0-with-llama2
license_link: LICENSE
base_model: GAIR/Abel-7B-002
tags:
  - math
  - reasoning
  - gsm8k
  - dpo
  - rlhf
datasets:
  - gsm8k
metrics:
  - accuracy

DylanDeep-Core-8B-DPO

A math reasoning model that reaches 84.84% accuracy on GSM8K (8-shot majority voting) after supervised fine-tuning and DPO preference optimization.

Model Details

Base: Abel-7B-002 (LLaMA-2 architecture)
Method: SFT + DPO with counterfactual reasoning
Evaluation: 8-shot majority voting
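
The majority-voting step can be illustrated with a minimal sketch. It reads "8-shot majority voting" as sampling several completions per problem and keeping the most frequent final answer, which is an assumption; the helper names `extract_final_answer` and `majority_vote`, the regex-based answer extraction, and the tie-breaking rule are likewise illustrative and are not taken from the author's evaluation code.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last number in a completion as a normalized string, or None."""
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    if not numbers:
        return None
    value = numbers[-1].replace(",", "")
    # Normalize "18.0" and "18" so they count as the same vote.
    if "." in value:
        value = value.rstrip("0").rstrip(".")
    return value


def majority_vote(completions: List[str]) -> Optional[str]:
    """Pick the most frequent extracted answer across sampled completions."""
    answers = [a for a in map(extract_final_answer, completions) if a is not None]
    if not answers:
        return None
    # Counter.most_common breaks ties in favor of the answer seen first.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical sampled completions for a single problem.
samples = [
    "16 - 3 - 4 = 9 eggs, 9 * 2 = 18. The answer is 18.",
    "So she makes 18.0 dollars",
    "I get 20",
    "Answer: 18",
    "no number here",
]
voted = majority_vote(samples)
print(voted)
```

Completions with no extractable number are simply dropped from the vote in this sketch; a real evaluation harness might count them as wrong answers instead.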

Performance

Model	GSM8K Accuracy
Abel-7B-002 (base)	79.08%
+ SFT	84.46%
+ DPO	84.84%
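
To make the contribution of each stage explicit, the short sketch below computes the gain in percentage points from the figures in the table. The function name `stage_gains` is illustrative; the numbers are copied from the table and nothing else is assumed.

```python
# Accuracy figures copied from the table above (percent).
scores = {"Abel-7B-002 (base)": 79.08, "+ SFT": 84.46, "+ DPO": 84.84}


def stage_gains(scores: dict) -> dict:
    """Absolute gain of each stage over the previous row, in percentage points."""
    names = list(scores)
    return {
        names[i]: round(scores[names[i]] - scores[names[i - 1]], 2)
        for i in range(1, len(names))
    }


gains = stage_gains(scores)
total = round(scores["+ DPO"] - scores["Abel-7B-002 (base)"], 2)
print(gains, total)
```

Most of the improvement comes from SFT (5.38 points); DPO adds a further 0.38 points, for 5.76 points over the base model.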

Training

Fine-tuned with LoRA adapters using a two-stage approach:

Stage 1: Supervised fine-tuning (SFT) on the GSM8K training set
Stage 2: Direct Preference Optimization (DPO) on 3,334 preference pairs with counterfactual probing
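
For readers unfamiliar with the second stage, here is a minimal sketch of the standard DPO objective for a single preference pair, written in plain Python. It is not the author's training code: the function name `dpo_loss`, the `beta` value, and the log-probabilities in the usage lines are illustrative assumptions, and the counterfactual probing step is not described in this card, so it is not reproduced here.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the card does not state the one used
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    # Implicit rewards: how much more the policy likes each response
    # than the frozen reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up log-probabilities, for illustration only.
neutral = dpo_loss(-12.0, -15.0, -12.0, -15.0)   # policy equals reference
improved = dpo_loss(-10.0, -20.0, -12.0, -15.0)  # policy favors the chosen answer
worse = dpo_loss(-20.0, -10.0, -15.0, -12.0)     # policy favors the rejected answer
print(neutral, improved, worse)
```

The loss equals log 2 when the policy matches the reference and falls as the policy shifts probability toward the chosen response relative to the rejected one.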

Training Code

DylanDeep-Core-8B-DPO: https://github.com/dmilstein-match/DylanDeep-Core-8B-DPO

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and its tokenizer from the Hugging Face Hub.
model_id = "dylxnmyl/DylanDeep-Core-8B-DPO"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
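
The snippet above only loads the weights. As a hedged sketch of how a solution might then be generated, the helper below wraps the tokenizer call, `model.generate`, and `tokenizer.decode`. The `solve` and `build_prompt` names and the prompt template are assumptions; this card does not specify a prompt format, so adjust the template to whatever the model was trained with.

```python
def build_prompt(question: str) -> str:
    # Hypothetical prompt template; the card does not specify one.
    return f"Question:\n{question.strip()}\nAnswer:\nLet's think step by step.\n"


def solve(model, tokenizer, question: str, max_new_tokens: int = 256) -> str:
    """Generate one step-by-step solution with greedy decoding."""
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_length = len(inputs["input_ids"][0])
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)


# Usage, assuming `model` and `tokenizer` were loaded as shown above:
# print(solve(model, tokenizer, "A pen costs 3 dollars. How much do 4 pens cost?"))
```

Greedy decoding (`do_sample=False`) returns a single deterministic completion; for majority voting, several completions would instead be sampled with `do_sample=True`.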

License

This model is released under CC BY-NC-ND 4.0 with the following conditions:

Non-commercial use only
No derivatives without permission
Attribution required

Additionally, this model inherits the LLaMA 2 Community License from its base model. Users must comply with both licenses.