---
license: cc-by-nc-nd-4.0
base_model:
- GAIR/Abel-7B-002
language:
- en
tags:
- math
- reasoning
- gsm8k
- dpo
- rlhf
- open-llm-leaderboard
repo_url: https://github.com/dmilstein-match/DylanDeep-Core-8B-DPO
model-index:
- name: DylanDeep-Core-8B-DPO
  results:
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: gsm8k
      type: gsm8k
    metrics:
    - name: GSM8K (8-Shot Majority Vote)
      type: accuracy
      value: 84.84
datasets:
- openai/gsm8k
metrics:
- accuracy
---

# DylanDeep-Core-8B-DPO

A math reasoning model achieving **84.84% on GSM8K** through preference optimization.

## Model Details

- **Base**: Abel-7B-002 (LLaMA-2 architecture)
- **Method**: SFT + DPO with counterfactual reasoning
- **Evaluation**: 8-shot majority voting

## Performance

| Model | GSM8K Accuracy |
|-------|----------------|
| Abel-7B-002 (base) | 79.08% |
| + SFT | 84.46% |
| + DPO | **84.84%** |

## Training

Fine-tuned with LoRA adapters using a two-stage approach:

1. Supervised fine-tuning on GSM8K training set
2. DPO on 3,334 preference pairs with counterfactual probing

## Training Code

[DylanDeep-Core-8B-DPO](https://github.com/dmilstein-match/DylanDeep-Core-8B-DPO)

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model weights and the matching tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
tokenizer = AutoTokenizer.from_pretrained("dylxnmyl/DylanDeep-Core-8B-DPO")
```

## License

This model is released under CC BY-NC-ND 4.0 with the following conditions:

- Non-commercial use only
- No derivatives without permission
- Attribution required

Additionally, this model inherits the LLaMA 2 Community License from its base model. Users must comply with both licenses.
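
## Majority-Vote Sketch

The reported GSM8K score uses majority voting. The exact evaluation script is not included in this card, so the following is only a minimal sketch of the general idea, assuming that several completions are sampled per question, that the final answer is the last number in each completion, and that the most frequent answer wins. The helper names `extract_final_number` and `majority_vote` are hypothetical and are not taken from the training repository.

```python
import re
from collections import Counter
from typing import List, Optional

# Assumption: answers are plain numbers, optionally signed, with optional
# thousands separators and decimals (e.g. "-3", "1,250.50").
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(completion: str) -> Optional[str]:
    """Return the last number found in a completion, normalized, or None."""
    matches = _NUMBER.findall(completion)
    if not matches:
        return None
    value = matches[-1].replace(",", "")
    # Normalize trailing zeros so "18.0" and "18" count as the same answer.
    if "." in value:
        value = value.rstrip("0").rstrip(".")
    return value


def majority_vote(completions: List[str]) -> Optional[str]:
    """Pick the most common extracted answer across sampled completions."""
    answers = [a for a in map(extract_final_number, completions) if a is not None]
    if not answers:
        return None
    # On a tie, Counter.most_common keeps the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]
```

In use, `completions` would hold the decoded outputs of repeated sampling for one question, and the voted answer would be compared with the reference answer to compute accuracy. Normalizing before counting matters because otherwise equivalent answers written differently would split the vote.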