Last update: 10 Dec. 2025

Introduction

This is a reasoning-enhanced version of Motif-2-12.7B-Instruct. Detailed information will be released later.

Evaluation

Benchmark	Evaluation setting	Motif-2-12.7B-Instruct	Motif-2-12.7B-Reasoning
MMLU	0-shot	86.11	84.07
MMLU-Redux	-	90.02	88.89
BBH	0-shot	85.78	78.34
GPQA-Diamond	0-shot, CoT	63.6	70
GSM8K	0-shot, CoT	96.13	95.53
MATH	0-shot	97	95.07
MBPP	3-shot	91	88.9
LiveBench 2024-11-25	-	33.8	49.9
IFEval	strict prompt	75.78	79.11
IFEval	0-shot	76.52	81.89
MATH-500	-	96.8	99.3
AIME24	-	72.3	88.3
AIME25	-	63.6	80
ZebraLogic	-	69.5	77
BFCL v3	-	55.34	60.2
LiveCodeBench v5 (2024.10 - 2025.2)	-	50.03	65
LiveCodeBench v5	0-shot, CoT	61.66	60.1
HumanEval	0-shot	93.2	93.2
Average	-	75.45	79.71
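
The Average row is consistent with an unweighted mean over the 18 benchmark rows above. The sketch below reproduces it under that assumption (every row counted once, including both IFEval and both LiveCodeBench v5 settings). The variable names are illustrative and are not part of any released evaluation code.

```python
from statistics import fmean

# Scores copied from the table, in row order (MMLU ... HumanEval).
instruct = [
    86.11, 90.02, 85.78, 63.6, 96.13, 97, 91, 33.8, 75.78,
    76.52, 96.8, 72.3, 63.6, 69.5, 55.34, 50.03, 61.66, 93.2,
]
reasoning = [
    84.07, 88.89, 78.34, 70, 95.53, 95.07, 88.9, 49.9, 79.11,
    81.89, 99.3, 88.3, 80, 77, 60.2, 65, 60.1, 93.2,
]

# Assumption: each benchmark row carries equal weight.
instruct_avg = round(fmean(instruct), 2)
reasoning_avg = round(fmean(reasoning), 2)

print(f"Instruct average:  {instruct_avg}")
print(f"Reasoning average: {reasoning_avg}")
```

Because the mean is unweighted, benchmarks that appear under two settings count twice toward the reported average.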

Safetensors

Model size: 13B params
Tensor type: F32
