---
license: apache-2.0
language:
  - en
---

K2-V2

Tech Report - Code - Project Page

K2-V2 is our best fully open-source model to date and ranks among the best open-weight models of its class. As the latest base model in K2, LLM360's strongest project family, it features a dense architecture with 70 billion parameters.

[Figure: k2-sft-aime]

Beyond standard competencies like knowledge and conversation, K2 provides advanced capabilities, including long context consistency, deep mathematical knowledge, and reasoning behaviors. These serve as foundational building blocks that enable sophisticated downstream use cases, such as solving complex math problems and executing agentic workflows.

[Figure: k2-base-gpqa]

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub,
# sharding the weights across available devices.
model = AutoModelForCausalLM.from_pretrained("llm360/k2-v2", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("llm360/k2-v2")

# K2-V2 is a base model, so prompt it with plain text rather than a chat template.
prompt = "Explain why the derivative of sin(x) is cos(x)."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
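
With 70 billion parameters, the bfloat16 weights alone occupy roughly 140 GB, so the model is typically sharded across several GPUs. The loading sketch below is illustrative only; the dtype and device_map choices are suggestions, not officially recommended settings.

import torch
from transformers import AutoModelForCausalLM

# Illustrative only: shard bfloat16 weights (~140 GB) across all visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "llm360/k2-v2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)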

Evaluation Summary

| Task / Model | base | mid-1 | mid-2 | mid-3 | mid-4 | Qwen2.5-72B | Llama3.0-70B | Llama3.1-70B | Olmo3-32B |
|---|---|---|---|---|---|---|---|---|---|
| Architecture | Dense | Dense | Dense | Dense | Dense | Dense | Dense | Dense | Dense |
| # Total Params | 70B | 70B | 70B | 70B | 70B | 72B | 70B | 70B | 32B |
| # Activated Params | 70B | 70B | 70B | 70B | 70B | 72B | 70B | 70B | 32B |
| General Tasks | | | | | | | | | |
| MMLU | 74.3 | 74.4 | 73.5 | 75.0 | 75.2 | 86.1 | 79.5 | 79.3 | 75.2 |
| MMLU-Pro | 43.7 | 46.8 | 48.1 | 59.8 | 57.0 | 58.1 | 52.8 | 53.8 | 49.6 |
| BBH | 68.4 | 79.8 | 81.1 | 82.2 | 83.2 | 86.3 | 82.2 | 82.1 | 77.6 |
| HellaSwag | 87.8 | 86.9 | 86.6 | 86.6 | 86.0 | 87.6 | 88.0 | 85.0 | 84.8 |
| WinoGrande | 82.6 | 83.7 | 83.7 | 83.7 | 83.0 | 83.9 | 85.3 | 79.8 | 90.3 |
| PIQA | 84.2 | 84.0 | 83.3 | 82.9 | 83.1 | 83.5 | 84.6 | 84.3 | 85.6 |
| TruthfulQA | 54.0 | 54.9 | 55.1 | 55.8 | 53.9 | 60.5 | 45.6 | 49.7 | 54.9 |
| Math & STEM Tasks | | | | | | | | | |
| GPQA-Diamond | 26.3 | 31.3 | 27.8 | 43.9 | 55.1 | 34.9 | 21.2 | 27.3 | 30.3 |
| GSM8K | 68.0 | 76.4 | 82.1 | 93.6 | 92.5 | 91.2 | 83.2 | 81.1 | 80.5 |
| MATH | 27.8 | 38.2 | 41.1 | 94.7 | 91.4 | 58.5 | 41.9 | 41.6 | 43.4 |
| AIME 2025 | 0.0 | 17.6 | 25.1 | 53.2 | 46.9 | 1.7 | 0.1 | 0.2 | 14.7 |
| ARC-Challenge | 64.9 | 66.4 | 66.4 | 66.0 | 66.3 | 72.4 | 69.2 | 64.9 | 65.4 |
| Coding Tasks | | | | | | | | | |
| MBPP | 57.6 | 57.8 | 58.2 | 59.8 | 61.8 | 75.4 | 69.2 | 64.4 | 60.2 |
| HumanEval | 50.0 | 51.2 | 53.7 | 54.3 | 54.3 | 54.3 | 42.1 | 50.6 | 36.0 |
| Logic Puzzles | | | | | | | | | |
| Countdown | 1.3 | 53.3 | 53.1 | 35.9 | 75.6 | 6.0 | 1.0 | 0.5 | 23.2 |
| KK-4 People | 4.8 | 44.9 | 68.0 | 64.5 | 92.9 | 26.1 | 4.2 | 7.6 | 42.4 |
| KK-8 People | 0.5 | 23.2 | 41.3 | 51.6 | 82.8 | 5.7 | 1.1 | 1.3 | 13.0 |
| Order-15 Items | 4.7 | 30.7 | 47.2 | 55.8 | 87.6 | 37.0 | 3.5 | 4.5 | 25.0 |
| Order-30 Items | 0.0 | 0.3 | 3.0 | 34.1 | 40.3 | 0.7 | 0.2 | 0.1 | 0.6 |
| Instruction Following | | | | | | | | | |
| IFEval | 17.4 | 26.2 | 28.5 | 34.5 | 26.7 | 40.3 | 15.1 | 17.4 | 13.2 |
| Arabic | | | | | | | | | |
| MMLU-Arabic | 65.4 | 66.1 | 64.5 | 66.6 | 65.5 | 74.1 | 65.0 | 66.8 | 47.8 |

Please refer to our Tech Report for detailed evaluation results.
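
For standard benchmarks such as MMLU and GSM8K, individual rows above can be approximated with EleutherAI's lm-evaluation-harness. The snippet below is a hedged sketch: the task names, few-shot settings, and harness version used in the Tech Report are not restated here, so exact numbers may differ from the table.

import lm_eval

# Sketch: evaluate the released checkpoint on two common tasks.
# Task selection and batch size here are assumptions, not the report's
# official evaluation configuration.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=llm360/k2-v2,dtype=bfloat16",
    tasks=["mmlu", "gsm8k"],
    batch_size=8,
)
print(results["results"])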


Datasets & Mixtures

K2 training is organized into three stages, each using a transparent, publicly released mixture:

Pretraining Mix

  • Large-scale natural text corpus (web, books, code, multilingual)
  • Balanced mixture optimized for stable scaling and broad knowledge
  • ~12T tokens

Mid-Training Mix

  • TxT360-Midas: reasoning-oriented + long-context extensions
  • Domain-focused sources: math, programming, scientific literature
  • Synthetic expansions where natural data is scarce

SFT Mix

All mixtures, filtering rules, and data sources are fully released for reproducibility.
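
For readers who want to assemble a stage mixture themselves, one simple pattern is weighted interleaving of the released sources with the Hugging Face datasets library. The source paths and weights below are placeholders, not the published mixture proportions.

from datasets import load_dataset, interleave_datasets

# Placeholder sources and weights -- substitute the released mixture components
# and their published proportions from the Tech Report.
sources = [
    ("data/web/*.jsonl",  0.6),
    ("data/code/*.jsonl", 0.2),
    ("data/math/*.jsonl", 0.2),
]

streams = [load_dataset("json", data_files=path, split="train", streaming=True)
           for path, _ in sources]
weights = [w for _, w in sources]

# Sample documents from each stream in proportion to its mixture weight.
mixture = interleave_datasets(streams, probabilities=weights, seed=42)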


Model Description

  • Model type: Language model with transformer architecture
  • Language(s) (NLP): English
  • License: Apache 2.0
| Model Hyperparameter | Value |
|---|---|
| Total Parameters | 70B |
| Hidden Size | 8,192 |
| Intermediate Size (MLPs) | 28,672 |
| Number of Attention Heads | 64 |
| Number of Hidden Layers | 80 |
| RMSNorm ε | 1e-5 |
| Max Pre-training Seq Length | 8,192 |
| Max Mid-training Seq Length | 524,288 |
| Vocab Size | 250,000 |
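
The hyperparameters above can be cross-checked against the released configuration. The field names below assume a Llama-style config; if the released config uses different field names, adjust accordingly. The commented values should match the table.

from transformers import AutoConfig

# Read architecture hyperparameters straight from the released config.
config = AutoConfig.from_pretrained("llm360/k2-v2")
print(config.hidden_size)           # expected 8192
print(config.intermediate_size)     # expected 28672
print(config.num_attention_heads)   # expected 64
print(config.num_hidden_layers)     # expected 80
print(config.rms_norm_eps)          # expected 1e-05
print(config.vocab_size)            # expected 250000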

Citation & Acknowledgment

If you use our model or datasets in your research, please cite the K2-V2 paper:

@misc{llm360_k2v2,
  title         = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
  author        = {K2 Team},
  year          = {2025},
  archivePrefix = {arXiv},
  eprint        = {XXXX.XXXXX},
  primaryClass  = {cs.CL}
}