Instructions to use AI-Joe-git/Darwin-9B-Opus-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use AI-Joe-git/Darwin-9B-Opus-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AI-Joe-git/Darwin-9B-Opus-GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

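# Load model directly
# This repo ships GGUF files, so gguf_file must name one of them.
# Assumes your transformers version can load this architecture from GGUF.
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    "AI-Joe-git/Darwin-9B-Opus-GGUF",
    gguf_file="Darwin-9B-Opus-Q4_K_M.gguf",
    dtype="auto",
)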

llama-cpp-python

How to use AI-Joe-git/Darwin-9B-Opus-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="AI-Joe-git/Darwin-9B-Opus-GGUF",
	filename="Darwin-9B-Opus-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
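
The call above returns an OpenAI-style completion dictionary rather than a plain string. A minimal sketch of pulling the reply text out of it follows; `extract_reply` is a hypothetical helper name, and `sample` is a hand-written stand-in that only mimics the response shape.

```python
def extract_reply(completion: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = completion.get("choices") or []
    if not choices:
        return ""
    return choices[0]["message"]["content"] or ""


# Hand-written stand-in for what llm.create_chat_completion(...) returns.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}
print(extract_reply(sample))
```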

Local Apps

llama.cpp

How to use AI-Joe-git/Darwin-9B-Opus-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M

Use Docker

docker model run hf.co/AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M

vLLM

How to use AI-Joe-git/Darwin-9B-Opus-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AI-Joe-git/Darwin-9B-Opus-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AI-Joe-git/Darwin-9B-Opus-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M

SGLang

How to use AI-Joe-git/Darwin-9B-Opus-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AI-Joe-git/Darwin-9B-Opus-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AI-Joe-git/Darwin-9B-Opus-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AI-Joe-git/Darwin-9B-Opus-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AI-Joe-git/Darwin-9B-Opus-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use AI-Joe-git/Darwin-9B-Opus-GGUF with Ollama:
```
ollama run hf.co/AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M
```

Unsloth Studio

How to use AI-Joe-git/Darwin-9B-Opus-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AI-Joe-git/Darwin-9B-Opus-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AI-Joe-git/Darwin-9B-Opus-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for AI-Joe-git/Darwin-9B-Opus-GGUF to start chatting

Pi

How to use AI-Joe-git/Darwin-9B-Opus-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use AI-Joe-git/Darwin-9B-Opus-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use AI-Joe-git/Darwin-9B-Opus-GGUF with Docker Model Runner:
```
docker model run hf.co/AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M
```

Lemonade

How to use AI-Joe-git/Darwin-9B-Opus-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull AI-Joe-git/Darwin-9B-Opus-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Darwin-9B-Opus-GGUF-Q4_K_M

List all available models

lemonade list

Darwin-9B-Opus

Qwen3.5 Dense 9B | Reasoning | Chain-of-Thought | 131K Context | 201 Languages | BF16 | Apache 2.0

Technical Definitions

Term	Definition	Measurement
Model MRI	Layer-level profiling of tensor health indicators	L2 norm, Shannon entropy, std per tensor across all layers
LayerMRI.compare_layers	Per-tensor A vs B quality comparison yielding optimal ratio_b	score = entropy * 0.5 + std * 0.3 + min(norm, 100) * 0.002 per model; ratio_b = score_b / (score_a + score_b)
MRI-Guided Merge	Per-tensor merge ratios derived from parent diagnostics (70% MRI + 30% genome)	final_ratio = mri_ratio * 0.7 + genome_ratio * 0.3
DARE-TIES	Merge algorithm: random binary mask on delta, then weighted addition	merged = A + (B - A) * random_mask(density) * ratio
Transplant A / B	When MRI ratio falls below 0.05 or above 0.95, one parent is used entirely	No interpolation — direct tensor copy
Evolutionary Search	CMA-ES population evolution over genome space (ratio, attn, ffn, embed, density_a, density_b)	Phase 1: 200 steps heuristic proxy, Phase 2: 10 steps real benchmark

Overview

Darwin-9B-Opus is a dense 9B-parameter reasoning model created with Darwin V5. Both parent models share the identical Qwen3.5-9B architecture: the Mother is a LoRA SFT of the same base, not a different architecture.

Role	Model	Training
Father	Qwen/Qwen3.5-9B	Original pre-training + RLHF
Mother	Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled	LoRA SFT with text-only Claude 4.6 Opus reasoning chains

How Darwin V5 Works

Darwin V5 does not use mergekit or any external merge library. It implements the DARE-TIES merge directly with PyTorch tensor operations, using MRI-guided per-tensor ratios. The algorithm is inspired by DARE-TIES but re-implemented from scratch so that each tensor can receive its own diagnostic-guided ratio.

Merge Implementation (actual code logic)

# For each tensor pair (A, B) across all safetensor shards:
ta = model_a[key]       # Father tensor
tb = model_b[key]       # Mother tensor

# 1. MRI diagnoses both tensors
diag_a = LayerMRI.diagnose_tensor(ta)  # {norm, entropy, std}
diag_b = LayerMRI.diagnose_tensor(tb)  # {norm, entropy, std}

# 2. Quality score comparison determines ratio_b
score_a = diag_a["entropy"] * 0.5 + diag_a["std"] * 0.3 + min(diag_a["norm"], 100) * 0.002
score_b = diag_b["entropy"] * 0.5 + diag_b["std"] * 0.3 + min(diag_b["norm"], 100) * 0.002
mri_ratio = score_b / (score_a + score_b)  # Higher = Mother is better

# 3. Final ratio = MRI 70% + evolutionary genome 30%
final_ratio = mri_ratio * 0.7 + genome_type_ratio * 0.3

# 4. DARE-TIES merge with per-tensor ratio
mask = torch.rand_like(tb) < density_b
delta = (tb - ta) * mask
merged = (ta + delta * final_ratio).bfloat16()
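
The ratio logic in steps 2 and 3 is plain scalar arithmetic, so it can be tried without loading any tensors. The sketch below reproduces the score and blend formulas from the card and adds the transplant rule from the Technical Definitions table. `merge_plan`, the threshold constant names, the zero-score fallback, and the diagnostic values are illustrative assumptions, not the Darwin V5 source; applying the thresholds to the MRI ratio before blending is also an assumption based on that table.

```python
TRANSPLANT_LOW = 0.05   # below this, copy the Father tensor (A) unchanged
TRANSPLANT_HIGH = 0.95  # above this, copy the Mother tensor (B) unchanged


def quality_score(diag: dict) -> float:
    """Score from the card: entropy*0.5 + std*0.3 + min(norm, 100)*0.002."""
    return diag["entropy"] * 0.5 + diag["std"] * 0.3 + min(diag["norm"], 100) * 0.002


def merge_plan(diag_a: dict, diag_b: dict, genome_ratio: float) -> tuple[str, float]:
    """Return (action, ratio) for one tensor pair.

    action is "copy_a", "copy_b", or "blend"; ratio is the weight given to B.
    """
    score_a, score_b = quality_score(diag_a), quality_score(diag_b)
    total = score_a + score_b
    # Assumption: fall back to an even split if both scores are zero.
    mri_ratio = score_b / total if total > 0 else 0.5
    if mri_ratio < TRANSPLANT_LOW:
        return "copy_a", 0.0
    if mri_ratio > TRANSPLANT_HIGH:
        return "copy_b", 1.0
    return "blend", mri_ratio * 0.7 + genome_ratio * 0.3


father = {"norm": 50.0, "entropy": 4.0, "std": 0.02}  # made-up diagnostics
mother = {"norm": 60.0, "entropy": 4.2, "std": 0.03}  # made-up diagnostics
action, ratio = merge_plan(father, mother, genome_ratio=0.5)
print(action, round(ratio, 4))
```

Because entropy carries the largest weight in the score, two healthy tensors with similar entropy land near an MRI ratio of 0.5, and the transplant branches only trigger when one parent's score is a small fraction of the other's.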

Pipeline

Phase 0: Model MRI
  For every tensor in both parents, measure:
    - L2 norm (layer energy)
    - Shannon entropy (weight distribution uniformity)
    - Standard deviation (weight spread)
  Compare A vs B quality scores -> per-tensor ratio prescription

Phase 1: Evolutionary Search (200 steps, heuristic proxy)
  Population of 20 genomes (ratio, attn, ffn, embed, density_a, density_b)
  Fitness: heuristic score based on genome balance + differentiation
  Selection -> SLERP crossover -> Gaussian mutation

Phase 2: Real Merge + Benchmark (10 steps)
  Top genomes from Phase 1 undergo actual tensor merge
  Each merge: MRI prescription (70%) + genome ratio (30%)
  Fitness: real benchmark score (ARC-Challenge)
  Best model selected and auto-uploaded

Phase 3: Health Check
  Layer-by-layer importance comparison: child vs both parents
  Detect interference (child >> parents) or function loss (parents >> child)
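
Phase 1 as described above (population of 20, selection, SLERP crossover, Gaussian mutation) can be sketched with the standard library alone. The card does not spell out the heuristic fitness, so `proxy_fitness` below is a hypothetical stand-in, and the selection scheme, mutation width `sigma`, and clipping to [0, 1] are likewise assumptions. This follows the Pipeline wording rather than a CMA-ES implementation.

```python
import math
import random


def proxy_fitness(genome):
    """Hypothetical proxy; the card does not specify the real heuristic."""
    ratio, attn, ffn, embed, _density_a, _density_b = genome
    balance = 1.0 - 2.0 * abs(ratio - 0.5)  # peaks when ratio is 0.5
    mean = (attn + ffn + embed) / 3.0
    spread = math.sqrt(sum((g - mean) ** 2 for g in (attn, ffn, embed)) / 3.0)
    return balance + spread  # "differentiation" modelled as spread


def slerp(u, v, t):
    """Spherical interpolation between two genome vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return [(1 - t) * a + t * b for a, b in zip(u, v)]
    cos_omega = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    if omega < 1e-6:  # nearly parallel: linear interpolation is stable
        return [(1 - t) * a + t * b for a, b in zip(u, v)]
    w_u = math.sin((1 - t) * omega) / math.sin(omega)
    w_v = math.sin(t * omega) / math.sin(omega)
    return [w_u * a + w_v * b for a, b in zip(u, v)]


def evolve(steps=200, pop_size=20, sigma=0.05, seed=0):
    """Run the loop and return the final population, best genome first."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(6)] for _ in range(pop_size)]
    for _ in range(steps):
        population.sort(key=proxy_fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, elites kept
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = slerp(a, b, rng.random())
            # Gaussian mutation, clipped back into the valid gene range.
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in child]
            children.append(child)
        population = parents + children
    population.sort(key=proxy_fitness, reverse=True)
    return population


best = evolve()[0]
print([round(g, 3) for g in best])
```

Keeping the top half unchanged each step means the best proxy score never decreases, which makes the cheap Phase 1 a reasonable filter before the expensive real merges in Phase 2.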

What Makes This Different from Standard Merging

Capability	Standard DARE-TIES	Darwin V5
Implementation	mergekit library call	Direct PyTorch tensor operations
Ratio selection	Uniform ratio across all tensors	Per-tensor ratio from MRI diagnosis
Pre-merge analysis	None	Tensor-level norm/entropy/std profiling
Ratio determination	Human-set or grid search	MRI 70% + evolutionary genome 30%
Post-merge validation	Benchmark score only	Layer-by-layer child vs parents comparison
Transplant support	No	ratio < 0.05 -> use A entirely, ratio > 0.95 -> use B entirely
Failure diagnosis	"Score went down"	Per-tensor quality delta identifies problematic layers
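
The post-merge validation row can be illustrated with a small comparison routine. This is a sketch under stated assumptions: the card does not define how "importance" is measured or how large a gap counts as ">>", so `health_check`, the `factor` threshold, and the per-layer scores below are all hypothetical.

```python
def health_check(child, father, mother, factor=2.0):
    """Flag layers where the child deviates strongly from both parents.

    Each argument maps layer name -> importance score (for example an L2 norm).
    `factor` is a hypothetical threshold for what counts as ">>".
    """
    report = {}
    for layer, c in child.items():
        high = max(father[layer], mother[layer])
        low = min(father[layer], mother[layer])
        if c > factor * high:
            report[layer] = "interference"    # child >> both parents
        elif c * factor < low:
            report[layer] = "function_loss"   # both parents >> child
        else:
            report[layer] = "ok"
    return report


# Made-up importance scores for three layers.
child = {"layers.0": 1.0, "layers.1": 5.0, "layers.2": 0.2}
father = {"layers.0": 1.1, "layers.1": 1.0, "layers.2": 1.0}
mother = {"layers.0": 0.9, "layers.1": 1.2, "layers.2": 0.8}
print(health_check(child, father, mother))
```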

Model Specifications


Architecture	Qwen3.5 Dense (Gated DeltaNet hybrid)
Total Parameters	9B
Precision	BF16
Context Length	131,072 native
Languages	201
Thinking	`<think>` tag chain-of-thought reasoning
License	Apache 2.0

Hardware Requirements

Setup	VRAM	Status
BF16 Full Precision	~20 GB
NVIDIA RTX 4090 24GB	24 GB	Comfortable
NVIDIA A100 40GB	40 GB	Very comfortable
NVIDIA T4 16GB	16 GB	Requires quantization
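
The table follows from simple arithmetic on the parameter count. The sketch below estimates memory for the weights alone; `weight_memory_gb` is a hypothetical helper, and the figures exclude KV cache, activations, and framework overhead, which is why the BF16 row lists roughly 20 GB rather than the bare weight size. Real quantized files usually run somewhat larger than the nominal bit width suggests because they also store scaling data.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 9e9  # "9B" from the specification table
for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(PARAMS, bits):.1f} GB")
```

At BF16 the weights alone exceed the 16 GB of a T4, which matches the "Requires quantization" entry.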

Usage

Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "FINAL-Bench/Darwin-9B-Opus",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-9B-Opus",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
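
Since the model reasons inside `<think>` tags, callers often want the final answer separated from the reasoning trace. A minimal sketch, assuming the decoded text still contains the tags; `split_thinking` is a hypothetical helper, and it tolerates a missing opening tag in case the chat template already emitted it as part of the prompt.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split decoded output into (reasoning, answer) around the think tags."""
    open_tag, close_tag = "<think>", "</think>"
    if close_tag not in text:
        return "", text.strip()  # no reasoning block found
    reasoning, _, answer = text.partition(close_tag)
    # The opening tag may be absent if the chat template already emitted it.
    reasoning = reasoning.replace(open_tag, "", 1).strip()
    return reasoning, answer.strip()


sample = "<think>Assume sqrt(2) = p/q in lowest terms...</think>\nsqrt(2) is irrational."
reasoning, answer = split_thinking(sample)
print(answer)
```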

SGLang

python -m sglang.launch_server \
  --model-path FINAL-Bench/Darwin-9B-Opus \
  --tp 1 \
  --mem-fraction-static 0.90 \
  --context-length 32768 \
  --trust-remote-code

vLLM

vllm serve FINAL-Bench/Darwin-9B-Opus \
  --trust-remote-code \
  --enforce-eager
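
Once the server is up it exposes an OpenAI-compatible HTTP API. The sketch below builds and sends a chat request with the standard library only. It assumes the server listens on port 8000 (vLLM's default); `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    model: str = "FINAL-Bench/Darwin-9B-Opus",
    base_url: str = "http://localhost:8000/v1",
) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Prove that sqrt(2) is irrational.")` returns the reply text. The same helper should work against the SGLang server if `base_url` is pointed at its port, for example `http://localhost:30000/v1`.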

Evolution Details


Engine	Darwin V5 (Evolutionary Merge + Layer-Level Diagnostics)
Merge Method	DARE-TIES (direct PyTorch implementation, no external library)
MRI Integration	Per-tensor diagnosis: norm, entropy, std -> ratio prescription
Ratio Formula	final_ratio = mri_ratio * 0.7 + genome_ratio * 0.3
Evolution	Phase 1: 200 steps proxy + Phase 2: 10 steps real benchmark
Best Score	0.8508 (ARC-Challenge)
Infrastructure	4 x NVIDIA H100 NVL (100GB each)

Acknowledgements

Korean Government: GPU Support Program research grant
Qwen Team: Qwen3.5 base architecture
Jackrong: Claude 4.6 Opus Reasoning Distilled model
DARE-TIES algorithm: DARE (Yu et al., 2023) combined with TIES-Merging (Yadav et al., 2023); re-implemented, not library-dependent

Built By


Developer	VIDRAFT
Engine	Darwin V5
Base Architecture	Qwen3.5-9B

Citation

@misc{vidraft_darwin_9b_opus,
  title        = {Darwin-9B-Opus: Diagnostic-Guided Evolutionary Merge},
  author       = {VIDRAFT},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-9B-Opus}}
}

GGUF

Model size: 9B params
Architecture: qwen35
Hardware compatibility: 4-bit, 8-bit

Model tree for AI-Joe-git/Darwin-9B-Opus-GGUF

Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled + Qwen/Qwen3.5-9B -> merge -> this model

Paper for AI-Joe-git/Darwin-9B-Opus-GGUF

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch (arXiv 2311.03099, published Nov 6, 2023)

Evaluation results

Accuracy on GPQA Diamond (self-reported): 90.000