Darwin-9B-Opus
Qwen3.5 Dense 9B | Reasoning | Chain-of-Thought | 131K Context | 201 Languages | BF16 | Apache 2.0
Technical Definitions
| Term | Definition | Measurement |
|---|---|---|
| Model MRI | Layer-level profiling of tensor health indicators | L2 norm, Shannon entropy, std per tensor across all layers |
| LayerMRI.compare_layers | Per-tensor A vs B quality comparison yielding optimal ratio_b | score = entropy * 0.5 + std * 0.3 + min(norm, 100) * 0.002 per model; ratio_b = score_b / (score_a + score_b) |
| MRI-Guided Merge | Per-tensor merge ratios derived from parent diagnostics (70% MRI + 30% genome) | final_ratio = mri_ratio * 0.7 + genome_ratio * 0.3 |
| DARE-TIES | Merge algorithm: random binary mask on delta, then weighted addition | merged = A + (B - A) * random_mask(density) * ratio |
| Transplant A / B | When MRI ratio falls below 0.05 or above 0.95, one parent is used entirely | No interpolation — direct tensor copy |
| Evolutionary Search | CMA-ES population evolution over genome space (ratio, attn, ffn, embed, density_a, density_b) | Phase 1: 200 steps heuristic proxy, Phase 2: 10 steps real benchmark |
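The ratio rules in the table can be combined into a few small functions. This is a minimal sketch under stated assumptions, not the Darwin V5 source: the function names are hypothetical, the diagnostics are passed in as plain floats, and the even split used when both scores are zero is an assumption the table does not cover.

```python
def quality_score(entropy: float, std: float, norm: float) -> float:
    """Per-tensor score: entropy * 0.5 + std * 0.3 + min(norm, 100) * 0.002."""
    return entropy * 0.5 + std * 0.3 + min(norm, 100.0) * 0.002


def mri_ratio(score_a: float, score_b: float) -> float:
    """ratio_b = score_b / (score_a + score_b); higher values favor parent B."""
    total = score_a + score_b
    # Assumption: fall back to an even split when both scores are zero.
    return 0.5 if total == 0 else score_b / total


def final_ratio(mri: float, genome: float) -> float:
    """Blend the MRI prescription (70%) with the genome ratio (30%)."""
    return mri * 0.7 + genome * 0.3


def merge_mode(mri: float) -> str:
    """Apply the transplant thresholds from the table to the MRI ratio."""
    if mri < 0.05:
        return "transplant_a"  # copy the parent A tensor unchanged
    if mri > 0.95:
        return "transplant_b"  # copy the parent B tensor unchanged
    return "interpolate"
```

For example, `final_ratio(0.6, 0.4)` weights the MRI ratio more heavily than the genome ratio and lands between the two inputs.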
Overview
Darwin-9B-Opus is a 9B-parameter dense reasoning model created with Darwin V5. Both parent models share the identical Qwen3.5-9B architecture: the Mother is a LoRA SFT of the same base model, not a different architecture.
| Role | Model | Training |
|---|---|---|
| Father | Qwen/Qwen3.5-9B | Original pre-training + RLHF |
| Mother | Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled | LoRA SFT with text-only Claude 4.6 Opus reasoning chains |
How Darwin V5 Works
Darwin V5 does not use mergekit or any external merge library. It implements a DARE-TIES-style merge directly with PyTorch tensor operations, re-implemented from scratch so that each tensor can receive its own MRI-guided merge ratio.
Merge Implementation (actual code logic)
# For each tensor pair (A, B) across all safetensor shards:
ta = model_a[key] # Father tensor
tb = model_b[key] # Mother tensor
# 1. MRI diagnoses both tensors
diag_a = LayerMRI.diagnose_tensor(ta) # {norm, entropy, std}
diag_b = LayerMRI.diagnose_tensor(tb) # {norm, entropy, std}
# 2. Quality score comparison determines ratio_b
score_a = diag_a["entropy"] * 0.5 + diag_a["std"] * 0.3 + min(diag_a["norm"], 100) * 0.002
score_b = diag_b["entropy"] * 0.5 + diag_b["std"] * 0.3 + min(diag_b["norm"], 100) * 0.002
mri_ratio = score_b / (score_a + score_b) # Higher = Mother is better
# 3. Final ratio = MRI 70% + evolutionary genome 30%
final_ratio = mri_ratio * 0.7 + genome_type_ratio * 0.3
# 4. DARE-TIES merge with per-tensor ratio
mask = torch.rand_like(tb) < density_b
delta = (tb - ta) * mask
merged = (ta + delta * final_ratio).bfloat16()
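The masked-delta step above can be tried on toy data without loading any weights. The sketch below mirrors step 4 on plain Python lists instead of `torch` tensors; the function name `masked_delta_merge` and the `seed` parameter are hypothetical, and like the logic above it applies no rescaling to the deltas that survive the mask.

```python
import random


def masked_delta_merge(ta, tb, ratio, density, seed=0):
    """Return A + (B - A) * mask * ratio, where each mask entry is 1 with probability `density`."""
    rng = random.Random(seed)  # seeded so a merge can be reproduced
    merged = []
    for a, b in zip(ta, tb):
        keep = rng.random() < density      # Bernoulli(density) mask entry
        delta = (b - a) if keep else 0.0   # dropped entries contribute nothing
        merged.append(a + delta * ratio)
    return merged
```

With `density=1.0` the merge reduces to linear interpolation between the parents, and with `density=0.0` it returns parent A unchanged, which makes both cases convenient sanity checks.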
Pipeline
Phase 0: Model MRI
For every tensor in both parents, measure:
- L2 norm (layer energy)
- Shannon entropy (weight distribution uniformity)
- Standard deviation (weight spread)
Compare A vs B quality scores -> per-tensor ratio prescription
Phase 1: Evolutionary Search (200 steps, heuristic proxy)
Population of 20 genomes (ratio, attn, ffn, embed, density_a, density_b)
Fitness: heuristic score based on genome balance + differentiation
Selection -> SLERP crossover -> Gaussian mutation
Phase 2: Real Merge + Benchmark (10 steps)
Top genomes from Phase 1 undergo actual tensor merge
Each merge: MRI prescription (70%) + genome ratio (30%)
Fitness: real benchmark score (ARC-Challenge)
Best model selected and auto-uploaded
Phase 3: Health Check
Layer-by-layer importance comparison: child vs both parents
Detect interference (child >> parents) or function loss (parents >> child)
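The three Phase 0 measurements can be sketched for a flat list of weights as follows. How Darwin V5 computes Shannon entropy for a tensor is not stated here, so the histogram with a fixed number of bins is an assumption, as are the population (not sample) standard deviation and the name `diagnose_values`, which only stands in for `LayerMRI.diagnose_tensor`.

```python
import math


def diagnose_values(values, bins=16):
    """Return L2 norm, histogram-based Shannon entropy (bits), and std of a flat weight list."""
    n = len(values)
    norm = math.sqrt(sum(v * v for v in values))
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)

    lo, hi = min(values), max(values)
    entropy = 0.0  # a constant tensor has a single occupied bin, so zero entropy
    if hi > lo:
        width = (hi - lo) / bins
        counts = [0] * bins
        for v in values:
            # The maximum value would land one past the last bin, so clip it.
            counts[min(int((v - lo) / width), bins - 1)] += 1
        entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)

    return {"norm": norm, "entropy": entropy, "std": std}
```

The returned dictionary uses the same keys (`norm`, `entropy`, `std`) that the merge logic reads when it scores each parent.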
What Makes This Different from Standard Merging
| Capability | Standard DARE-TIES | Darwin V5 |
|---|---|---|
| Implementation | mergekit library call | Direct PyTorch tensor operations |
| Ratio selection | Uniform ratio across all tensors | Per-tensor ratio from MRI diagnosis |
| Pre-merge analysis | None | Tensor-level norm/entropy/std profiling |
| Ratio determination | Human-set or grid search | MRI 70% + evolutionary genome 30% |
| Post-merge validation | Benchmark score only | Layer-by-layer child vs parents comparison |
| Transplant support | No | ratio < 0.05 -> use A entirely, ratio > 0.95 -> use B entirely |
| Failure diagnosis | "Score went down" | Per-tensor quality delta identifies problematic layers |
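The post-merge validation row can be illustrated with a small per-layer check. This is a sketch only: the source does not say how layer importance is computed or what margin counts as "much greater", so the precomputed importance dictionaries, the `factor=2.0` threshold, and the name `health_check` are all hypothetical.

```python
def health_check(child, father, mother, factor=2.0):
    """Label each layer as 'interference', 'function_loss', or 'ok'.

    Each argument maps a layer name to a precomputed importance value.
    """
    report = {}
    for layer, c in child.items():
        hi = max(father[layer], mother[layer])
        lo = min(father[layer], mother[layer])
        if c > factor * hi:
            report[layer] = "interference"   # child far above both parents
        elif lo > factor * c:
            report[layer] = "function_loss"  # both parents far above child
        else:
            report[layer] = "ok"
    return report
```

A layer is flagged only when the child falls outside the range spanned by both parents by the chosen margin, so disagreement between the parents alone does not trigger a warning.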
Model Specifications
| Specification | Value |
|---|---|
| Architecture | Qwen3.5 Dense (Gated DeltaNet hybrid) |
| Total Parameters | 9B |
| Precision | BF16 |
| Context Length | 131,072 native |
| Languages | 201 |
| Thinking | <think> tag chain-of-thought reasoning |
| License | Apache 2.0 |
Hardware Requirements
| Setup | VRAM | Status |
|---|---|---|
| BF16 Full Precision | ~20 GB | |
| NVIDIA RTX 4090 24GB | 24 GB | Comfortable |
| NVIDIA A100 40GB | 40 GB | Very comfortable |
| NVIDIA T4 16GB | 16 GB | Requires quantization |
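The BF16 figure is consistent with a weights-only estimate plus runtime overhead. The sketch below shows the arithmetic, assuming a nominal 9e9 parameters and decimal gigabytes; it ignores the KV cache, activations, and framework overhead, so real usage will be higher. The function name is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Weights-only memory in decimal GB: parameters * bits / 8 bits-per-byte / 1e9."""
    return n_params * bits_per_param / 8 / 1e9


bf16_gb = weight_memory_gb(9e9, 16)  # 2 bytes per parameter
int4_gb = weight_memory_gb(9e9, 4)   # nominal 4-bit quantization
print(f"BF16 weights: {bf16_gb:.1f} GB, 4-bit weights: {int4_gb:.1f} GB")
```

This is why a 16 GB card needs quantization for this model while a 24 GB card can hold the BF16 weights.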
Usage
Transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained(
"FINAL-Bench/Darwin-9B-Opus",
trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
"FINAL-Bench/Darwin-9B-Opus",
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True,
)
messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
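Because the model reasons inside `<think>` tags, it is often useful to separate the reasoning from the final answer after decoding. The helper below is a minimal sketch; `split_reasoning` is a hypothetical name, and it assumes the closing `</think>` tag survives decoding, which depends on the tokenizer configuration. It also tolerates a missing opening tag, in case the chat template already emitted it as part of the prompt.

```python
def split_reasoning(decoded: str):
    """Split decoded model output into (reasoning, answer)."""
    head, sep, tail = decoded.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", decoded.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()
```

Passing the decoded string from the generation call above returns the chain-of-thought and the answer as two separate strings.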
SGLang
python -m sglang.launch_server \
--model-path FINAL-Bench/Darwin-9B-Opus \
--tp 1 \
--mem-fraction-static 0.90 \
--context-length 32768 \
--trust-remote-code
vLLM
vllm serve FINAL-Bench/Darwin-9B-Opus \
--trust-remote-code \
--enforce-eager
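Once a server is running, it can be queried through its OpenAI-compatible chat endpoint. The sketch below uses only the Python standard library; it assumes vLLM's default port 8000 and that the served model name matches the path passed to `vllm serve`. The function names and the `max_tokens` default are illustrative choices.

```python
import json
import urllib.request


def build_chat_request(prompt, model="FINAL-Bench/Darwin-9B-Opus",
                       base_url="http://localhost:8000/v1", max_tokens=1024):
    """Build a POST request for the /chat/completions endpoint without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=600):
    """Send the request and return the assistant message text (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `send(build_chat_request("Prove that sqrt(2) is irrational."))` returns the reply text. For an SGLang server, pass a `base_url` with the port that server was launched on.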
Evolution Details
| Item | Detail |
|---|---|
| Engine | Darwin V5 (Evolutionary Merge + Layer-Level Diagnostics) |
| Merge Method | DARE-TIES (direct PyTorch implementation, no external library) |
| MRI Integration | Per-tensor diagnosis: norm, entropy, std -> ratio prescription |
| Ratio Formula | final_ratio = mri_ratio * 0.7 + genome_ratio * 0.3 |
| Evolution | Phase 1: 200 steps proxy + Phase 2: 10 steps real benchmark |
| Best Score | 0.8508 (ARC-Challenge) |
| Infrastructure | 4 x NVIDIA H100 NVL (100GB each) |
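The Phase 1 loop (selection, SLERP crossover, Gaussian mutation over a six-gene genome) can be sketched in plain Python. This is an illustration, not the Darwin V5 search: it follows the selection/crossover/mutation description rather than CMA-ES, the `proxy_fitness` function is a hypothetical placeholder for the unspecified heuristic, and the elite fraction, mutation `sigma`, and clipping to [0, 1] are assumptions.

```python
import math
import random

GENES = ("ratio", "attn", "ffn", "embed", "density_a", "density_b")


def slerp(u, v, t):
    """Spherical interpolation between gene vectors; linear when nearly parallel or zero."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    lerp = [(1 - t) * a + t * b for a, b in zip(u, v)]
    if nu == 0 or nv == 0:
        return lerp
    cos = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v)) / (nu * nv)))
    omega = math.acos(cos)
    if omega < 1e-6:
        return lerp
    s = math.sin(omega)
    wu, wv = math.sin((1 - t) * omega) / s, math.sin(t * omega) / s
    return [wu * a + wv * b for a, b in zip(u, v)]


def proxy_fitness(genome):
    """HYPOTHETICAL proxy: reward a balanced global ratio and differentiated block ratios."""
    balance = 1.0 - abs(genome[0] - 0.5) * 2
    spread = max(genome[1:4]) - min(genome[1:4])
    return balance + spread


def evolve(fitness, pop_size=20, steps=200, sigma=0.05, seed=0):
    """Keep the top quarter, refill with mutated SLERP children, and return the best genome."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in GENES] for _ in range(pop_size)]
    for _ in range(steps):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            pa, pb = rng.sample(elite, 2)
            child = slerp(pa, pb, rng.random())
            children.append([min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in child])
        pop = elite + children
    return max(pop, key=fitness)
```

In Phase 2 the same loop shape would apply with `fitness` replaced by a real merge plus benchmark evaluation, which is why that phase runs for far fewer steps.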
Acknowledgements
- Korean Government: GPU Support Program research grant
- Qwen Team: Qwen3.5 base architecture
- Jackrong: Claude 4.6 Opus Reasoning Distilled model
- DARE (Yu et al., 2023) and TIES-Merging (Yadav et al., 2023): merge algorithms (re-implemented, not library-dependent)
Built By
| Item | Detail |
|---|---|
| Developer | VIDRAFT |
| Engine | Darwin V5 |
| Base Architecture | Qwen3.5-9B |
Citation
@misc{vidraft_darwin_9b_opus,
title = {Darwin-9B-Opus: Diagnostic-Guided Evolutionary Merge},
author = {VIDRAFT},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-9B-Opus}}
}
Evaluation results
- Accuracy on GPQA Diamond (self-reported): 90.000