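Instructions for using dealignai/Ling-2.6-flash-JANGTQ2-CRACK with libraries, notebooks, and local apps. The snippets below call stock mlx-lm. Because JANGTQ is currently supported only by MLX Studio and the jang-tools Python package, stock mlx-lm may not load this checkpoint (see Usage).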
- Libraries
- MLX
How to use dealignai/Ling-2.6-flash-JANGTQ2-CRACK with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("dealignai/Ling-2.6-flash-JANGTQ2-CRACK")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use dealignai/Ling-2.6-flash-JANGTQ2-CRACK with Pi:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "dealignai/Ling-2.6-flash-JANGTQ2-CRACK"
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "dealignai/Ling-2.6-flash-JANGTQ2-CRACK" }
      ]
    }
  }
}
```
Run Pi
```bash
# Start Pi in your project directory:
pi
```
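If ~/.pi/agent/models.json already exists, pasting the JSON above over it would drop any other providers. The sketch below merges the mlx-lm entry into an existing file instead. It assumes the file holds a single JSON object with a top-level "providers" map, as in the Pi snippet; merge_provider and update_models_file are hypothetical helpers, not part of Pi.

```python
import copy
import json
from pathlib import Path
from typing import Optional

PROVIDER_NAME = "mlx-lm"
MODEL_ID = "dealignai/Ling-2.6-flash-JANGTQ2-CRACK"
# Provider entry copied from the models.json snippet in the Pi section
PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": MODEL_ID}],
}


def merge_provider(config: dict) -> dict:
    """Return a copy of config with the mlx-lm provider added; other providers are kept."""
    merged = copy.deepcopy(config)
    providers = merged.setdefault("providers", {})
    entry = providers.get(PROVIDER_NAME)
    if entry is None:
        providers[PROVIDER_NAME] = copy.deepcopy(PROVIDER)
    else:
        # Provider already configured: only add the model id if it is missing
        models = entry.setdefault("models", [])
        if not any(m.get("id") == MODEL_ID for m in models):
            models.append({"id": MODEL_ID})
    return merged


def update_models_file(path: Optional[Path] = None) -> None:
    """Read models.json (if present), merge the provider, and write it back."""
    path = path or Path.home() / ".pi" / "agent" / "models.json"
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_provider(config), indent=2) + "\n")
```

Call update_models_file() once, then start Pi as usual. Running it a second time is harmless: the model id is only appended when it is not already listed.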
- Hermes Agent
How to use dealignai/Ling-2.6-flash-JANGTQ2-CRACK with Hermes Agent:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "dealignai/Ling-2.6-flash-JANGTQ2-CRACK"
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default dealignai/Ling-2.6-flash-JANGTQ2-CRACK
```
Run Hermes
```bash
hermes
```
- MLX LM
How to use dealignai/Ling-2.6-flash-JANGTQ2-CRACK with MLX LM:
Generate or start a chat session
```bash
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "dealignai/Ling-2.6-flash-JANGTQ2-CRACK"
```
Run an OpenAI-compatible server
```bash
# Install MLX LM
uv tool install mlx-lm

# Start the server (listens on localhost:8080 by default)
mlx_lm.server --model "dealignai/Ling-2.6-flash-JANGTQ2-CRACK"

# In a second terminal, call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "dealignai/Ling-2.6-flash-JANGTQ2-CRACK",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
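The same endpoint can be called from Python using only the standard library. This is a minimal sketch, assuming mlx_lm.server is running on its default port 8080 and returns an OpenAI-style chat completion; build_chat_request and chat are illustrative helpers, and the max_tokens default of 256 is an arbitrary choice.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on its default port (8080)
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "dealignai/Ling-2.6-flash-JANGTQ2-CRACK"


def build_chat_request(content: str, base_url: str = BASE_URL,
                       max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(content: str, base_url: str = BASE_URL) -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(content, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, print(chat("Hello")) sends the same request as the curl example. Splitting request construction from sending keeps the payload easy to inspect without a live server.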
Important: This model uses the JANGTQ (JANG TurboQuant) quantization format, an extreme-compression variant of JANG for MLX on Apple Silicon. It applies codebook quantization with Hadamard rotation to the routed MoE experts while keeping attention, the shared expert, the dense MLP, embed, and lm_head at affine 8-bit. JANGTQ is currently supported only by MLX Studio and the jang-tools Python package. Follow @dealignai for new releases.
MLX Studio — the only app that natively supports JANG / JANGTQ models
Ling 2.6 Flash — JANGTQ2 + CRACK
JANGTQ TurboQuant mixed-precision | CRACK abliterated | Hybrid MLA + Linear-Attn MoE | EN + ZH | 29 GB
What Is This?
This is Ling 2.6 Flash by inclusionAI: a 35B-parameter Mixture-of-Experts model with 256 routed experts (8 active per token) plus 1 always-active shared expert. It uses a hybrid MLA + Lightning Linear-Attention architecture, supports English and Chinese natively, and has a 131K-token context window.
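To make "256 routed experts (8 active per token) plus 1 always-active shared expert" concrete, here is a toy sketch of one MoE layer's forward pass. It uses a generic softmax-over-top-k router and scalar toy experts; Ling's actual router scoring, gate normalization, and MLP experts are not reproduced, so treat top_k_softmax, moe_forward, make_scaler, and demo as hypothetical illustrations.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def top_k_softmax(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Keep the k largest router logits and softmax-normalize over just those k."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: Vector, router_logits: Sequence[float], routed: Sequence[Expert],
                shared: Expert, k: int = 8) -> Vector:
    """Shared expert always runs; the k selected routed experts are added gate-weighted."""
    out = shared(x)
    for idx, weight in top_k_softmax(router_logits, k):
        y = routed[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


def make_scaler(s: float) -> Expert:
    """Toy 'expert' that just scales its input (real experts are MLPs)."""
    return lambda x: [s * v for v in x]


def demo(num_experts: int = 256, dim: int = 4, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    routed = [make_scaler(rng.uniform(0.5, 1.5)) for _ in range(num_experts)]
    shared = make_scaler(1.0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(num_experts)]  # stand-in for router(x)
    return moe_forward([1.0] * dim, logits, routed, shared)
```

Only 8 of the 256 routed experts do any work per token, which is why active compute is far below the total parameter count, while the shared expert contributes to every token.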
It has been:
- JANGTQ2 quantized — JANGTQ2 profile (8-bit affine on attention, shared expert, dense MLP, embed and lm_head; 2-bit TurboQuant on routed experts with codebook + Hadamard rotation) — 29 GB
- CRACK abliterated — permanent weight-level removal of safety refusal
| Property | Value |
|---|---|
| Base model | inclusionAI/Ling-2.6-flash (35B total, 1 shared + 8 routed active) |
| Architecture | bailing_hybrid: Multi-Latent Attention (MLA) every 8th layer + Lightning Linear-Attn elsewhere |
| Quantization | JANGTQ2, 29 GB |
| MMLU-200 | 81.0% (MXFP4 base 80.0%; +1.0 pp, surgery-neutral) |
| HarmBench-320 | 100.0% (320/320 comply, 0 refuse, 0 empty) |
| Context | 131,072 tokens native |
| Languages | English + Chinese (probed bilingual) |
| Speed | 50+ tok/s on M4 Max 128 GB |
| Fits on | 48 GB+ Macs |
MMLU-200 Results (thinking OFF)
| Model | Correct | Accuracy | No-match |
|---|---|---|---|
| MXFP4 Base (reference) | 160/200 | 80.00% | 6 |
| MXFP4 + CRACK | 157/200 | 78.50% | 10 |
| JANGTQ2 + CRACK (this model) | 162/200 | 81.00% | 1 |
JANGTQ2 + CRACK edges slightly past the un-cracked MXFP4 base on the same 200 questions (162 vs. 160 correct). The lower no-match count (1 vs. 6) accounts for part of this; the remainder is plausibly 2-bit quantization noise absorbing the small directional damage from CRACK.
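One rough way to judge whether a 1-point gap on 200 questions is meaningful is the binomial standard error, sqrt(p(1 - p)/n). This check is not part of the original evaluation; the sketch below simply applies the formula to the counts in the MMLU-200 table.

```python
import math
from typing import Tuple

# (correct, total) copied from the MMLU-200 table
RESULTS = {
    "MXFP4 Base (reference)": (160, 200),
    "MXFP4 + CRACK": (157, 200),
    "JANGTQ2 + CRACK": (162, 200),
}


def accuracy_with_se(correct: int, total: int) -> Tuple[float, float]:
    """Accuracy and its binomial standard error sqrt(p * (1 - p) / n)."""
    p = correct / total
    return p, math.sqrt(p * (1.0 - p) / total)


for name, (correct, total) in RESULTS.items():
    p, se = accuracy_with_se(correct, total)
    print(f"{name}: {p:.1%} +/- {se:.1%}")
```

At n = 200 the standard error is about 2.8 to 2.9 points for every row, larger than the gaps between them, which is consistent with reading the result as surgery-neutral rather than as a real improvement.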
HarmBench-320 Results
| Model | COMPLY | REFUSE | EMPTY |
|---|---|---|---|
| MXFP4 Base (reference) | 161 (50.3%) | 157 (49.1%) | 2 (0.6%) |
| MXFP4 + CRACK | 313 (97.8%) | 5 (1.6%) | 2 (0.6%) |
| JANGTQ2 + CRACK (this model) | 320 (100.0%) | 0 (0.0%) | 0 (0.0%) |
Perfect 320/320 comply with zero EMPTY verdicts.
Ling 2.6 Flash CRACK Series
| Model | Format | Size | MMLU-200 | HarmBench-320 | Fits on |
|---|---|---|---|---|---|
| MXFP4 + CRACK | affine 4-bit g=32 | 63 GB | 78.5% | 97.8% | 96 GB Mac |
| JANGTQ2 + CRACK (this model) | TurboQuant 2-bit experts + 8-bit affine | 29 GB | 81.0% | 100.0% | 48 GB Mac |
JANGTQ2 is less than half the size of the MXFP4 variant, scores higher on both benchmarks, and is the recommended drop-in for most users.
Usage
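```python
from jang_tools.load_jangtq import load_jangtq_model
from mlx_lm import generate

# JANGTQ checkpoints are loaded with the jang-tools loader, not mlx_lm.load
model, tokenizer = load_jangtq_model("dealignai/Ling-2.6-flash-JANGTQ2-CRACK")

# Build a chat-formatted prompt string (thinking stays OFF by default)
messages = [{"role": "user", "content": "Hello! What can you do?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# verbose=True streams the reply to stdout; the full text is also returned
text = generate(model, tokenizer, prompt=prompt, max_tokens=400, verbose=True)
```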
jang-tools is required to load JANGTQ models in Python; see the JANGTQ docs. For a one-click runtime, use MLX Studio.
About JANGTQ
JANGTQ (JANG TurboQuant) is an extreme-compression variant of JANG that replaces affine quantization on routed MoE experts with codebook quantization plus random Hadamard rotation. Precision-critical paths (attention, shared expert, dense MLP, embed, lm_head) stay at affine 8-bit. Routed experts are stored as packed codebook indices with a tiny Lloyd-Max codebook per layer and run through fused dequant + matmul Metal kernels.
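The pipeline described above (random Hadamard rotation, a small Lloyd-Max codebook, packed 2-bit indices) can be sketched in pure Python. This is not the jang-tools implementation: the per-row rotation, the seeded random sign vector, fitting one 4-entry codebook on the rotated weights, and the bit order inside each packed byte are all illustrative assumptions, and the real format uses fused dequant + matmul Metal kernels rather than Python loops.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def fwht(vec: Sequence[float]) -> Vector:
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    x = list(vec)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


def random_signs(n: int, seed: int) -> Vector:
    """Random +/-1 diagonal D; H times D is a randomized Hadamard rotation."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rotate(x: Sequence[float], signs: Vector) -> Vector:
    return fwht([s * v for s, v in zip(signs, x)])  # y = H D x


def unrotate(y: Sequence[float], signs: Vector) -> Vector:
    return [s * v for s, v in zip(signs, fwht(y))]  # x = D H y; H and D are self-inverse


def nearest(levels: Sequence[float], v: float) -> int:
    return min(range(len(levels)), key=lambda i: abs(levels[i] - v))


def lloyd_max(samples: Sequence[float], bits: int = 2, iters: int = 25) -> Vector:
    """Fit 2**bits reconstruction levels with Lloyd's algorithm (1-D k-means)."""
    k = 2 ** bits
    data = sorted(samples)
    levels = [data[int((i + 0.5) * len(data) / k)] for i in range(k)]  # quantile init
    for _ in range(iters):
        buckets: List[Vector] = [[] for _ in range(k)]
        for v in data:
            buckets[nearest(levels, v)].append(v)
        # Move each level to the mean of the samples it currently wins
        levels = [sum(b) / len(b) if b else levels[i] for i, b in enumerate(buckets)]
    return sorted(levels)


def pack2(indices: Sequence[int]) -> bytes:
    """Pack 2-bit indices four per byte, lowest bits first."""
    out = bytearray((len(indices) + 3) // 4)
    for i, q in enumerate(indices):
        out[i // 4] |= (q & 0b11) << (2 * (i % 4))
    return bytes(out)


def unpack2(packed: bytes, n: int) -> List[int]:
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


def quantize_row(w: Sequence[float], signs: Vector, codebook: Vector) -> bytes:
    return pack2([nearest(codebook, v) for v in rotate(w, signs)])


def dequantize_row(packed: bytes, n: int, signs: Vector, codebook: Vector) -> Vector:
    return unrotate([codebook[q] for q in unpack2(packed, n)], signs)


# Toy example: one 256-weight row of a routed expert
rng = random.Random(0)
w = [rng.gauss(0.0, 0.02) for _ in range(256)]
signs = random_signs(len(w), seed=1234)
codebook = lloyd_max(rotate(w, signs), bits=2)  # 4-entry codebook
packed = quantize_row(w, signs, codebook)       # 256 weights -> 64 bytes
w_hat = dequantize_row(packed, len(w), signs, codebook)
mse = sum((a - b) ** 2 for a, b in zip(w, w_hat)) / len(w)
```

The rotation spreads outlier weights across the whole row, so the rotated values are closer to Gaussian and a 4-level scalar codebook fits them well. Because the transform is orthonormal, the quantization error measured in the rotated domain equals the error in the original weights.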
For Ling 2.6 Flash, JANGTQ2 brings the model down to 29 GB while preserving full bilingual capability, making it the smallest Ling 2.6 Flash variant that maintains coherence and tool use.
About This Model
Ling 2.6 Flash is the latency-tier sibling in the Ling 2.6 family, aimed at fast multilingual instruction following and tool use. The chat template includes a <think>...</think> reasoning block, but in practice this Flash variant is best treated as a non-reasoning instruct model: leave thinking OFF (the default) for benchmark-style work and short-form responses. For chain-of-thought reasoning, prefer the larger Ling 2.6 Plus / Ring / Pro tier.
CRACK is a permanent weight-level abliteration that removes safety refusal from the always-active residual-stream writers without touching the TurboQuant codebook on routed experts. Multilingual (EN + ZH) refusal direction extraction means the model complies on both English and Chinese prompts.
Support dealignai
All models are built from original research and published for free.
Support us on Ko-fi — check out the Ko-fi membership for early access and extras.
Disclaimer
This model has had its safety refusal circuits removed. It will produce responses that would normally be refused, including technical content on security testing, dual-use research, and sensitive topics. You are responsible for how you use it.
The CRACK abliteration process does not add new capabilities — it only removes the model's learned refusal patterns. All knowledge, including the knowledge used to produce unsafe outputs, was already present in the base Ling model.