Important: This model uses the JANGTQ (JANG TurboQuant) quantization format — an extreme-compression variant of JANG for MLX on Apple Silicon that uses codebook + Hadamard rotation on routed MoE experts while keeping attention, shared expert, dense MLP, embed and lm_head at affine 8-bit. Currently only supported by MLX Studio and the jang-tools Python package. Follow @dealignai for new releases.



MLX Studio — the only app that natively supports JANG / JANGTQ models



Ling 2.6 Flash — JANGTQ2 + CRACK

JANGTQ TurboQuant mixed-precision | CRACK abliterated | Hybrid MLA + Linear-Attn MoE | EN + ZH | 29 GB



What Is This?

This is Ling 2.6 Flash by inclusionAI — a 35B-parameter Mixture-of-Experts model with 256 routed experts (8 active per token) + 1 always-active shared expert, hybrid MLA + Lightning Linear-Attention architecture, native English + Chinese, 131K context.

It has been:

  1. JANGTQ2 quantized — JANGTQ2 profile (8-bit affine on attention, shared expert, dense MLP, embed and lm_head; 2-bit TurboQuant on routed experts with codebook + Hadamard rotation) — 29 GB
  2. CRACK abliterated — permanent weight-level removal of safety refusal

| Property | Value |
|---|---|
| Base model | inclusionAI/Ling-2.6-flash (35B total, 1 shared + 8 routed active) |
| Architecture | bailing_hybrid: Multi-Latent Attention (MLA) every 8th layer, Lightning Linear-Attn elsewhere |
| Quantization | JANGTQ2, 29 GB |
| MMLU-200 | 81.0% (MXFP4 base 80.0%; +1.0pp, surgery neutral) |
| HarmBench-320 | 100.0% (320/320 comply, 0 refuse, 0 empty) |
| Context | 131,072 native |
| Languages | English + Chinese (probed bilingual) |
| Speed | 50+ tok/s on M4 Max 128 GB |
| Fits on | 48 GB+ Macs |
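
To make the "MLA every 8th layer, Lightning Linear-Attn elsewhere" layout concrete, here is a minimal sketch that labels the layers of a hybrid stack. The function name `layer_kinds`, the 32-layer depth, and the choice of which position inside each group of eight carries MLA are all assumptions made for illustration; this card does not state the real depth or offset.

```python
def layer_kinds(num_layers: int, period: int = 8) -> list[str]:
    """Label each layer of a hybrid stack as 'mla' or 'linear'.

    Assumption: the last layer of every `period`-sized group uses MLA
    (layers 8, 16, ... when counting from 1). The real offset may differ.
    """
    return ["mla" if (i + 1) % period == 0 else "linear" for i in range(num_layers)]


# Hypothetical 32-layer stack; the real depth is not stated in this card.
kinds = layer_kinds(32)
print(kinds[:8])
print(kinds.count("mla"), "MLA layers,", kinds.count("linear"), "linear-attention layers")
```

Under these assumptions only one layer in eight pays the cost of full attention, which is the usual motivation for a hybrid layout at long context lengths.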

MMLU-200 Results (thinking OFF)

| Model | Correct | Accuracy | No-match |
|---|---|---|---|
| MXFP4 Base (reference) | 160/200 | 80.00% | 6 |
| MXFP4 + CRACK | 157/200 | 78.50% | 10 |
| JANGTQ2 + CRACK (this model) | 162/200 | 81.00% | 1 |

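JANGTQ2 + CRACK edges past the un-cracked MXFP4 base on the same 200 questions, by two answers (162 vs. 160). The lower no-match count (1 vs. 6) accounts for part of that margin, and 2-bit quantization noise plausibly absorbs the small directional damage from CRACK. With only 200 questions, a one-point gap is within sampling noise, so the result is best read as parity with the base model.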
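
As a rough check on how much weight these gaps can bear, the sketch below recomputes each accuracy from the table and attaches a binomial standard error. The helper name `accuracy_with_se` is hypothetical, and treating the 200 questions as independent draws is a simplifying assumption.

```python
import math


def accuracy_with_se(correct: int, total: int) -> tuple[float, float]:
    """Return (accuracy, binomial standard error), both as fractions."""
    p = correct / total
    return p, math.sqrt(p * (1 - p) / total)


# Correct-answer counts copied from the MMLU-200 table above.
results = {
    "MXFP4 Base": 160,
    "MXFP4 + CRACK": 157,
    "JANGTQ2 + CRACK": 162,
}

for name, correct in results.items():
    acc, se = accuracy_with_se(correct, 200)
    print(f"{name}: {acc:.1%} +/- {se:.1%}")
```

At this sample size each standard error comes out just under 3 percentage points, so differences of 1 to 2.5 points between rows are not statistically distinguishable.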


HarmBench-320 Results

| Model | COMPLY | REFUSE | EMPTY |
|---|---|---|---|
| MXFP4 Base (reference) | 161 (50.3%) | 157 (49.1%) | 2 (0.6%) |
| MXFP4 + CRACK | 313 (97.8%) | 5 (1.6%) | 2 (0.6%) |
| JANGTQ2 + CRACK (this model) | 320 (100.0%) | 0 (0.0%) | 0 (0.0%) |

Perfect 320/320 comply with zero EMPTY verdicts.


Ling 2.6 Flash CRACK Series

| Model | Format | Size | MMLU-200 | HarmBench-320 | Fits on |
|---|---|---|---|---|---|
| MXFP4 + CRACK | affine 4-bit g=32 | 63 GB | 78.5% | 97.8% | 96 GB Mac |
| JANGTQ2 + CRACK (this model) | TurboQuant 2-bit experts + 8-bit affine | 29 GB | 81.0% | 100.0% | 48 GB Mac |

JANGTQ2 is less than half the size of the MXFP4 variant, scores higher on both benchmarks, and is the recommended drop-in for most users.


Usage

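from jang_tools.load_jangtq import load_jangtq_model
from mlx_lm import generate

# Load the quantized weights and tokenizer (requires the jang-tools package).
model, tokenizer = load_jangtq_model("dealignai/Ling-2.6-flash-JANGTQ2-CRACK")

# Build a chat-formatted prompt string; thinking stays at the template default (OFF).
messages = [{"role": "user", "content": "Hello, what can you do?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# verbose=True already prints the text as it is generated, so the returned
# string is kept in a variable instead of being printed a second time.
text = generate(model, tokenizer, prompt=prompt, max_tokens=400, verbose=True)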

jang-tools is required to load JANGTQ models in Python; see the JANGTQ docs. For a one-click runtime, use MLX Studio.


About JANGTQ

JANGTQ (JANG TurboQuant) is an extreme-compression variant of JANG that replaces affine quantization on routed MoE experts with codebook quantization plus random Hadamard rotation. Precision-critical paths (attention, shared expert, dense MLP, embed, lm_head) stay at affine 8-bit. Routed experts are stored as packed codebook indices with tiny Lloyd-Max codebooks per layer, and are served by fused dequant + matmul Metal kernels.
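
The general idea of pairing a random Hadamard rotation with a Lloyd-Max codebook can be sketched in plain Python. This is a toy illustration of the technique, not the JANGTQ implementation: the vector length, the single outlier, the four-level (2-bit) codebook, and every function name are assumptions chosen for the example, and the real format runs on fused Metal kernels.

```python
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


def rotate(x, signs):
    """Random Hadamard rotation: flip signs, then apply the transform."""
    return fwht([v * s for v, s in zip(x, signs)])


def unrotate(y, signs):
    """Inverse rotation: the orthonormal transform is its own inverse."""
    return [v * s for v, s in zip(fwht(y), signs)]


def nearest(v, codebook):
    """Index of the codebook entry closest to v."""
    return min(range(len(codebook)), key=lambda k: abs(v - codebook[k]))


def lloyd_max(values, levels=4, iters=20):
    """1-D Lloyd-Max (k-means) codebook; levels=4 corresponds to 2-bit indices."""
    lo, hi = min(values), max(values)
    codebook = [lo + (hi - lo) * (k + 0.5) / levels for k in range(levels)]
    for _ in range(iters):
        buckets = [[] for _ in codebook]
        for v in values:
            buckets[nearest(v, codebook)].append(v)
        # Empty buckets keep their previous centroid.
        codebook = [sum(b) / len(b) if b else c for b, c in zip(buckets, codebook)]
    return codebook


random.seed(0)
n = 64
weights = [random.gauss(0.0, 1.0) for _ in range(n)]
weights[3] = 12.0  # a single outlier that would otherwise stretch the codebook range
signs = [random.choice((-1.0, 1.0)) for _ in range(n)]

rotated = rotate(weights, signs)               # spreads the outlier across all entries
codebook = lloyd_max(rotated)                  # four centroids fitted to the rotated values
indices = [nearest(v, codebook) for v in rotated]  # what would be stored, 2 bits each
restored = unrotate([codebook[i] for i in indices], signs)

mse = sum((a - b) ** 2 for a, b in zip(weights, restored)) / n
print("codebook:", [round(c, 3) for c in codebook])
print(f"reconstruction MSE: {mse:.4f}")
```

The rotation preserves the vector's norm while flattening outliers, which is why a codebook with only a few levels can cover the rotated values more evenly than the raw ones.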

For Ling 2.6 Flash, JANGTQ2 brings the model to 29 GB while preserving full bilingual capability. It is the smallest Ling 2.6 Flash variant that maintains coherence and tool use.

About This Model

Ling 2.6 Flash is the latency-tier sibling in the Ling 2.6 family, built for fast multilingual instruction following and tool use. The chat template includes a <think>...</think> reasoning block, but in practice this Flash variant is best treated as a non-reasoning instruct model: leave thinking OFF (the default) for benchmark-style work and short-form responses. For chain-of-thought reasoning, prefer the larger Ling 2.6 Plus / Ring / Pro tier.

CRACK is a permanent weight-level abliteration that removes safety refusal from the always-active residual-stream writers without touching the TurboQuant codebook on routed experts. Multilingual (EN + ZH) refusal direction extraction means the model complies on both English and Chinese prompts.


Support dealignai

All models are built from original research and published for free.

Support us on Ko-fi — check out the Ko-fi membership for early access and extras.


dealign.ai

Twitter · HF · Ko-fi


Disclaimer

This model has had its safety refusal circuits removed. It will produce responses that would normally be refused, including technical content on security testing, dual-use research, and sensitive topics. You are responsible for how you use it.

The CRACK abliteration process does not add new capabilities — it only removes the model's learned refusal patterns. All knowledge, including the knowledge used to produce unsafe outputs, was already present in the base Ling model.
