MiniMax-M2.1 → gpt-oss-20b-heretic-bf16 SVD-LoRA Adapter (Adaptive Rank, max Δ ratio 0.35)

For the Chinese version, see **README_ZH.md**.

This repository provides a PEFT LoRA adapter for kldzj/gpt-oss-20b-heretic-bf16, distilled from MiniMaxAI/MiniMax-M2.1 using weight-delta SVD-LoRA distillation (cross-architecture).

  • Base model (student / required): kldzj/gpt-oss-20b-heretic-bf16
  • Teacher model (reference): MiniMaxAI/MiniMax-M2.1
  • Artifact: LoRA adapter (PEFT) — not a full merged model
  • Scope: Attention projection modules (q_proj, k_proj, v_proj, o_proj)

What is this?

This adapter approximates the teacher→student weight delta (Δ) with low-rank factors and stores them as LoRA matrices. It is designed for cross-architecture distillation where teacher/student differ in depth and/or hidden dimensions.
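Conceptually, for each targeted projection the pipeline forms the delta Δ between the (projected) teacher weight and the student weight, truncates its SVD, and stores the factors as the LoRA pair B·A. A minimal sketch of that factorization step in torch (the function name, the square-root split of the singular values, and the toy shapes are illustrative assumptions, not code from `universal_distill.py`):

```python
import torch

def svd_lora_factors(delta: torch.Tensor, rank: int):
    # Truncated SVD of the weight delta: delta ≈ U_r @ diag(S_r) @ V_r^T
    U, S, Vh = torch.linalg.svd(delta.float(), full_matrices=False)
    U_r, S_r, Vh_r = U[:, :rank], S[:rank], Vh[:rank, :]
    # Split the singular values between the two factors so that B @ A ≈ delta
    B = U_r * S_r.sqrt()             # (out_features, rank)
    A = S_r.sqrt()[:, None] * Vh_r   # (rank, in_features)
    return A, B

# Toy example: a random "delta" between two projection matrices
delta = torch.randn(64, 32)
A, B = svd_lora_factors(delta, rank=32)
# At full rank (rank == min(dims)) the reconstruction is numerically exact
assert torch.allclose(B @ A, delta, atol=1e-3)
```

At lower ranks `B @ A` is the best rank-r approximation of Δ in the Frobenius norm, which is what makes the LoRA storage lossy but principled.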


Quickstart (Transformers + PEFT)

This is an adapter. You must load the base model first.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_id = "kldzj/gpt-oss-20b-heretic-bf16"
adapter_id = "win10/gpt-oss-20b-heretic-distilled-MiniMax-M2.1-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id, use_fast=True)

base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain knowledge distillation in 5 bullet points."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
)

with torch.no_grad():
    out = model.generate(
        inputs.to(model.device),
        max_new_tokens=512,
        do_sample=False,
    )

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
```

Optional: Merge the adapter into the base weights

If you need a single merged checkpoint for inference:

```python
# `model` is the PeftModel loaded in the quickstart above
merged = model.merge_and_unload()
merged.save_pretrained("./merged_model", safe_serialization=True)
tokenizer.save_pretrained("./merged_model")
```
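Merging simply folds the low-rank update into the dense weight, W_merged = W + (α/r)·B·A, so the merged model reproduces the adapter's forward pass with a single matmul. A toy illustration of that identity in plain torch (shapes and the α/r scaling below are illustrative, not this adapter's actual configuration):

```python
import torch

out_f, in_f, r, alpha = 8, 4, 2, 16
W = torch.randn(out_f, in_f)   # frozen base weight
A = torch.randn(r, in_f)       # LoRA down-projection
B = torch.randn(out_f, r)      # LoRA up-projection
scaling = alpha / r

# Adapter-style forward: base path plus scaled low-rank path
x = torch.randn(3, in_f)
y_adapter = x @ W.T + scaling * (x @ A.T @ B.T)

# Merged weight produces the same output with a single matmul
W_merged = W + scaling * (B @ A)
y_merged = x @ W_merged.T

assert torch.allclose(y_adapter, y_merged, atol=1e-4)
```

This is why merged checkpoints trade the ability to hot-swap adapters for slightly lower inference overhead.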

Reproducibility (build command)

The adapter was produced with a command equivalent to:

```shell
python universal_distill.py \
  --teacher ./MiniMax-M2-bf16 \
  --student ./gpt-oss-20b-heretic-bf16 \
  --output ./MiniMax-M2.1-to-gpt-oss-20b-heretic-bf16-lora-adaptive-delta-ratio-0.35 \
  --svd-mode full \
  --energy-threshold 0.95 \
  --projection-rank 1024 \
  --min-rank 1024 \
  --max-rank 2880 \
  --interp-mode lsq \
  --svd-rand-iter 2 \
  --svd-rand-oversamples 8 \
  --calib-format alpaca \
  --calib-alpaca-template classic \
  --calib-max-samples 128 \
  --calib-max-length 32768 \
  --calib-batch-size 2 \
  --calib-save ./calib_stats_rombo-code_bagel_hermes-dataset.safetensors \
  --calib-data ./data/code_bagel_hermes-2.5.json \
  --calib-mode rms \
  --mixed-precision \
  --max-delta-ratio 0.35
```
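The combination of `--energy-threshold 0.95` with `--min-rank`/`--max-rank` suggests a per-matrix rank chosen so the retained singular values capture at least 95% of the delta's squared-singular-value energy, clamped to the configured bounds. A hedged sketch of such a selection rule (the actual logic in `universal_distill.py` may differ):

```python
import torch

def adaptive_rank(S: torch.Tensor, energy_threshold: float,
                  min_rank: int, max_rank: int) -> int:
    # Fraction of total "energy" (sum of squared singular values)
    # captured by the top-k values, for every k
    energy = S.square()
    cum = energy.cumsum(0) / energy.sum()
    # Smallest k whose cumulative energy meets the threshold (1-indexed)
    k = int(torch.searchsorted(cum, torch.tensor(energy_threshold)).item()) + 1
    return max(min_rank, min(k, max_rank))

S = torch.tensor([10.0, 5.0, 1.0, 0.5, 0.1])
print(adaptive_rank(S, 0.95, min_rank=1, max_rank=4))  # → 2
```

With the repository's settings (`min-rank 1024`, `max-rank 2880`), layers with flat delta spectra get larger ranks and layers with concentrated spectra get smaller ones, which is what "adaptive rank" in the title refers to.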

Compatibility notes

  • This adapter targets the exact module naming / shapes of kldzj/gpt-oss-20b-heretic-bf16.
  • If you use a different gpt-oss-20b variant, it must be shape-compatible (otherwise adapter load will fail).
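One way to catch an incompatible variant early is to compare the adapter's LoRA factor shapes against the base model's weight shapes before loading. A minimal sketch of that check (the module name and dimensions below are illustrative, not read from this adapter's files):

```python
def check_lora_shapes(base_shapes, adapter_shapes):
    """base_shapes: {module: (out_features, in_features)} of targeted weights.
    adapter_shapes: {module: {"A": (r, in), "B": (out, r)}} from the adapter.
    Returns the list of modules whose shapes do not line up."""
    mismatches = []
    for name, factors in adapter_shapes.items():
        if name not in base_shapes:
            mismatches.append(name)
            continue
        out_f, in_f = base_shapes[name]
        r_a, a_in = factors["A"]
        b_out, r_b = factors["B"]
        if a_in != in_f or b_out != out_f or r_a != r_b:
            mismatches.append(name)
    return mismatches

base = {"model.layers.0.self_attn.q_proj": (4096, 2880)}
adapter = {"model.layers.0.self_attn.q_proj": {"A": (1024, 2880), "B": (4096, 1024)}}
print(check_lora_shapes(base, adapter))  # → []
```

The shape dictionaries can be populated from `model.named_parameters()` on the base side and from the adapter's safetensors file on the other.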

Source models

  • Base (student): kldzj/gpt-oss-20b-heretic-bf16
  • Teacher: MiniMaxAI/MiniMax-M2.1

License

Please follow the license and usage terms of the base model and teacher model as listed on their Hugging Face pages. This repository only provides an adapter; downstream usage must remain compliant with upstream terms.
