MiniMax-M2.1 → gpt-oss-20b-heretic-bf16 SVD-LoRA Adapter (Adaptive Rank, max Δ ratio 0.35)

For the Chinese version, see **README_ZH.md**.

This repository provides a PEFT LoRA adapter for kldzj/gpt-oss-20b-heretic-bf16, distilled from MiniMaxAI/MiniMax-M2.1 using weight-delta SVD-LoRA distillation (cross-architecture).

  • Base model (student / required): kldzj/gpt-oss-20b-heretic-bf16
  • Teacher model (reference): MiniMaxAI/MiniMax-M2.1
  • Artifact: LoRA adapter (PEFT) — not a full merged model
  • Scope: Attention projection modules (q_proj, k_proj, v_proj, o_proj)

What is this?

This adapter approximates the teacher→student weight delta (Δ) with low-rank factors and stores them as LoRA matrices. It is designed for cross-architecture distillation where teacher/student differ in depth and/or hidden dimensions.
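Conceptually, for each targeted projection the pipeline forms the delta Δ between the (projected) teacher weight and the student weight, truncates its SVD, and stores the factors as the LoRA pair B·A. A minimal sketch of that factorization step in torch (the function name, the square-root split of the singular values, and the toy shapes are illustrative assumptions, not code from `universal_distill.py`):

```python
import torch

def svd_lora_factors(delta: torch.Tensor, rank: int):
    # Truncated SVD of the weight delta: delta ≈ U_r @ diag(S_r) @ V_r^T
    U, S, Vh = torch.linalg.svd(delta.float(), full_matrices=False)
    U_r, S_r, Vh_r = U[:, :rank], S[:rank], Vh[:rank, :]
    # Split the singular values between the two factors so that B @ A ≈ delta
    B = U_r * S_r.sqrt()             # (out_features, rank)
    A = S_r.sqrt()[:, None] * Vh_r   # (rank, in_features)
    return A, B

# Toy example: a random "delta" between two projection matrices
delta = torch.randn(64, 32)
A, B = svd_lora_factors(delta, rank=32)
# At full rank (rank == min(dims)) the reconstruction is numerically exact
assert torch.allclose(B @ A, delta, atol=1e-3)
```

At lower ranks `B @ A` is the best rank-r approximation of Δ in the Frobenius norm, which is what makes the LoRA storage lossy but principled.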


Quickstart (Transformers + PEFT)

This is an adapter. You must load the base model first.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_id = "kldzj/gpt-oss-20b-heretic-bf16"
adapter_id = "win10/gpt-oss-20b-heretic-distilled-MiniMax-M2.1-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id, use_fast=True)

base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain knowledge distillation in 5 bullet points."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
)

with torch.no_grad():
    out = model.generate(
        inputs.to(model.device),
        max_new_tokens=512,
        do_sample=False,
    )

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
```

Optional: Merge the adapter into the base weights

If you need a single merged checkpoint for inference:

```python
# `model` is the PeftModel loaded in the quickstart above
merged = model.merge_and_unload()
merged.save_pretrained("./merged_model", safe_serialization=True)
tokenizer.save_pretrained("./merged_model")
```
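Merging simply folds the low-rank update into the dense weight, W_merged = W + (α/r)·B·A, so the merged model reproduces the adapter's forward pass with a single matmul. A toy illustration of that identity in plain torch (shapes and the α/r scaling below are illustrative, not this adapter's actual configuration):

```python
import torch

out_f, in_f, r, alpha = 8, 4, 2, 16
W = torch.randn(out_f, in_f)   # frozen base weight
A = torch.randn(r, in_f)       # LoRA down-projection
B = torch.randn(out_f, r)      # LoRA up-projection
scaling = alpha / r

# Adapter-style forward: base path plus scaled low-rank path
x = torch.randn(3, in_f)
y_adapter = x @ W.T + scaling * (x @ A.T @ B.T)

# Merged weight produces the same output with a single matmul
W_merged = W + scaling * (B @ A)
y_merged = x @ W_merged.T

assert torch.allclose(y_adapter, y_merged, atol=1e-4)
```

This is why merged checkpoints trade the ability to hot-swap adapters for slightly lower inference overhead.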

Reproducibility (build command)

The adapter was produced with a command equivalent to:

```shell
python universal_distill.py \
  --teacher ./MiniMax-M2-bf16 \
  --student ./gpt-oss-20b-heretic-bf16 \
  --output ./MiniMax-M2.1-to-gpt-oss-20b-heretic-bf16-lora-adaptive-delta-ratio-0.35 \
  --svd-mode full \
  --energy-threshold 0.95 \
  --projection-rank 1024 \
  --min-rank 1024 \
  --max-rank 2880 \
  --interp-mode lsq \
  --svd-rand-iter 2 \
  --svd-rand-oversamples 8 \
  --calib-format alpaca \
  --calib-alpaca-template classic \
  --calib-max-samples 128 \
  --calib-max-length 32768 \
  --calib-batch-size 2 \
  --calib-save ./calib_stats_rombo-code_bagel_hermes-dataset.safetensors \
  --calib-data ./data/code_bagel_hermes-2.5.json \
  --calib-mode rms \
  --mixed-precision \
  --max-delta-ratio 0.35
```
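The combination of `--energy-threshold 0.95` with `--min-rank`/`--max-rank` suggests a per-matrix rank chosen so the retained singular values capture at least 95% of the delta's squared-singular-value energy, clamped to the configured bounds. A hedged sketch of such a selection rule (the actual logic in `universal_distill.py` may differ):

```python
import torch

def adaptive_rank(S: torch.Tensor, energy_threshold: float,
                  min_rank: int, max_rank: int) -> int:
    # Fraction of total "energy" (sum of squared singular values)
    # captured by the top-k values, for every k
    energy = S.square()
    cum = energy.cumsum(0) / energy.sum()
    # Smallest k whose cumulative energy meets the threshold (1-indexed)
    k = int(torch.searchsorted(cum, torch.tensor(energy_threshold)).item()) + 1
    return max(min_rank, min(k, max_rank))

S = torch.tensor([10.0, 5.0, 1.0, 0.5, 0.1])
print(adaptive_rank(S, 0.95, min_rank=1, max_rank=4))  # → 2
```

With the repository's settings (`min-rank 1024`, `max-rank 2880`), layers with flat delta spectra get larger ranks and layers with concentrated spectra get smaller ones, which is what "adaptive rank" in the title refers to.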

Compatibility notes

  • This adapter targets the exact module naming / shapes of kldzj/gpt-oss-20b-heretic-bf16.
  • If you use a different gpt-oss-20b variant, it must be shape-compatible (otherwise adapter load will fail).
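One way to catch an incompatible variant early is to compare the adapter's LoRA factor shapes against the base model's weight shapes before loading. A minimal sketch of that check (the module name and dimensions below are illustrative, not read from this adapter's files):

```python
def check_lora_shapes(base_shapes, adapter_shapes):
    """base_shapes: {module: (out_features, in_features)} of targeted weights.
    adapter_shapes: {module: {"A": (r, in), "B": (out, r)}} from the adapter.
    Returns the list of modules whose shapes do not line up."""
    mismatches = []
    for name, factors in adapter_shapes.items():
        if name not in base_shapes:
            mismatches.append(name)
            continue
        out_f, in_f = base_shapes[name]
        r_a, a_in = factors["A"]
        b_out, r_b = factors["B"]
        if a_in != in_f or b_out != out_f or r_a != r_b:
            mismatches.append(name)
    return mismatches

base = {"model.layers.0.self_attn.q_proj": (4096, 2880)}
adapter = {"model.layers.0.self_attn.q_proj": {"A": (1024, 2880), "B": (4096, 1024)}}
print(check_lora_shapes(base, adapter))  # → []
```

The shape dictionaries can be populated from `model.named_parameters()` on the base side and from the adapter's safetensors file on the other.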

Source models

  • Base (student): kldzj/gpt-oss-20b-heretic-bf16
  • Teacher: MiniMaxAI/MiniMax-M2.1

License

Please follow the license and usage terms of the base model and teacher model as listed on their Hugging Face pages. This repository only provides an adapter; downstream usage must remain compliant with upstream terms.
