glm5.1-distill

yasserrmd/glm5.1-distill is a 1.2B-parameter instruction-tuned chat model built on top of LiquidAI/LFM2.5-1.2B-Base. It was trained with supervised fine-tuning (SFT) on a 50k-example subset of Jackrong/GLM-5.1-Reasoning-1M-Cleaned, a cleaned reasoning-style chat corpus distilled from the GLM-5.1 family.

The goal is to bring some of the conversational reasoning behavior of larger GLM-5.1 teacher models into the small, efficient LFM2.5 architecture so it can run comfortably on a single consumer GPU, on edge devices, or via quantized runtimes such as ONNX, GGUF, or MLX.

Note: This is an independent community fine-tune. It is not affiliated with or endorsed by Liquid AI or Z.ai/THUDM (the GLM authors).


Model summary

| Property | Value |
|---|---|
| Architecture | LFM2 (hybrid conv + attention) |
| Parameters | ~1.2B |
| Tensor dtype | BF16 |
| Context length | 4096 (trained at 2048 with packing) |
| Base model | LiquidAI/LFM2.5-1.2B-Base |
| Fine-tuning method | LoRA SFT (merged back to base) |
| Trainer | Unsloth + TRL SFTTrainer |
| Chat template | LFM2 / ChatML-style (see the Chat template section below) |
| License | Apache 2.0 |

Intended use

This model is designed for:

  • General assistant-style chat
  • Lightweight reasoning, step-by-step answers, and explanations
  • On-device and edge deployments where a 1B class model is appropriate
  • A starting checkpoint for further domain-specific fine-tuning

It is not a safety-aligned, production-ready assistant on its own. Treat its output as that of a small distilled student model: it can be confidently wrong, especially on long-horizon math, code correctness, current events, and anything safety-critical.

Out of scope

  • Medical, legal, financial, or other high-stakes advice
  • Any setting that requires guaranteed factuality
  • Any use that violates the Apache 2.0 license or the terms of the upstream LFM2.5 base model license

Quickstart (Transformers)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "yasserrmd/glm5.1-distill"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain why the sky is blue in two short paragraphs."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

streamer = TextStreamer(tokenizer, skip_prompt=True)

_ = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,            # ensure the sampling settings below take effect
    temperature=0.1,           # "factual" preset; see Recommended sampling below
    top_k=50,
    top_p=0.1,
    repetition_penalty=1.05,
    streamer=streamer,
)

Recommended sampling

The base LFM2.5 family is sensitive to sampling settings. The following defaults (inherited from Liquid AI's reference settings) work well:

| Use case | temperature | top_k | top_p | repetition_penalty |
|---|---|---|---|---|
| Factual / short answers | 0.1 | 50 | 0.1 | 1.05 |
| Creative / longer text | 0.7 | 50 | 0.9 | 1.10 |
| Code / structured output | 0.2 | 40 | 0.9 | 1.05 |
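
These presets can also be captured as reusable GenerationConfig objects and passed to model.generate. A minimal sketch (the preset names are illustrative; do_sample=True is set explicitly so the sampling parameters actually apply):

from transformers import GenerationConfig

# Presets mirroring the table above.
FACTUAL = GenerationConfig(
    do_sample=True, temperature=0.1, top_k=50, top_p=0.1, repetition_penalty=1.05
)
CREATIVE = GenerationConfig(
    do_sample=True, temperature=0.7, top_k=50, top_p=0.9, repetition_penalty=1.10
)
CODE = GenerationConfig(
    do_sample=True, temperature=0.2, top_k=40, top_p=0.9, repetition_penalty=1.05
)

# Reuse the tokenized chat inputs from the quickstart above.
_ = model.generate(**inputs, max_new_tokens=512, generation_config=FACTUAL)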

Chat template

The tokenizer ships with a ChatML-style template. A two-turn example serializes to:

<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
Hey there!<|im_end|>

Always use tokenizer.apply_chat_template(..., add_generation_prompt=True) at inference time. Do not hand-roll the prompt.
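
To inspect the exact string the template produces, you can render it without tokenizing by passing tokenize=False; a short sketch:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yasserrmd/glm5.1-distill")

# Render the prompt string only; no token IDs are produced.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)  # shows the <|im_start|> / <|im_end|> structure above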


Training details

Data

  • Source: Jackrong/GLM-5.1-Reasoning-1M-Cleaned, main config
  • Slice: first 50,000 rows of the train split
  • Format: ShareGPT-style multi-turn conversations, normalized via unsloth.chat_templates.standardize_data_formats
  • Loss masking: train_on_responses_only so only assistant tokens contribute to the loss
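
A minimal sketch of this data-preparation path (assuming a recent Unsloth release; the helper names mirror the ones referenced above, and exact signatures may vary between versions):

from datasets import load_dataset
from unsloth.chat_templates import standardize_data_formats

# First 50,000 rows of the train split.
dataset = load_dataset(
    "Jackrong/GLM-5.1-Reasoning-1M-Cleaned", split="train[:50000]"
)

# Normalize ShareGPT-style conversations into the role/content layout
# expected by tokenizer.apply_chat_template.
dataset = standardize_data_formats(dataset)

The response-only loss masking (train_on_responses_only) is applied to the trainer itself; see the SFT sketch further below.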

LoRA configuration

| Hyperparameter | Value |
|---|---|
| Rank r | 16 |
| lora_alpha | 16 |
| lora_dropout | 0 |
| Bias | none |
| Target modules | q_proj, k_proj, v_proj, out_proj, in_proj, w1, w2, w3 |
| Gradient checkpointing | unsloth |
| Random seed | 3407 |
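
In Unsloth terms this corresponds roughly to the following adapter setup (a sketch; it assumes the base model is loaded through FastLanguageModel):

from unsloth import FastLanguageModel

# Load the base model at the training sequence length.
model, tokenizer = FastLanguageModel.from_pretrained(
    "LiquidAI/LFM2.5-1.2B-Base",
    max_seq_length=2048,
    load_in_4bit=False,
)

# Attach LoRA adapters matching the table above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj",
                    "in_proj", "w1", "w2", "w3"],
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)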

SFT hyperparameters

| Hyperparameter | Value |
|---|---|
| Epochs | 1 |
| Per-device batch size | 32 |
| Gradient accumulation | 1 |
| Effective batch size | 32 |
| Packing | True |
| Max sequence length | 2048 |
| Optimizer | adamw_torch |
| Learning rate | 2e-5 |
| LR scheduler | linear |
| Warmup steps | 50 |
| Weight decay | 0.01 |
| Precision | BF16 |
| Seed | 3407 |
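
Expressed with TRL's SFTTrainer plus the response-only masking mentioned above (a sketch; argument names follow the TRL versions commonly paired with Unsloth and may differ slightly across releases, and the instruction/response markers are assumptions for the ChatML-style template):

from trl import SFTConfig, SFTTrainer
from unsloth.chat_templates import train_on_responses_only

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,  # called `tokenizer=` in older TRL releases
    train_dataset=dataset,
    args=SFTConfig(
        num_train_epochs=1,
        per_device_train_batch_size=32,
        gradient_accumulation_steps=1,
        packing=True,
        max_seq_length=2048,
        optim="adamw_torch",
        learning_rate=2e-5,
        lr_scheduler_type="linear",
        warmup_steps=50,
        weight_decay=0.01,
        bf16=True,
        seed=3407,
        output_dir="outputs",
    ),
)

# Mask everything except assistant turns so only response tokens
# contribute to the loss.
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|im_start|>user\n",
    response_part="<|im_start|>assistant\n",
)
trainer.train()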

Merge & export

After SFT, the LoRA adapters were merged into the base weights using Unsloth's push_to_hub_merged(..., save_method="merged_16bit"). The repository contains the resulting full BF16 model, not adapters.
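
For reference, that merge-and-upload step looks roughly like this with Unsloth (the repo name is shown for illustration; authentication is handled via the usual Hugging Face token):

# Merge the LoRA adapters into the base weights and push full BF16 safetensors.
model.push_to_hub_merged(
    "yasserrmd/glm5.1-distill",
    tokenizer,
    save_method="merged_16bit",
)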

Hardware

Trained on a single GPU using Unsloth's optimized kernels. End-to-end training memory and time are dominated by the 50k-row, packed-2048 setup described above.


Evaluation

No formal benchmark scores are reported for this checkpoint yet. It has been smoke-tested on:

  • General Q&A (e.g. "Why is the sky blue?")
  • Short creative writing prompts
  • Multi-turn instruction following

Quantitative evaluations on benchmarks such as MMLU, GSM8K, IFEval, or MT-Bench are left as future work. Contributions via the HF community tab are welcome.


Limitations and biases

  • Inherits all limitations and biases of the LFM2.5 base model and of the GLM-5.1-derived training data.
  • 1.2B parameters is small. Expect weaker performance than 7B+ chat models on hard reasoning, long context, and code generation.
  • The training corpus is predominantly English. Other languages will work to varying degrees but are not the target.
  • The model can hallucinate facts confidently. Verify anything important.

ONNX version

An ONNX export of this model is available at:

yasserrmd/glm5.1-distill-onnx

It can be used with onnxruntime and optimum for CPU and accelerated inference. See that repository's README for usage details.
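
A minimal loading sketch via Optimum's ONNX Runtime integration (this assumes the export is compatible with ORTModelForCausalLM; if the architecture needs a custom pipeline, follow that repository's README instead):

from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

onnx_id = "yasserrmd/glm5.1-distill-onnx"
tokenizer = AutoTokenizer.from_pretrained(onnx_id)
model = ORTModelForCausalLM.from_pretrained(onnx_id)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Why is the sky blue?"}],
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))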


Citation

If you use this checkpoint, please cite it:

@misc{yasserrmd_glm51_distill_2026,
  title  = {glm5.1-distill: a small LFM2.5 student fine-tuned on GLM-5.1 reasoning data},
  author = {Mohamed Yasser},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/yasserrmd/glm5.1-distill}}
}

Please also cite the base model and dataset:

  • LiquidAI, LFM2.5-1.2B-Base, 2025.
  • Jackrong, GLM-5.1-Reasoning-1M-Cleaned, Hugging Face Datasets.

Acknowledgements

This model was fine-tuned with Unsloth and TRL's SFTTrainer.
