# glm5.1-distill
yasserrmd/glm5.1-distill is a 1.2B parameter instruction-tuned chat model
built on top of LiquidAI/LFM2.5-1.2B-Base.
It is supervised-fine-tuned (SFT) on a 50k subset of
Jackrong/GLM-5.1-Reasoning-1M-Cleaned,
a cleaned reasoning-style chat corpus distilled from the GLM-5.1 family.
The goal is to bring some of the conversational reasoning behavior of larger GLM-5.1 teacher models into the small, efficient LFM2.5 architecture so it can run comfortably on a single consumer GPU, on edge devices, or via quantized runtimes such as ONNX, GGUF, or MLX.
Note: This is an independent community fine-tune. It is not affiliated with or endorsed by Liquid AI or Z.ai/THUDM (the GLM authors).
## Model summary
| Property | Value |
|---|---|
| Architecture | LFM2 (hybrid conv + attention) |
| Parameters | ~1.2B |
| Tensor dtype | BF16 |
| Context length | 4096 (trained at 2048 with packing) |
| Base model | LiquidAI/LFM2.5-1.2B-Base |
| Fine-tuning method | LoRA SFT (merged back to base) |
| Trainer | Unsloth + TRL SFTTrainer |
| Chat template | LFM2 / ChatML-style (`<\|im_start\|>` … `<\|im_end\|>`) |
| License | Apache 2.0 |
## Intended use
This model is designed for:
- General assistant-style chat
- Lightweight reasoning, step-by-step answers, and explanations
- On-device and edge deployments where a 1B class model is appropriate
- A starting checkpoint for further domain-specific fine-tuning
It is not a safety-aligned, production-ready assistant on its own. Treat its output as that of a small distilled student model: it can be confidently wrong, especially on long-horizon math, code correctness, current events, and anything safety-critical.
### Out of scope
- Medical, legal, financial, or other high-stakes advice
- Any setting that requires guaranteed factuality
- Generating content that violates the Apache 2.0 license terms or the upstream LFM2.5 base model license
## Quickstart (Transformers)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "yasserrmd/glm5.1-distill"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain why the sky is blue in two short paragraphs."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.1,
    top_k=50,
    top_p=0.1,
    repetition_penalty=1.05,
    streamer=streamer,
)
```
## Recommended sampling
The base LFM2.5 family is sensitive to sampling settings. The following defaults (inherited from Liquid AI's reference settings) work well:
| Use case | temperature | top_k | top_p | repetition_penalty |
|---|---|---|---|---|
| Factual / short answers | 0.1 | 50 | 0.1 | 1.05 |
| Creative / longer text | 0.7 | 50 | 0.9 | 1.10 |
| Code / structured output | 0.2 | 40 | 0.9 | 1.05 |
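The table above can be kept as generation-kwarg presets and splatted into `model.generate`. A minimal sketch; the preset names (`factual`, `creative`, `code`) are my own shorthand, not part of the model card:

```python
# Sampling presets mirroring the recommended-settings table above.
SAMPLING_PRESETS = {
    "factual":  {"temperature": 0.1, "top_k": 50, "top_p": 0.1, "repetition_penalty": 1.05},
    "creative": {"temperature": 0.7, "top_k": 50, "top_p": 0.9, "repetition_penalty": 1.10},
    "code":     {"temperature": 0.2, "top_k": 40, "top_p": 0.9, "repetition_penalty": 1.05},
}

def generation_kwargs(use_case: str, max_new_tokens: int = 512) -> dict:
    """Return keyword arguments for model.generate() for a given use case."""
    kwargs = dict(SAMPLING_PRESETS[use_case])
    # do_sample=True is needed for temperature/top_p/top_k to take effect.
    kwargs.update(do_sample=True, max_new_tokens=max_new_tokens)
    return kwargs
```

Then call `model.generate(**inputs, **generation_kwargs("factual"))`.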
## Chat template

The tokenizer ships with a ChatML-style template. A two-turn example serializes to:

```
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
Hey there!<|im_end|>
```

Always use `tokenizer.apply_chat_template(..., add_generation_prompt=True)` at inference time. Do not hand-roll the prompt.
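For intuition only, here is a rough sketch of how a ChatML-style template serializes a conversation. Exact whitespace and special-token handling come from the tokenizer's own template, so use `apply_chat_template` for real prompts, never this function:

```python
# Illustration only: approximates the ChatML-style serialization shown above.
def render_chatml(messages, add_generation_prompt=False):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = render_chatml(
    [{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
)
```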
## Training details

### Data

- Source: `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`, `main` config
- Slice: first 50,000 rows of the `train` split
- Format: ShareGPT-style multi-turn conversations, normalized via `unsloth.chat_templates.standardize_data_formats`
- Loss masking: `train_on_responses_only`, so only assistant tokens contribute to the loss
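Response-only loss masking boils down to setting the labels of non-assistant tokens to `-100`, the index the cross-entropy loss ignores. A toy sketch of the idea (this mimics what `train_on_responses_only` achieves; it is not Unsloth's actual implementation):

```python
IGNORE_INDEX = -100  # label value excluded from the cross-entropy loss

def mask_non_assistant(labels, roles):
    """Copy `labels`, replacing tokens whose role is not 'assistant' with IGNORE_INDEX.

    `labels` and `roles` are parallel per-token lists; only assistant tokens
    keep their label and therefore contribute to the loss.
    """
    return [tok if role == "assistant" else IGNORE_INDEX
            for tok, role in zip(labels, roles)]

masked = mask_non_assistant([10, 11, 12, 13], ["user", "user", "assistant", "assistant"])
# masked == [-100, -100, 12, 13]
```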
### LoRA configuration

| Hyperparameter | Value |
|---|---|
| Rank `r` | 16 |
| `lora_alpha` | 16 |
| `lora_dropout` | 0 |
| Bias | none |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `out_proj`, `in_proj`, `w1`, `w2`, `w3` |
| Gradient checkpointing | `unsloth` |
| Random seed | 3407 |
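With rank `r = 16`, each targeted weight matrix gains two small factors, `A` (r × d_in) and `B` (d_out × r), so the adapter adds `r * (d_in + d_out)` trainable parameters per module. A quick back-of-the-envelope helper (the `d = 2048` in the example is a hypothetical hidden size, not a confirmed LFM2.5 value):

```python
def lora_extra_params(d_in: int, d_out: int, r: int = 16) -> int:
    """Trainable parameters added by one LoRA pair: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

# Example with a hypothetical square 2048 x 2048 projection at r = 16:
per_module = lora_extra_params(2048, 2048, 16)
# per_module == 65536, i.e. far below the ~4.2M parameters of the dense matrix
```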
### SFT hyperparameters
| Hyperparameter | Value |
|---|---|
| Epochs | 1 |
| Per-device batch size | 32 |
| Gradient accumulation | 1 |
| Effective batch size | 32 |
| Packing | True |
| Max sequence length | 2048 |
| Optimizer | adamw_torch |
| Learning rate | 2e-5 |
| LR scheduler | linear |
| Warmup steps | 50 |
| Weight decay | 0.01 |
| Precision | BF16 |
| Seed | 3407 |
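The linear scheduler with 50 warmup steps ramps the learning rate from 0 to 2e-5, then decays it linearly to 0 over the remaining steps. A self-contained sketch of that shape (`total_steps` depends on the packed token count of the 50k slice, so the value below is illustrative):

```python
def linear_warmup_lr(step, total_steps, base_lr=2e-5, warmup_steps=50):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / max(total_steps - warmup_steps, 1)

# Shape check with an illustrative 1000-step run:
peak = linear_warmup_lr(50, 1000)   # end of warmup -> full 2e-5
final = linear_warmup_lr(1000, 1000)  # decays to 0.0
```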
## Merge & export

After SFT, the LoRA adapters were merged into the base weights using Unsloth's `push_to_hub_merged(..., save_method="merged_16bit")`. The repository contains the resulting full BF16 model, not adapters.
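Conceptually, merging folds the adapter into the dense weight as `W' = W + (alpha / r) * (B @ A)`, after which the adapters can be discarded. A toy numeric sketch of that update (plain Python lists, not Unsloth's code):

```python
def merge_lora(W, A, B, alpha=16, r=16):
    """Fold a LoRA update into a dense weight: W' = W + (alpha / r) * (B @ A).

    W is d_out x d_in, B is d_out x rank, A is rank x d_in (nested lists).
    Conceptual sketch of what a merged export does; not Unsloth's implementation.
    """
    scale = alpha / r
    d_out, d_in, rank = len(W), len(W[0]), len(A)
    return [[W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(rank))
             for j in range(d_in)]
            for i in range(d_out)]

# Tiny example: identity weight plus a rank-1 update (alpha/r == 1 here).
merged = merge_lora(
    W=[[1.0, 0.0], [0.0, 1.0]],
    A=[[0.0, 1.0]],
    B=[[1.0], [0.0]],
)
# merged == [[1.0, 1.0], [0.0, 1.0]]
```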
## Hardware
Trained on a single GPU using Unsloth's optimized kernels. End-to-end training memory and time are dominated by the 50k-row, packed-2048 setup described above.
## Evaluation
No formal benchmark scores are reported for this checkpoint yet. It has been smoke-tested on:
- General Q&A (e.g. "Why is the sky blue?")
- Short creative writing prompts
- Multi-turn instruction following
Quantitative evaluations on benchmarks such as MMLU, GSM8K, IFEval, or MT-Bench are left as future work. Contributions via the HF community tab are welcome.
## Limitations and biases
- Inherits all limitations and biases of the LFM2.5 base model and of the GLM-5.1-derived training data.
- 1.2B parameters is small. Expect weaker performance than 7B+ chat models on hard reasoning, long context, and code generation.
- The training corpus is predominantly English. Other languages will work to varying degrees but are not the target.
- The model can hallucinate facts confidently. Verify anything important.
## ONNX version

An ONNX export of this model is available at `yasserrmd/glm5.1-distill-onnx`. It can be used with `onnxruntime` and `optimum` for CPU and accelerated inference. See that repository's README for usage details.
## Citation

If you use this checkpoint, please cite it:

```bibtex
@misc{yasserrmd_glm51_distill_2026,
  title = {glm5.1-distill: a small LFM2.5 student fine-tuned on GLM-5.1 reasoning data},
  author = {Mohamed Yasser},
  year = {2026},
  howpublished = {\url{https://huggingface.co/yasserrmd/glm5.1-distill}}
}
```
And the base model and dataset:
- LiquidAI, LFM2.5-1.2B-Base, 2025.
- Jackrong, GLM-5.1-Reasoning-1M-Cleaned, Hugging Face Datasets.
## Acknowledgements
- Liquid AI for the LFM2.5 base model.
- Jackrong for the cleaned GLM-5.1 reasoning dataset.
- Unsloth for the 2x faster SFT pipeline and memory-efficient LoRA kernels.
- Hugging Face TRL for `SFTTrainer`.