inference throughput drops by 80% with this template
#28
by froilo - opened
Inference throughput drops by 80% when I use this template. Here is my launch command:
/llama_mtp/llama.cpp/build/bin/llama-server \
  -m /models/unsloth/Qwen3.6-35B-A3B-MTP-GGUF/Qwen3.6-35B-A3B-UD-Q4_K_S.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  -c 32768 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --temp 0.7 \
  --top-p 0.95 \
  --top-k 20 \
  --presence-penalty 0.0 \
  --min-p 0.00 \
  --spec-type draft-mtp \
  --spec-draft-n-max 6 \
  --spec-draft-p-min 0.75 \
  --jinja \
  --chat-template-file /models/unsloth/Qwen3.6-35B-A3B-MTP-GGUF/froggeric__Qwen-Fixed-Chat-Templates.jinja
Try these params:
--chat-template-kwargs "{\"preserve_thinking\":true}" --spec-type draft-mtp --spec-draft-n-max 2
--spec-draft-n-max 6 is not optimal for this model.
Thanks, it works.
It may even be about 8-10% faster than my prior MTP setup without the template (not enough data yet, though).
I also haven't tested an agentic flow yet.
That's why I've been using --spec-draft-n-max 6.
Your quant had the best perf between 1-2.
Edit: sorry, unsloth suggested that value somewhere else.
