Breeze-ASR-25 — GGML format for whisper.cpp
GGML-quantized variants of MediaTek-Research/Breeze-ASR-25, ready to drop into whisper.cpp, VoiceInk, and any tool that consumes the standard ggml-*.bin whisper format.
Breeze-ASR-25 is a Whisper-large-v2 fine-tune by MediaTek Research, optimized for Taiwanese Mandarin and Mandarin–English code-switching (intra- and inter-sentential). On Taiwan-flavored mixed-language input it outperforms vanilla Whisper-large-v2 by a substantial margin while preserving Whisper's English ability.
Available variants
All variants pass a JFK sample sanity test (transcription matches Whisper-large-v2 baseline).
| File | Size | Quantization | When to pick |
|---|---|---|---|
| ggml-breeze-asr-25-f16.bin | 2.9 GB | fp16 (no quant) | Baseline / quality reference |
| ggml-breeze-asr-25-q8_0.bin ⭐ | 1.7 GB | 8-bit | Recommended sweet spot; near-zero WER loss vs fp16 |
| ggml-breeze-asr-25-q6_k.bin | 1.3 GB | 6-bit K-quant | Between q8_0 and q5_k |
| ggml-breeze-asr-25-q5_k.bin | 1.1 GB | 5-bit K-quant | Lower memory; K-quant beats q5_0 at same size |
| ggml-breeze-asr-25-q5_0.bin | 1.1 GB | 5-bit legacy | Older quant; prefer q5_k unless you need the legacy format |
| ggml-breeze-asr-25-q4_k.bin | 889 MB | 4-bit K-quant | Edge / low-RAM; best 4-bit quality |
| ggml-breeze-asr-25-q4_0.bin | 889 MB | 4-bit legacy | Older quant; prefer q4_k |
| ggml-breeze-asr-25-encoder.mlmodelc/ | 1.2 GB | Core ML | Apple Silicon ANE encoder; pair with any .bin above |
Naming note: whisper.cpp uses lowercase q4_k / q5_k, not llama.cpp's Q4_K_M / Q5_K_M. These are different ecosystems with different conventions.
Quick start
whisper.cpp
```shell
# Download a variant
hf download shdennlin/breeze-asr-25-ggml ggml-breeze-asr-25-q8_0.bin --local-dir ./models

# Transcribe
./build/bin/whisper-cli \
  -m ./models/ggml-breeze-asr-25-q8_0.bin \
  -f your-audio.wav \
  -l auto
```
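Depending on the build, whisper-cli may only accept 16 kHz, 16-bit mono WAV input, so other formats may need converting first. The following is a minimal sketch that assumes ffmpeg is installed; the helper names `wav16k_name` and `to_wav16k` are hypothetical and not part of whisper.cpp.

```shell
# Hypothetical helper: name for the converted file, e.g. talk.mp3 -> talk.16k.wav.
# Assumes the input file name has an extension.
wav16k_name() {
  printf '%s\n' "${1%.*}.16k.wav"
}

# Hypothetical helper: resample to 16 kHz, mono, 16-bit PCM WAV with ffmpeg.
to_wav16k() {
  ffmpeg -y -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$(wav16k_name "$1")"
}

# Usage:
#   to_wav16k interview.mp3
#   ./build/bin/whisper-cli -m ./models/ggml-breeze-asr-25-q8_0.bin -f interview.16k.wav -l auto
```

Writing the converted audio to a separate file keeps the original recording untouched.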
VoiceInk (macOS)
- Download ggml-breeze-asr-25-q8_0.bin (or another variant)
- Open VoiceInk → AI Models → Local tab → scroll to bottom → Import Local Model
- Select the .bin file
- (Optional) Also download ggml-breeze-asr-25-encoder.mlmodelc/ to the same directory for Apple Neural Engine acceleration (encoder runs 3–5x faster)
Core ML encoder pairing
To enable ANE acceleration on Apple Silicon, place the Core ML encoder alongside the .bin:
```
models/
├── ggml-breeze-asr-25-q8_0.bin
└── ggml-breeze-asr-25-encoder.mlmodelc/
```
When built with Core ML support, whisper.cpp auto-detects the matching *-encoder.mlmodelc directory next to a .bin file. With ANE, the encoder pass is roughly 3–5x faster than on CPU.
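The pairing can be checked before launching. The sketch below assumes whisper.cpp derives the encoder directory name by removing the .bin extension and a trailing quantization suffix such as -q8_0, then appending -encoder.mlmodelc. That rule is an assumption about whisper.cpp's behavior, and the function names `coreml_encoder_path` and `check_coreml_pair` are hypothetical.

```shell
# Sketch of the assumed pairing rule: drop ".bin", drop a trailing
# 5-character "-q?_?" quantization suffix if present, append "-encoder.mlmodelc".
coreml_encoder_path() {
  base="${1%.bin}"
  base="${base%-q?_?}"
  printf '%s\n' "${base}-encoder.mlmodelc"
}

# Hypothetical check: report whether the expected encoder directory exists.
check_coreml_pair() {
  enc="$(coreml_encoder_path "$1")"
  if [ -d "$enc" ]; then
    echo "found: $enc"
  else
    echo "missing: $enc"
  fi
}

# Usage:
#   check_coreml_pair ./models/ggml-breeze-asr-25-q8_0.bin
```

The startup log of whisper-cli remains the authoritative check of which Core ML model, if any, was actually loaded.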
Model details
- Base model: openai/whisper-large-v2 (1.55B parameters)
- Fine-tuned by: MediaTek Research
- Original HF repo: MediaTek-Research/Breeze-ASR-25
- Paper: Breeze ASR 25 / Twister (arXiv 2506.11130)
- Languages: Traditional Chinese (zh-TW), English, Mandarin-English code-switching
- Strengths: Taiwan-flavored Mandarin, intra-sentential code-switching, accurate timestamp alignment for captioning
- Architecture: Whisper-large-v2 encoder-decoder (32 layers each, n_audio_state=1280)
Conversion provenance
These GGML variants were converted from the breeze-asr-25.pt checkpoint in MediaTek-Research/Breeze-ASR-25 using whisper.cpp/models/convert-pt-to-ggml.py, then quantized with whisper-quantize. They were verified on macOS arm64 with a Metal-enabled build of whisper.cpp; the JFK sample transcription matches the reference.
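The quantization step can be scripted as a loop over the types listed in the table. This is a sketch, not the exact commands used for this repo: it assumes the quantize tool takes input, output, and type as positional arguments and lives at ./build/bin/whisper-quantize. The path and the helper names `variant_name` and `quantize_all` are assumptions.

```shell
# Assumed tool location; adjust to wherever your build puts the quantize binary.
QUANTIZE_BIN="${QUANTIZE_BIN:-./build/bin/whisper-quantize}"

# Output file name for a given quantization type, following this repo's naming.
variant_name() {
  printf 'ggml-breeze-asr-25-%s.bin\n' "$1"
}

# Quantize an fp16 GGML file into each type listed in the table.
# Argument order (input, output, type) is an assumption about the quantize tool.
quantize_all() {
  src="$1"
  out_dir="${2:-.}"
  for q in q8_0 q6_k q5_k q5_0 q4_k q4_0; do
    "$QUANTIZE_BIN" "$src" "$out_dir/$(variant_name "$q")" "$q" || return 1
  done
}

# Usage:
#   quantize_all ./models/ggml-breeze-asr-25-f16.bin ./models
```

Stopping at the first failure avoids leaving a partial set of files that looks complete.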
License
Apache 2.0 — inherited from the upstream Breeze-ASR-25 model and Whisper-large-v2 base. See LICENSE in the original repo.
Companion repo
For faster-whisper / CTranslate2 users (Python server, streaming via whisper-streaming, WhisperLiveKit), see the CT2 variants: shdennlin/breeze-asr-25-ct2.
Acknowledgments
Massive thanks to:
- MediaTek Research for releasing Breeze-ASR-25 under Apache 2.0
- ggerganov and the whisper.cpp community for the inference framework
- OpenAI for the original Whisper model