Breeze-ASR-25 — GGML format for whisper.cpp

GGML-quantized variants of MediaTek-Research/Breeze-ASR-25, ready to drop into whisper.cpp, VoiceInk, and any other tool that consumes the standard ggml-*.bin whisper format.

Breeze-ASR-25 is a Whisper-large-v2 fine-tune by MediaTek Research, optimized for Taiwanese Mandarin and Mandarin–English code-switching (intra- and inter-sentential). On Taiwan-flavored mixed-language input it outperforms vanilla Whisper-large-v2 by a substantial margin while preserving Whisper's English ability.

Available variants

All variants pass a JFK sample sanity test (the transcription matches the Whisper-large-v2 baseline).

File	Size	Quantization	When to pick
`ggml-breeze-asr-25-f16.bin`	2.9 GB	fp16 (no quant)	Baseline / quality reference
`ggml-breeze-asr-25-q8_0.bin` ⭐	1.7 GB	8-bit	Recommended sweet spot — near-zero WER loss vs fp16
`ggml-breeze-asr-25-q6_k.bin`	1.3 GB	6-bit K-quant	Between q8_0 and q5_k
`ggml-breeze-asr-25-q5_k.bin`	1.1 GB	5-bit K-quant	Lower memory; K-quant beats q5_0 at same size
`ggml-breeze-asr-25-q5_0.bin`	1.1 GB	5-bit legacy	Older quant; prefer q5_k unless you need the legacy format
`ggml-breeze-asr-25-q4_k.bin`	889 MB	4-bit K-quant	Edge / low-RAM; best 4-bit quality
`ggml-breeze-asr-25-q4_0.bin`	889 MB	4-bit legacy	Older quant; prefer q4_k
`ggml-breeze-asr-25-encoder.mlmodelc/`	1.2 GB	Core ML	Apple Silicon ANE encoder — pair with any .bin above

Naming note: whisper.cpp uses lowercase q4_k / q5_k, not llama.cpp's Q4_K_M / Q5_K_M. These are different ecosystems with different conventions.
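
The naming convention above can be made mechanical. The sketch below is a hypothetical helper (`variant_file` is not part of whisper.cpp or of this repo) that maps a lowercase quantization name to the matching file name from the table and rejects anything else, such as a llama.cpp-style `Q5_K_M`.

```shell
# Hypothetical helper: map a quantization name to the file name used in this repo.
# The accepted names are exactly the .bin variants listed in the table above.
variant_file() {
  case "$1" in
    f16|q8_0|q6_k|q5_k|q5_0|q4_k|q4_0)
      printf 'ggml-breeze-asr-25-%s.bin\n' "$1"
      ;;
    *)
      # Anything else (e.g. Q5_K_M) is not a file in this repo.
      printf 'unknown variant: %s (use a lowercase name such as q5_k)\n' "$1" >&2
      return 1
      ;;
  esac
}
```

It could then feed the download command, for example `hf download shdennlin/breeze-asr-25-ggml "$(variant_file q5_k)" --local-dir ./models`.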

Quick start

whisper.cpp

# Download a variant
hf download shdennlin/breeze-asr-25-ggml ggml-breeze-asr-25-q8_0.bin --local-dir ./models

# Transcribe
./build/bin/whisper-cli \
  -m ./models/ggml-breeze-asr-25-q8_0.bin \
  -f your-audio.wav \
  -l auto
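
If your recording is not already a WAV file, it may need converting first. whisper.cpp's documentation has historically asked for 16 kHz, 16-bit PCM WAV input and suggests ffmpeg for the conversion. The sketch below wraps that ffmpeg invocation; the function names and the `.16k.wav` output suffix are illustrative choices, not something whisper.cpp requires.

```shell
# Derive an output path next to the input: clips/talk.mp3 -> clips/talk.16k.wav
# (the ".16k.wav" suffix is an arbitrary naming choice for this sketch).
wav16k_name() {
  dir="$(dirname -- "$1")"
  base="$(basename -- "$1")"
  printf '%s/%s.16k.wav\n' "$dir" "${base%.*}"
}

# Convert any ffmpeg-readable file to 16 kHz, mono, 16-bit PCM WAV.
# Prints the output path on success; ffmpeg's own log goes to stderr.
to_wav16k() {
  out="$(wav16k_name "$1")"
  ffmpeg -nostdin -y -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$out" &&
    printf '%s\n' "$out"
}
```

Usage: `./build/bin/whisper-cli -m ./models/ggml-breeze-asr-25-q8_0.bin -f "$(to_wav16k talk.mp3)" -l auto`. Note that `-y` overwrites an existing output file without asking.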

VoiceInk (macOS)

1. Download ggml-breeze-asr-25-q8_0.bin (or another variant)
2. Open VoiceInk → AI Models → Local tab → scroll to bottom → Import Local Model
3. Select the .bin file
4. (Optional) Also download ggml-breeze-asr-25-encoder.mlmodelc/ to the same directory for Apple Neural Engine acceleration (encoder runs 3–5x faster)

Core ML encoder pairing

To enable ANE acceleration on Apple Silicon, place the Core ML encoder alongside the .bin:

models/
├── ggml-breeze-asr-25-q8_0.bin
└── ggml-breeze-asr-25-encoder.mlmodelc/

whisper.cpp auto-detects the matching *-encoder.mlmodelc directory next to a .bin file. With the ANE, the encoder pass is roughly 3–5x faster than on the CPU.
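
A quick way to confirm the pairing before launching is to compute the directory name the layout above implies and check that it exists. The sketch below assumes the naming rule visible in that layout: drop `.bin`, drop a trailing `-qN_X` quantization suffix, and append `-encoder.mlmodelc`. Both function names are hypothetical, and how the f16 file is matched is not confirmed here. Core ML is also only used when whisper.cpp itself was built with it enabled (the upstream README documents `-DWHISPER_COREML=1` for CMake builds).

```shell
# Expected Core ML encoder directory for a given .bin file.
# Assumed rule, inferred from the layout above: strip ".bin", strip a trailing
# "-qN_X" quantization suffix (e.g. -q8_0, -q5_k), append "-encoder.mlmodelc".
encoder_dir_for() {
  stem="${1%.bin}"
  stem="${stem%-q[0-9]_[0-9a-z]}"   # names without such a suffix are left as-is
  printf '%s-encoder.mlmodelc\n' "$stem"
}

# Report whether that directory is present next to the model file.
check_encoder() {
  enc="$(encoder_dir_for "$1")"
  if [ -d "$enc" ]; then
    printf 'found: %s\n' "$enc"
  else
    printf 'missing: %s\n' "$enc" >&2
    return 1
  fi
}
```

Usage: `check_encoder models/ggml-breeze-asr-25-q8_0.bin`. A non-zero exit status means the encoder directory is not where this rule expects it.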

Model details

Base model: openai/whisper-large-v2 (1.55B parameters)
Fine-tuned by: MediaTek Research
Original HF repo: MediaTek-Research/Breeze-ASR-25
Paper: Breeze ASR 25 / Twister (arXiv 2506.11130)
Languages: Traditional Chinese (zh-TW), English, Mandarin-English code-switching
Strengths: Taiwan-flavored Mandarin, intra-sentential code-switching, accurate timestamp alignment for captioning
Architecture: Whisper-large-v2 encoder-decoder (32 layers each, n_audio_state=1280)

Conversion provenance

These GGML variants were converted from the breeze-asr-25.pt checkpoint in MediaTek-Research/Breeze-ASR-25 using whisper.cpp/models/convert-pt-to-ggml.py, then quantized with whisper-quantize. They were verified on macOS arm64 with whisper.cpp built with Metal support: the JFK sample transcription matches the reference.
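
The quantization step can be reproduced along these lines. This is a sketch, not the exact script used for this repo: the binary path `./build/bin/whisper-quantize` is an assumption about your build tree, `quantize_plan` is a hypothetical helper, and the argument order (input, output, type) follows the usage shown in the whisper.cpp README. The function only prints commands so they can be reviewed before anything runs. The .pt to GGML conversion itself is not covered.

```shell
# Path to the quantizer; an assumption, override via the environment if needed.
QUANTIZE_BIN="${QUANTIZE_BIN:-./build/bin/whisper-quantize}"

# Print (do not run) one quantize command per quantized variant in this repo.
# $1 is the fp16 GGML file, e.g. models/ggml-breeze-asr-25-f16.bin
quantize_plan() {
  src="$1"
  stem="${src%-f16.bin}"
  for q in q8_0 q6_k q5_k q5_0 q4_k q4_0; do
    printf '%s %s %s-%s.bin %s\n' "$QUANTIZE_BIN" "$src" "$stem" "$q" "$q"
  done
}
```

Usage: run `quantize_plan models/ggml-breeze-asr-25-f16.bin` to inspect the commands, then pipe the output to `sh` to execute them. Piping assumes the paths contain no spaces or shell metacharacters.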

License

Apache 2.0 — inherited from the upstream Breeze-ASR-25 model and Whisper-large-v2 base. See LICENSE in the original repo.

Companion repo

For faster-whisper / CTranslate2 users (Python server, streaming via whisper-streaming, WhisperLiveKit), see the CT2 variants: shdennlin/breeze-asr-25-ct2.

Acknowledgments

Massive thanks to:

MediaTek Research for releasing Breeze-ASR-25 under Apache 2.0
ggerganov and the whisper.cpp community for the inference framework
OpenAI for the original Whisper model

Paper

A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data (arXiv 2506.11130, published Jun 10, 2025)