Breeze-ASR-25 — GGML format for whisper.cpp

GGML-quantized variants of MediaTek-Research/Breeze-ASR-25, ready to drop into whisper.cpp, VoiceInk, and any tool that consumes the standard ggml-*.bin whisper format.

Breeze-ASR-25 is a Whisper-large-v2 fine-tune by MediaTek Research, optimized for Taiwanese Mandarin and Mandarin–English code-switching (both intra- and inter-sentential). On Taiwanese mixed-language input it outperforms vanilla Whisper-large-v2 by a substantial margin while preserving Whisper's English ability; the upstream model card reports the benchmark numbers.

Available variants

All variants pass a sanity test on whisper.cpp's JFK sample (samples/jfk.wav): the transcription matches the Whisper-large-v2 baseline.

| File | Size | Quantization | When to pick |
|---|---|---|---|
| ggml-breeze-asr-25-f16.bin | 2.9 GB | fp16 (no quant) | Baseline / quality reference |
| ggml-breeze-asr-25-q8_0.bin | 1.7 GB | 8-bit | Recommended sweet spot: near-zero WER loss vs fp16 |
| ggml-breeze-asr-25-q6_k.bin | 1.3 GB | 6-bit K-quant | Middle ground between q8_0 and q5_k |
| ggml-breeze-asr-25-q5_k.bin | 1.1 GB | 5-bit K-quant | Lower memory; K-quant beats q5_0 at the same size |
| ggml-breeze-asr-25-q5_0.bin | 1.1 GB | 5-bit legacy | Older quant; prefer q5_k unless you need the legacy format |
| ggml-breeze-asr-25-q4_k.bin | 889 MB | 4-bit K-quant | Edge / low-RAM; best 4-bit quality |
| ggml-breeze-asr-25-q4_0.bin | 889 MB | 4-bit legacy | Older quant; prefer q4_k |
| ggml-breeze-asr-25-encoder.mlmodelc/ | 1.2 GB | Core ML | Apple Silicon ANE encoder; pair with any .bin above |

Naming note: whisper.cpp uses lowercase q4_k / q5_k, not llama.cpp's Q4_K_M / Q5_K_M. These are different ecosystems with different conventions.
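To repeat the JFK sanity check on your own machine, a loop like the one below runs each variant on whisper.cpp's samples/jfk.wav and compares the text after lowercasing and stripping punctuation. It is a sketch under assumptions: a CMake-built whisper.cpp checkout and the model files in ./models (both placeholders, overridable via WHISPER_CPP and MODELS). The expected string is the transcript Whisper models commonly produce for this clip, not one published by this repo, and the helper names normalize and check_variant are hypothetical.

```sh
# Re-run the JFK sanity check against every variant (sketch).
# Placeholders (assumptions): a CMake-built whisper.cpp in $WHISPER_CPP
# and the downloaded .bin files in $MODELS.
WHISPER_CPP="${WHISPER_CPP:-./whisper.cpp}"
MODELS="${MODELS:-./models}"

# Transcript Whisper models commonly produce for samples/jfk.wav
EXPECTED="And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country."

# Lowercase, drop punctuation, collapse whitespace (incl. newlines) to
# single spaces, and trim the ends.
normalize() {
  tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | tr -s '[:space:]' ' ' |
    sed 's/^ //; s/ $//'
}

# Hypothetical helper: transcribe jfk.wav with one variant and compare.
check_variant() {
  model="$MODELS/ggml-breeze-asr-25-$1.bin"
  if [ ! -f "$model" ]; then
    echo "skip $1 (missing $model)"
    return 0
  fi
  # -nt: no timestamps, -np: suppress non-result output; stderr discarded
  got=$("$WHISPER_CPP/build/bin/whisper-cli" -m "$model" \
    -f "$WHISPER_CPP/samples/jfk.wav" -l en -nt -np 2>/dev/null | normalize)
  want=$(printf '%s' "$EXPECTED" | normalize)
  if [ "$got" = "$want" ]; then
    echo "ok   $1"
  else
    echo "FAIL $1: $got"
    return 1
  fi
}

status=0
for v in f16 q8_0 q6_k q5_k q5_0 q4_k q4_0; do
  check_variant "$v" || status=1
done
if [ "$status" -eq 0 ]; then echo "all present variants match"; else echo "some variants differ"; fi
```

Normalizing before comparing keeps the check about the words themselves rather than punctuation or casing, which may vary slightly between quantization levels.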

Quick start

whisper.cpp

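# Download a variant (needs the Hugging Face CLI: pip install -U huggingface_hub)
hf download shdennlin/breeze-asr-25-ggml ggml-breeze-asr-25-q8_0.bin --local-dir ./models

# Transcribe (run from a whisper.cpp checkout built with CMake)
# -l auto lets the model detect the spoken language
./build/bin/whisper-cli \
  -m ./models/ggml-breeze-asr-25-q8_0.bin \
  -f your-audio.wav \
  -l auto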
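whisper.cpp's README notes that whisper-cli expects 16-bit WAV input and suggests converting with ffmpeg (-ar 16000 -ac 1 -c:a pcm_s16le). The sketch below wraps that command so any ffmpeg-readable recording can be fed to -f. It assumes ffmpeg is on your PATH; the function names whisper_wav_name and to_whisper_wav are hypothetical, not part of whisper.cpp or this repo.

```sh
# Hypothetical helper: derive the output path by replacing the input's
# extension with "-16k.wav", so a .wav input is never overwritten in place.
whisper_wav_name() {
  printf '%s\n' "${1%.*}-16k.wav"
}

# Convert to 16 kHz, mono, 16-bit PCM WAV and print the new path,
# so the result can be passed straight to whisper-cli's -f flag.
to_whisper_wav() {
  out=$(whisper_wav_name "$1")
  ffmpeg -hide_banner -loglevel error -y -i "$1" \
    -ar 16000 -ac 1 -c:a pcm_s16le "$out" && printf '%s\n' "$out"
}
```

Usage, with the q8_0 model from above:

./build/bin/whisper-cli -m ./models/ggml-breeze-asr-25-q8_0.bin -f "$(to_whisper_wav meeting.m4a)" -l auto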

VoiceInk (macOS)

  1. Download ggml-breeze-asr-25-q8_0.bin (or another variant)
  2. Open VoiceInk → AI Models → Local tab → scroll to bottom → Import Local Model
  3. Select the .bin file
  4. (Optional) Also download ggml-breeze-asr-25-encoder.mlmodelc/ to the same directory for Apple Neural Engine acceleration (encoder runs 3–5x faster)

Core ML encoder pairing

To enable ANE acceleration on Apple Silicon, place the Core ML encoder alongside the .bin:

models/
├── ggml-breeze-asr-25-q8_0.bin
└── ggml-breeze-asr-25-encoder.mlmodelc/

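whisper.cpp auto-detects the matching *-encoder.mlmodelc directory next to the .bin file, but only when whisper.cpp itself was built with Core ML support (cmake -B build -DWHISPER_COREML=1); other builds ignore the directory. With the ANE, the encoder pass is roughly 3–5x faster than on CPU. The first run on a device is slow while macOS compiles the Core ML model for that device; later runs start quickly.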
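whisper.cpp finds the encoder purely by file name. Based on our reading of whisper_get_coreml_path_encoder() in whisper.cpp's source (an assumption worth re-checking against your checkout), it drops the .bin extension, strips a trailing quantization tag of the form -qX_Y, and appends -encoder.mlmodelc. The function below mirrors that rule so you can see which directory each variant expects; coreml_encoder_path is our own name, not a whisper.cpp API. If the reading is right, every quantized file maps to ggml-breeze-asr-25-encoder.mlmodelc, but -f16 is not a -qX_Y tag, so the f16 file would look for ggml-breeze-asr-25-f16-encoder.mlmodelc instead. A Core ML build may also refuse to load the model when that directory is missing, unless it was configured with WHISPER_COREML_ALLOW_FALLBACK.

```sh
# Mirror of whisper.cpp's encoder-path rule as we read it in
# whisper_get_coreml_path_encoder(); illustrative, not an official API.
coreml_encoder_path() {
  base="${1%.*}"            # drop the extension, e.g. ".bin"
  suffix="-${base##*-}"     # last "-..." component of the name
  case "$suffix" in
    -q?_?) base="${base%-*}" ;;   # strip a 5-char "-qX_Y" quant tag
  esac
  printf '%s\n' "$base-encoder.mlmodelc"
}

# Print the encoder directory each variant would look for.
for v in f16 q8_0 q6_k q5_k q5_0 q4_k q4_0; do
  bin="./models/ggml-breeze-asr-25-$v.bin"
  printf '%-40s -> %s\n' "$bin" "$(coreml_encoder_path "$bin")"
done
```

If that holds for your build, a symlink lets the f16 file share the same encoder:

ln -s ggml-breeze-asr-25-encoder.mlmodelc models/ggml-breeze-asr-25-f16-encoder.mlmodelc

To fetch the encoder directory itself, hf download accepts a glob:

hf download shdennlin/breeze-asr-25-ggml --include "ggml-breeze-asr-25-encoder.mlmodelc/*" --local-dir ./models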

Model details

Conversion provenance

These GGML variants were converted from MediaTek-Research/Breeze-ASR-25's breeze-asr-25.pt checkpoint using whisper.cpp/models/convert-pt-to-ggml.py, then quantized with whisper-quantize. They were verified on macOS arm64 with whisper.cpp built with Metal support, and the JFK sample transcription matches the reference.
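For anyone who wants to reproduce these files, the two steps above can be sketched as a dry-run shell script. The paths are placeholders; the converter's argument order (checkpoint, openai/whisper checkout, output directory) and its default output name ggml-model.bin reflect our understanding of convert-pt-to-ggml.py, and the whisper-quantize path assumes a CMake build. Check each against your checkout. By default the script only prints the commands; set DRY_RUN=0 to execute them. The run helper is hypothetical.

```sh
# Dry-run sketch of the conversion pipeline (assumed paths, override via env):
#   WHISPER_CPP    built whisper.cpp checkout
#   OPENAI_WHISPER clone of github.com/openai/whisper (the converter reads
#                  its mel-filter and tokenizer assets)
#   CKPT           the breeze-asr-25.pt checkpoint
WHISPER_CPP="${WHISPER_CPP:-./whisper.cpp}"
OPENAI_WHISPER="${OPENAI_WHISPER:-./whisper}"
CKPT="${CKPT:-./breeze-asr-25.pt}"
OUT="${OUT:-./models}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print each command, 0 = actually run it

# Print or execute a command; abort the script if an executed step fails.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@" || exit 1
  fi
}

# Step 1: .pt -> fp16 GGML. The converter writes $OUT/ggml-model.bin,
# which is renamed to match this repo's file names.
run mkdir -p "$OUT"
run python3 "$WHISPER_CPP/models/convert-pt-to-ggml.py" "$CKPT" "$OPENAI_WHISPER" "$OUT"
run mv "$OUT/ggml-model.bin" "$OUT/ggml-breeze-asr-25-f16.bin"

# Step 2: quantize the fp16 file into each variant listed in the table.
for q in q8_0 q6_k q5_k q5_0 q4_k q4_0; do
  run "$WHISPER_CPP/build/bin/whisper-quantize" \
    "$OUT/ggml-breeze-asr-25-f16.bin" "$OUT/ggml-breeze-asr-25-$q.bin" "$q"
done
```

Printing first makes it easy to check every path before committing to a multi-gigabyte conversion.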

License

Apache 2.0, inherited from the upstream Breeze-ASR-25 model. See LICENSE in the original repo. (The Whisper-large-v2 base model itself is released by OpenAI under the MIT License.)

Companion repo

For faster-whisper / CTranslate2 users (Python server, streaming via whisper-streaming, WhisperLiveKit), see the CT2 variants: shdennlin/breeze-asr-25-ct2.

Acknowledgments

Massive thanks to:

  • MediaTek Research for releasing Breeze-ASR-25 under Apache 2.0
  • ggerganov and the whisper.cpp community for the inference framework
  • OpenAI for the original Whisper model