model support per CrispASR — pure C++ inference with GGUF quantisation

#40

by cstr - opened Apr 16


We've built a complete C++ runtime for Voxtral-Mini-4B-Realtime in CrispASR, using ggml for inference. One binary, one GGUF file — no Python, no PyTorch.

What works:

Decent performance (e.g. 3.8x faster than voxtral.c on a 4-core Intel Xeon, no GPU)
Full transcription (causal RoPE encoder + 3.4B LLM with streaming audio injection)
GGUF quantisation — Q4_K shrinks the model from 8.3 GB to ~2.5 GB
Temperature sampling + best-of-N
Streaming from mic/stdin (--stream, --mic)
GPU acceleration via CUDA / Metal / Vulkan
Word timestamps via forced alignment (-am qwen3-forced-aligner.gguf)
Speaker diarisation, language ID, all output formats
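
For readers unfamiliar with the "temperature sampling + best-of-N" item in the list above, here is a minimal sketch of the general technique in Python. It is not CrispASR's code: the function names are hypothetical, the per-step logits are fixed toy values (a real decoder recomputes them after every token), and ranking candidates by mean token log-probability is an assumption, since the post does not say how candidates are scored.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Sample one index from softmax(logits / temperature).

    Returns (index, log-probability of that index under the scaled distribution).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, math.log(probs[index])


def best_of_n(step_logits, n, temperature, seed=0):
    """Draw n candidate sequences and keep the one with the highest
    mean token log-probability (assumed scoring rule)."""
    if not step_logits:
        raise ValueError("need at least one decoding step")
    rng = random.Random(seed)
    best_tokens, best_score = None, -math.inf
    for _ in range(n):
        tokens, logprob = [], 0.0
        for logits in step_logits:  # toy: logits do not depend on earlier tokens
            index, lp = sample_with_temperature(logits, temperature, rng)
            tokens.append(index)
            logprob += lp
        score = logprob / len(tokens)
        if score > best_score:
            best_tokens, best_score = tokens, score
    return best_tokens, best_score


if __name__ == "__main__":
    toy_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
    print(best_of_n(toy_logits, n=4, temperature=0.8, seed=1))
```

Lower temperatures concentrate probability on the top-scoring token, so the N candidates tend to agree; higher temperatures make them diverge, which is when picking the best of several becomes useful.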

Quick start:

git clone https://github.com/CrispStrobe/CrispASR && cd CrispASR
cmake -S . -B build && cmake --build build -j8
./build/bin/crispasr --backend voxtral4b -m auto -f audio.wav
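
If you want to run the last command over a folder of recordings, a small wrapper like the one below might help. It is only a sketch: it uses just the flags shown in the quick start (--backend, -m, -f), the helper names are hypothetical, and it assumes the binary sits at ./build/bin/crispasr after the build step. By default it only prints the commands instead of running them.

```python
import subprocess
from pathlib import Path


def crispasr_command(audio_path, binary="./build/bin/crispasr",
                     backend="voxtral4b", model="auto"):
    """Build the argument list for one run, mirroring the quick-start command."""
    return [binary, "--backend", backend, "-m", model, "-f", str(audio_path)]


def transcribe_all(audio_dir, pattern="*.wav", dry_run=True):
    """Build (and optionally run) one command per matching audio file."""
    commands = [crispasr_command(p)
                for p in sorted(Path(audio_dir).glob(pattern))]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # show what would be executed
        else:
            subprocess.run(cmd, check=True)  # raises if the binary fails
    return commands


if __name__ == "__main__":
    transcribe_all(".", dry_run=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.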

Pre-quantised GGUFs: cstr/voxtral-mini-4b-realtime-GGUF
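
To put the Q4_K figures quoted above (8.3 GB down to ~2.5 GB) in perspective, the arithmetic can be sketched as follows. The 16 bits per weight for the original checkpoint is an assumption, not something stated in the post, and the ~2.5 GB figure is approximate, so treat the result as a rough average across all tensors.

```python
def compression_summary(original_gb, quantised_gb, original_bits=16):
    """Return (size ratio, average bits per weight after quantisation).

    original_bits=16 is an assumption about the unquantised checkpoint.
    """
    ratio = original_gb / quantised_gb
    effective_bits = original_bits / ratio
    return ratio, effective_bits


if __name__ == "__main__":
    ratio, bits = compression_summary(8.3, 2.5)
    print(f"{ratio:.2f}x smaller, about {bits:.1f} bits per weight on average")
```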

CrispASR supports 11 ASR backends total — pick the right one for your use case with a single --backend flag.

cstr changed discussion title from CrispASR — pure C++ inference with GGUF quantisation, 3.8x faster than voxtral.c on CPU to CrispASR — pure C++ inference with GGUF quantisation Apr 16

cstr changed discussion title from CrispASR — pure C++ inference with GGUF quantisation to model support per CrispASR — pure C++ inference with GGUF quantisation Apr 16

liuyt6515

6 days ago

git clone https://github.com/CrispStrobe/CrispASR && cd CrispASR
cmake -S . -B build && cmake --build build -j8
./build/bin/crispasr --backend voxtral4b -m auto -f audio.wav

thanks
