SenseVoice.cpp Jetson Nano Binaries

SenseVoice.cpp is a high-performance, open-source C++ speech-to-text implementation aimed at edge devices. It leverages the GGML inference framework and supports multiple backends, including CUDA for GPU acceleration.

This repository hosts prebuilt binaries optimized for NVIDIA Jetson Nano, so you can skip the build step and start transcribing right away.

Original project: https://github.com/lovemefan/SenseVoice.cpp

✨ Key Features

Multi-language ASR: Supports Chinese (Mandarin), Cantonese, English, Japanese, and Korean.
Low latency: Efficient inference with optional flash-attn.
Quantization: Q3, Q4, Q5, Q6, Q8 quantized models to reduce memory footprint.
Flexible backends:
- CPU (all platforms)
- CUDA (NVIDIA GPUs)
- BLAS, Metal, Vulkan (upstream)
Voice Activity Detection (VAD): Built-in silence-based VAD with tunable speech-duration parameters.
Inverse Text Normalization (ITN): Optionally output punctuation and formatted text.

For full feature details (streaming mode, extra backends), see the upstream documentation.

📁 Deliverable Directory Structure

project-root/
├── bin/                     # Executables
│   ├── sense-voice-main     # Main ASR program
│   ├── sense-voice-quantize # Model quantization utility
│   └── sense-voice-zcr-main # Zero-Crossing Rate detection example
└── lib/                     # Libraries
    ├── libcommon.a          # Common static library
    ├── libggml-base.so      # GGML base operations
    ├── libggml-cpu.so       # GGML CPU support
    ├── libggml-cuda.so      # GGML CUDA support
    ├── libggml.so           # GGML core
    └── libsense-voice-core.a # SenseVoice core

bin/: Standalone executables for Jetson Nano.
lib/: Shared (.so) libraries required at runtime, plus static (.a) libraries that are only needed when linking your own programs.

🚀 Quick Deployment

Follow these steps to deploy and run on Ubuntu-based distributions (e.g., JetPack 4.5.1 on Jetson Nano):

1. Clone the Repo with Git LFS Support

If you haven’t installed Git LFS yet, do so and initialize:

# Install Git LFS
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
# Initialize in your repo
git lfs install

Clone the repository:

git clone https://huggingface.co/<YOUR_USERNAME>/sensevoice-jetson-nano.git
cd sensevoice-jetson-nano
git lfs pull
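
If Git LFS is missing or `git lfs pull` is skipped, LFS-tracked files are checked out as small text pointer stubs instead of the real binaries. The following is a minimal sketch for spotting that situation. The function names `is_lfs_pointer` and `list_unpulled` are hypothetical helpers, not part of this repository; the check relies only on the fact that a Git LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`.

```shell
# Hypothetical helper (not shipped with this repository): succeeds if the
# file given as $1 is a Git LFS pointer stub rather than the real binary.
is_lfs_pointer() {
  # Pointer stubs are small text files whose first line starts with this prefix.
  head -c 100 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Print every file under the given directories that is still a pointer stub.
list_unpulled() {
  find "$@" -type f 2>/dev/null | while IFS= read -r f; do
    if is_lfs_pointer "$f"; then
      printf '%s\n' "$f"
    fi
  done
}

# Example (run from the repository root):
#   list_unpulled bin lib
```

If the example prints any paths, running `git lfs pull` again should replace those stubs with the actual files.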

2. Track Large Binary Files with Git LFS

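This step and the next apply only if you maintain your own copy of this repository and push binaries to it. Track large files (shared libraries) with LFS to avoid push errors: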

git lfs track "lib/*.so"
git add .gitattributes

3. Uploading New Binaries

When you update or add new .so files in lib/, commit and push as usual:

git add lib/*.so
git commit -m "Add updated shared libraries via LFS"
git push

4. Make Binaries Executable

chmod +x bin/*

5. Install Shared Libraries System-wide

sudo mkdir -p /usr/local/lib/sensevoice
sudo cp lib/*.so /usr/local/lib/sensevoice/
echo "/usr/local/lib/sensevoice" | sudo tee /etc/ld.so.conf.d/sensevoice.conf
sudo ldconfig

Alternatively, set LD_LIBRARY_PATH locally:

export LD_LIBRARY_PATH="$PWD/lib:$LD_LIBRARY_PATH"
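
Before running the executables, it can help to confirm that every shared library from the lib/ listing is present in the directory you installed to. The sketch below assumes the four .so names shown in the directory structure above; `missing_libs` is a hypothetical helper name, not something this repository provides.

```shell
# Hypothetical helper: print the name of each shared library from the lib/
# listing that is absent from the directory given as $1.
missing_libs() {
  for lib in libggml-base.so libggml-cpu.so libggml-cuda.so libggml.so; do
    [ -f "$1/$lib" ] || printf '%s\n' "$lib"
  done
}

# Example usage:
#   missing_libs /usr/local/lib/sensevoice      # prints nothing when complete
#   ldd bin/sense-voice-main | grep 'not found' # prints nothing when all libraries resolve
```

The `ldd` line in the comment is the standard way to see which libraries the dynamic linker cannot resolve; it is most useful after `ldconfig` has run or `LD_LIBRARY_PATH` has been exported.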

6. Model Setup

Download or convert a GGUF model (e.g., sense-voice-small-q4_k.gguf):

# From Hugging Face
git clone https://huggingface.co/lovemefan/sense-voice-gguf.git models
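
Model repositories on Hugging Face store large files through Git LFS, so a clone made without LFS can leave pointer stubs where the models should be. A quick sanity check, sketched here under the assumption that the models are GGUF files (which begin with the four ASCII bytes `GGUF`), is to inspect the start of each file. The names `is_gguf` and `check_models` are hypothetical.

```shell
# Hypothetical helper: succeeds if the file given as $1 begins with the
# 4-byte GGUF magic.
is_gguf() {
  head -c 4 "$1" 2>/dev/null | grep -q '^GGUF'
}

# Report the status of every .gguf file in the directory given as $1.
check_models() {
  for f in "$1"/*.gguf; do
    [ -e "$f" ] || continue          # the glob matched nothing
    if is_gguf "$f"; then
      printf 'ok       %s\n' "$f"
    else
      printf 'invalid  %s\n' "$f"
    fi
  done
}

# Example:
#   check_models models
```

A file reported as invalid is most likely an LFS pointer or an interrupted download; running `git lfs pull` inside models/ is the usual remedy.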

7. Run Examples

Speech-to-Text (non-streaming)

bin/sense-voice-main \
  -m models/sense-voice-small-q4_k.gguf \
  -f input.wav \
  -t 4 \
  -l zh \
  --use-itn \
  --flash-attn

Options:

-t N / --threads N: Number of decode threads (default: 4)
-l LANG / --language LANG: auto, zh, en, yue, ja, ko
--min_speech_duration_ms, --max_speech_duration_ms: Minimum and maximum speech segment duration for VAD, in milliseconds
--no-gpu (-ng): Disable GPU
--use-itn (-itn): Enable inverse text normalization
--flash-attn (-fa): Enable Flash Attention decoder
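
To transcribe several recordings in one go, the options above can be wrapped in a small loop. This is only a sketch: `transcribe_dir`, `SENSE_VOICE_BIN`, and `MODEL` are hypothetical names introduced for illustration, and it assumes the program writes its transcript to standard output (the saved file may also contain log lines).

```shell
# Illustrative variables (not options defined by this repository).
SENSE_VOICE_BIN="${SENSE_VOICE_BIN:-bin/sense-voice-main}"
MODEL="${MODEL:-models/sense-voice-small-q4_k.gguf}"

# Hypothetical wrapper: run the ASR program on every .wav file in the
# directory given as $1 and save the output next to each input as <name>.txt.
transcribe_dir() {
  for wav in "$1"/*.wav; do
    [ -e "$wav" ] || continue        # no .wav files in the directory
    # Only flags listed in the options above are used here.
    "$SENSE_VOICE_BIN" -m "$MODEL" -f "$wav" -t 4 -l auto --use-itn \
      > "${wav%.wav}.txt" || printf 'failed: %s\n' "$wav" >&2
  done
}

# Example:
#   transcribe_dir recordings
```

Using `-l auto` lets the model pick the language per file; replace it with a fixed code such as `zh` or `en` when the whole batch is in one language.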

Quantization Utility

bin/sense-voice-quantize \
  --input models/sense-voice-small.bin \
  --output models/sense-voice-small-q4_k.gguf \
  --type q4_k

Supported quant types: q3, q4_k, q4_0, q5_0, q6_k, q8.
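
When memory is tight, it can be useful to produce several quantized variants and compare their sizes before choosing one. The sketch below reuses the `--input`, `--output`, and `--type` flags exactly as shown above; `quantize_all` and `QUANTIZE_BIN` are hypothetical names, and the output naming scheme (`<source>-<type>.gguf`) is an assumption made for this example.

```shell
# Illustrative variable (not defined by this repository).
QUANTIZE_BIN="${QUANTIZE_BIN:-bin/sense-voice-quantize}"

# Hypothetical wrapper: $1 is the source model, the remaining arguments are
# quantization types. Writes <source-without-extension>-<type>.gguf for each
# type and prints the resulting file size.
quantize_all() {
  src=$1
  shift
  for qtype in "$@"; do
    out="${src%.*}-$qtype.gguf"
    "$QUANTIZE_BIN" --input "$src" --output "$out" --type "$qtype" || return 1
    printf '%s\t%s bytes\n' "$out" "$(wc -c < "$out" | tr -d ' ')"
  done
}

# Example:
#   quantize_all models/sense-voice-small.bin q4_k q5_0 q8
```

Smaller files load faster and use less RAM, but accuracy generally drops as the bit width shrinks, so it is worth transcribing a known sample with each variant before settling on one.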

Zero-Crossing Rate Demo

bin/sense-voice-zcr-main input.wav

For streaming ASR or advanced examples, please refer to upstream's sense-voice-stream in the original repo.

🛠 Compatibility

Hardware: NVIDIA Jetson Nano
OS: Ubuntu 18.04 / JetPack 4.5.1
CUDA: 10.2
C++: C++17
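
A short pre-flight check can show whether a device resembles the environment listed above. This is a sketch under a few assumptions: the function names are hypothetical, the Jetson Nano is expected to report `aarch64` from `uname -m`, and the files `/etc/nv_tegra_release` and `/usr/local/cuda/version.txt` are assumed to be where JetPack 4.x and CUDA 10.2 record their versions. If those files are absent on your image, the script simply says so.

```shell
# Hypothetical helper: succeeds if the architecture string in $1 matches
# the 64-bit ARM architecture these binaries target.
is_supported_arch() {
  [ "$1" = "aarch64" ]
}

# Hypothetical pre-flight report; prints what it can find and never fails.
preflight() {
  arch=$(uname -m)
  if is_supported_arch "$arch"; then
    echo "arch: ok ($arch)"
  else
    echo "arch: $arch (binaries target aarch64)"
  fi
  # Assumed location of the L4T release string on JetPack installations.
  if [ -r /etc/nv_tegra_release ]; then
    echo "l4t: $(head -n 1 /etc/nv_tegra_release)"
  else
    echo "l4t: release file not found"
  fi
  # Assumed location of the CUDA version string for CUDA 10.x toolkits.
  if [ -r /usr/local/cuda/version.txt ]; then
    echo "cuda: $(cat /usr/local/cuda/version.txt)"
  else
    echo "cuda: version file not found"
  fi
}

# Example:
#   preflight
```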

📜 License

MIT License — see LICENSE for details.

For comprehensive build instructions, extra examples, and advanced backend support, visit the official SenseVoice.cpp documentation. Happy prototyping! 🎙️💕
