Distil-Whisper: Optimized for Qualcomm Devices
Distil-Whisper Small English is a distilled version of Whisper Small, optimized for fast and efficient automatic speech recognition.
This is based on the implementation of Distil-Whisper found here. This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the Qualcomm® AI Hub Models library to export with custom configurations. More details on model performance across various devices can be found here.
Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.
Getting Started
There are two ways to deploy this model on your device:
Option 1: Download Pre-Exported Models
Below are pre-exported model assets ready for deployment.
| Runtime | Precision | Chipset | SDK Versions | Download |
|---|---|---|---|---|
| ONNX | float | Universal | QAIRT 2.45, ONNX Runtime 1.25.0 | Download |
| QNN_DLC | float | Universal | QAIRT 2.45 | Download |
| TFLITE | float | Universal |  | Download |
For more device-specific assets and performance metrics, visit Distil-Whisper on Qualcomm® AI Hub.
Option 2: Export with Custom Configurations
Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:
- Custom weights (e.g., fine-tuned checkpoints)
- Custom input shapes
- Target device and runtime configurations
This option is ideal if you need to customize the model beyond the default configuration provided here.
See our repository for Distil-Whisper on GitHub for usage instructions.
Model Details
Model Type: Speech recognition
Model Stats:
- Model checkpoint: distil-whisper/distil-small.en
- Input resolution: 80x3000 (80 mel-frequency bins by 3,000 time frames, covering 30 seconds of audio)
- Max decoded sequence length: 200 tokens
- Number of parameters (encoder): 166M
- Model size (encoder) (float): 332 MB
- Number of parameters (decoder): 211M
- Model size (decoder) (float): 450 MB
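The stats above describe a two-stage pipeline: the encoder consumes one fixed 80x3000 log-mel window, and the decoder then emits tokens until end-of-transcript or the length cap. The following is a minimal sketch of how the two exported models might be driven together, assuming greedy decoding. The front-end constants match the openai/whisper reference code (16 kHz audio, 160-sample hop), which is consistent with the 80x3000 input. `encoder`, `decoder_step`, `prompt`, and `eot_token` are hypothetical placeholders, not the AI Hub Models API. Treating the 200-token figure as a cap on the whole decoder sequence (prompt included) is also an assumption. A production decoder would typically use a key/value cache instead of re-reading the full token list at every step.

```python
from typing import Callable, List, Sequence

# Whisper front-end constants (as in the openai/whisper reference code).
SAMPLE_RATE = 16_000                      # Hz
HOP_LENGTH = 160                          # samples between mel frames (10 ms)
CHUNK_SECONDS = 30                        # fixed window the encoder expects
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3,000 mel frames
N_MELS = 80                               # mel bins for this checkpoint
MAX_SEQ_LEN = 200                         # assumed cap on prompt + generated tokens


def pad_or_trim(samples: Sequence[float], length: int = N_SAMPLES) -> List[float]:
    """Zero-pad or truncate raw audio so it fills exactly one 30 s window."""
    clipped = list(samples[:length])
    return clipped + [0.0] * (length - len(clipped))


def greedy_decode(
    encoder: Callable[[object], object],
    decoder_step: Callable[[List[int], object], Sequence[float]],
    mel: object,
    prompt: List[int],
    eot_token: int,
    max_seq_len: int = MAX_SEQ_LEN,
) -> List[int]:
    """Run the encoder once, then the decoder one token at a time.

    `encoder` and `decoder_step` are hypothetical stand-ins for the exported
    models; `decoder_step` returns next-token logits over the vocabulary.
    """
    audio_features = encoder(mel)  # a single encoder pass per 30 s window
    tokens = list(prompt)
    generated: List[int] = []
    while len(tokens) < max_seq_len:
        logits = decoder_step(tokens, audio_features)
        # Greedy choice: index of the largest logit.
        next_token = max(range(len(logits)), key=logits.__getitem__)
        if next_token == eot_token:
            break
        tokens.append(next_token)
        generated.append(next_token)
    return generated
```

Because the encoder runs once per window while the decoder runs once per token, the decoder's per-step latency matters far more for long transcripts than its single-call figure suggests.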
Performance Summary
| Model | Runtime | Precision | Chipset | Inference Time | Peak Memory Range | Primary Compute Unit |
|---|---|---|---|---|---|---|
| decoder | ONNX | float | Snapdragon® X2 Elite | 5.295 ms | 172 - 172 MB | NPU |
| decoder | ONNX | float | Snapdragon® X Elite | 10.821 ms | 211 - 211 MB | NPU |
| decoder | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 8.837 ms | 0 - 475 MB | NPU |
| decoder | ONNX | float | Snapdragon® 8 Gen 1 Mobile | 18.106 ms | 50 - 384 MB | NPU |
| decoder | ONNX | float | Qualcomm® QCS8550 (Proxy) | 11.846 ms | 0 - 183 MB | NPU |
| decoder | ONNX | float | Qualcomm® QCS8450 | 18.106 ms | 50 - 384 MB | NPU |
| decoder | ONNX | float | Snapdragon® 8 Elite Mobile | 7.28 ms | 11 - 563 MB | NPU |
| decoder | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.746 ms | 15 - 551 MB | NPU |
| decoder | ONNX | float | Qualcomm® QCS9075 | 17.928 ms | 40 - 85 MB | NPU |
| decoder | ONNX | float | Qualcomm® QCS8750 | 7.28 ms | 11 - 563 MB | NPU |
| decoder | ONNX | float | Qualcomm® QCS7181 | 10.821 ms | 211 - 211 MB | NPU |
| decoder | QNN_DLC | float | Snapdragon® X2 Elite | 5.973 ms | 40 - 40 MB | NPU |
| decoder | QNN_DLC | float | Snapdragon® X Elite | 11.522 ms | 40 - 40 MB | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 8.644 ms | 40 - 644 MB | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Gen 1 Mobile | 18.094 ms | 1 - 303 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8275 | 19.063 ms | 29 - 526 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 11.567 ms | 40 - 42 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8450 | 18.094 ms | 1 - 303 MB | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Elite Mobile | 7.32 ms | 0 - 542 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® SA8295P | 14.026 ms | 2 - 244 MB | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.936 ms | 1 - 503 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® SA7255P | 19.063 ms | 29 - 526 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS9075 | 16.568 ms | 40 - 86 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8750 | 7.32 ms | 0 - 542 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS7181 | 11.522 ms | 40 - 40 MB | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 8.607 ms | 4 - 744 MB | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Gen 1 Mobile | 18.406 ms | 5 - 467 MB | NPU |
| decoder | TFLITE | float | Qualcomm® QCS8275 | 19.145 ms | 4 - 537 MB | NPU |
| decoder | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 11.608 ms | 5 - 8 MB | NPU |
| decoder | TFLITE | float | Qualcomm® SA8775P | 17.183 ms | 41 - 52 MB | CPU |
| decoder | TFLITE | float | Qualcomm® SA8650P | 17.183 ms | 41 - 52 MB | CPU |
| decoder | TFLITE | float | Qualcomm® SA8255P | 17.183 ms | 41 - 52 MB | CPU |
| decoder | TFLITE | float | Qualcomm® QCS8450 | 18.406 ms | 5 - 467 MB | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Elite Mobile | 7.222 ms | 5 - 573 MB | NPU |
| decoder | TFLITE | float | Qualcomm® SA8295P | 13.884 ms | 5 - 297 MB | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.839 ms | 4 - 574 MB | NPU |
| decoder | TFLITE | float | Qualcomm® SA7255P | 19.145 ms | 4 - 537 MB | NPU |
| decoder | TFLITE | float | Qualcomm® QCS9075 | 16.364 ms | 0 - 265 MB | NPU |
| decoder | TFLITE | float | Qualcomm® QCS8750 | 7.222 ms | 5 - 573 MB | NPU |
| encoder | ONNX | float | Snapdragon® X2 Elite | 60.047 ms | 211 - 211 MB | NPU |
| encoder | ONNX | float | Snapdragon® X Elite | 132.661 ms | 236 - 236 MB | NPU |
| encoder | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 97.408 ms | 127 - 1191 MB | NPU |
| encoder | ONNX | float | Snapdragon® 8 Gen 1 Mobile | 271.563 ms | 75 - 1001 MB | NPU |
| encoder | ONNX | float | Qualcomm® QCS8550 (Proxy) | 129.944 ms | 0 - 260 MB | NPU |
| encoder | ONNX | float | Qualcomm® QCS8450 | 271.563 ms | 75 - 1001 MB | NPU |
| encoder | ONNX | float | Snapdragon® 8 Elite Mobile | 71.361 ms | 81 - 784 MB | NPU |
| encoder | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 57.981 ms | 53 - 778 MB | NPU |
| encoder | ONNX | float | Qualcomm® QCS9075 | 169.233 ms | 78 - 124 MB | NPU |
| encoder | ONNX | float | Qualcomm® QCS8750 | 71.361 ms | 81 - 784 MB | NPU |
| encoder | ONNX | float | Qualcomm® QCS7181 | 132.661 ms | 236 - 236 MB | NPU |
| encoder | QNN_DLC | float | Snapdragon® X2 Elite | 60.536 ms | 1 - 1 MB | NPU |
| encoder | QNN_DLC | float | Snapdragon® X Elite | 139.974 ms | 1 - 1 MB | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 96.914 ms | 0 - 964 MB | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Gen 1 Mobile | 266.25 ms | 1 - 824 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8275 | 437.854 ms | 1 - 695 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 135.358 ms | 1 - 898 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8450 | 266.25 ms | 1 - 824 MB | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Elite Mobile | 71.483 ms | 1 - 691 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® SA8295P | 193.074 ms | 1 - 611 MB | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 57.723 ms | 0 - 718 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® SA7255P | 437.854 ms | 1 - 695 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS9075 | 170.907 ms | 1 - 39 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8750 | 71.483 ms | 1 - 691 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS7181 | 139.974 ms | 1 - 1 MB | NPU |
| encoder | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 479.703 ms | 0 - 144 MB | GPU |
| encoder | TFLITE | float | Snapdragon® 8 Gen 1 Mobile | 841.152 ms | 41 - 195 MB | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8275 | 3109.319 ms | 24 - 69 MB | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 652.021 ms | 0 - 291 MB | GPU |
| encoder | TFLITE | float | Qualcomm® SA8775P | 1310.069 ms | 15 - 59 MB | GPU |
| encoder | TFLITE | float | Qualcomm® SA8650P | 1310.069 ms | 15 - 59 MB | GPU |
| encoder | TFLITE | float | Qualcomm® SA8255P | 1310.069 ms | 15 - 59 MB | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8450 | 841.152 ms | 41 - 195 MB | GPU |
| encoder | TFLITE | float | Snapdragon® 8 Elite Mobile | 403.588 ms | 41 - 81 MB | GPU |
| encoder | TFLITE | float | Qualcomm® SA8295P | 667.003 ms | 40 - 84 MB | GPU |
| encoder | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 370.741 ms | 40 - 79 MB | GPU |
| encoder | TFLITE | float | Qualcomm® SA7255P | 3109.319 ms | 24 - 69 MB | GPU |
| encoder | TFLITE | float | Qualcomm® QCS9075 | 1264.937 ms | 0 - 40 MB | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8750 | 403.588 ms | 41 - 81 MB | GPU |
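The table reports the encoder and decoder separately, so end-to-end latency for one 30-second window has to be assembled from both rows. This is a rough sketch under two assumptions: each decoder figure is the cost of one decoding step (one call per generated token), and mel-spectrogram computation, tokenization, and data transfer are ignored. `StageTimes`, `window_latency_ms`, and `real_time_factor` are illustrative names, not part of any Qualcomm library.

```python
from dataclasses import dataclass

AUDIO_WINDOW_MS = 30_000  # one encoder pass covers 30 s of audio


@dataclass(frozen=True)
class StageTimes:
    encoder_ms: float        # one encoder pass per 30 s window
    decoder_step_ms: float   # ASSUMED cost of a single decoder step


def window_latency_ms(t: StageTimes, decoded_tokens: int) -> float:
    """Encoder once, plus one decoder call per decoded token (assumption)."""
    return t.encoder_ms + decoded_tokens * t.decoder_step_ms


def real_time_factor(t: StageTimes, decoded_tokens: int) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return window_latency_ms(t, decoded_tokens) / AUDIO_WINDOW_MS


# QNN_DLC figures for Snapdragon 8 Elite Gen 5 Mobile, taken from the table above.
elite_gen5_qnn = StageTimes(encoder_ms=57.723, decoder_step_ms=5.936)
worst_case = window_latency_ms(elite_gen5_qnn, decoded_tokens=200)
```

With these figures, a window that reaches the 200-token cap costs about 57.723 + 200 × 5.936 ≈ 1,245 ms, a real-time factor of roughly 0.04. At that cap the decoder steps, not the encoder, account for most of the time.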
License
- The license for the original implementation of Distil-Whisper can be found here.
References
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
- Source Model Implementation
Community
- Join our AI Hub Slack community to collaborate, post questions and learn more about on-device AI.
- For questions or feedback, please reach out to us.
