Distil-Whisper: Optimized for Qualcomm Devices

Distil-Whisper Small English is a distilled version of Whisper Small, optimized for fast and efficient automatic speech recognition.

This is based on the implementation of Distil-Whisper found here. This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the Qualcomm® AI Hub Models library to export with custom configurations. More details on model performance across various devices can be found here.

Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.

Getting Started

There are two ways to deploy this model on your device:

Option 1: Download Pre-Exported Models

Below are pre-exported model assets ready for deployment.

Runtime | Precision | Chipset | SDK Versions | Download
ONNX | float | Universal | QAIRT 2.45, ONNX Runtime 1.25.0 | Download
QNN_DLC | float | Universal | QAIRT 2.45 | Download
TFLITE | float | Universal | - | Download

For more device-specific assets and performance metrics, visit Distil-Whisper on Qualcomm® AI Hub.

Option 2: Export with Custom Configurations

Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:

  • Custom weights (e.g., fine-tuned checkpoints)
  • Custom input shapes
  • Target device and runtime configurations

This option is ideal if you need to customize the model beyond the default configuration provided here.

See the Distil-Whisper page in our GitHub repository for usage instructions.

Model Details

Model Type: Speech recognition

Model Stats:

  • Model checkpoint: distil-whisper/distil-small.en
  • Input resolution: 80x3000 (log-mel spectrogram of 30 seconds of audio)
  • Max decoded sequence length: 200 tokens
  • Number of parameters (encoder): 166M
  • Model size (encoder) (float): 332 MB
  • Number of parameters (decoder): 211M
  • Model size (decoder) (float): 450 MB
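The 80x3000 input shape can be reproduced from the standard Whisper audio front end. The sketch below assumes Whisper's usual constants (16 kHz sample rate, a 160-sample hop between frames, 80 mel bins), none of which are stated in this card: a 30-second window holds 480,000 samples, and one frame per 160 samples gives 3000 frames. The helper `pad_or_trim` is named here for illustration only; it zero-pads or cuts a clip to that fixed window, since the listed input shape is fixed.

```python
# Standard Whisper front-end constants (assumed; not stated in this model card).
SAMPLE_RATE = 16_000      # audio samples per second
HOP_LENGTH = 160          # samples between successive mel frames (10 ms)
N_MELS = 80               # mel frequency bins per frame
CHUNK_SECONDS = 30        # fixed window length the encoder consumes

N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples per window
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3,000 frames per window


def encoder_input_shape() -> tuple[int, int]:
    """Return the (mel bins, frames) shape of one 30-second window."""
    return (N_MELS, N_FRAMES)


def pad_or_trim(samples: list[float], length: int = N_SAMPLES) -> list[float]:
    """Zero-pad a short clip, or cut a long one, to exactly `length` samples."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


if __name__ == "__main__":
    print(encoder_input_shape())            # (80, 3000)
    clip = [0.1] * (SAMPLE_RATE * 12)       # a 12-second clip
    print(len(pad_or_trim(clip)))           # 480000
```

Audio longer than 30 seconds would typically be split into consecutive windows, each padded or trimmed this way and transcribed one window at a time.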

Performance Summary

Model | Runtime | Precision | Chipset | Inference Time (ms) | Peak Memory Range (MB) | Primary Compute Unit
decoder | ONNX | float | Snapdragon® X2 Elite | 5.295 | 172 - 172 | NPU
decoder | ONNX | float | Snapdragon® X Elite | 10.821 | 211 - 211 | NPU
decoder | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 8.837 | 0 - 475 | NPU
decoder | ONNX | float | Snapdragon® 8 Gen 1 Mobile | 18.106 | 50 - 384 | NPU
decoder | ONNX | float | Qualcomm® QCS8550 (Proxy) | 11.846 | 0 - 183 | NPU
decoder | ONNX | float | Qualcomm® QCS8450 | 18.106 | 50 - 384 | NPU
decoder | ONNX | float | Snapdragon® 8 Elite Mobile | 7.28 | 11 - 563 | NPU
decoder | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.746 | 15 - 551 | NPU
decoder | ONNX | float | Qualcomm® QCS9075 | 17.928 | 40 - 85 | NPU
decoder | ONNX | float | Qualcomm® QCS8750 | 7.28 | 11 - 563 | NPU
decoder | ONNX | float | Qualcomm® QCS7181 | 10.821 | 211 - 211 | NPU
decoder | QNN_DLC | float | Snapdragon® X2 Elite | 5.973 | 40 - 40 | NPU
decoder | QNN_DLC | float | Snapdragon® X Elite | 11.522 | 40 - 40 | NPU
decoder | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 8.644 | 40 - 644 | NPU
decoder | QNN_DLC | float | Snapdragon® 8 Gen 1 Mobile | 18.094 | 1 - 303 | NPU
decoder | QNN_DLC | float | Qualcomm® QCS8275 | 19.063 | 29 - 526 | NPU
decoder | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 11.567 | 40 - 42 | NPU
decoder | QNN_DLC | float | Qualcomm® QCS8450 | 18.094 | 1 - 303 | NPU
decoder | QNN_DLC | float | Snapdragon® 8 Elite Mobile | 7.32 | 0 - 542 | NPU
decoder | QNN_DLC | float | Qualcomm® SA8295P | 14.026 | 2 - 244 | NPU
decoder | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.936 | 1 - 503 | NPU
decoder | QNN_DLC | float | Qualcomm® SA7255P | 19.063 | 29 - 526 | NPU
decoder | QNN_DLC | float | Qualcomm® QCS9075 | 16.568 | 40 - 86 | NPU
decoder | QNN_DLC | float | Qualcomm® QCS8750 | 7.32 | 0 - 542 | NPU
decoder | QNN_DLC | float | Qualcomm® QCS7181 | 11.522 | 40 - 40 | NPU
decoder | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 8.607 | 4 - 744 | NPU
decoder | TFLITE | float | Snapdragon® 8 Gen 1 Mobile | 18.406 | 5 - 467 | NPU
decoder | TFLITE | float | Qualcomm® QCS8275 | 19.145 | 4 - 537 | NPU
decoder | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 11.608 | 5 - 8 | NPU
decoder | TFLITE | float | Qualcomm® SA8775P | 17.183 | 41 - 52 | CPU
decoder | TFLITE | float | Qualcomm® SA8650P | 17.183 | 41 - 52 | CPU
decoder | TFLITE | float | Qualcomm® SA8255P | 17.183 | 41 - 52 | CPU
decoder | TFLITE | float | Qualcomm® QCS8450 | 18.406 | 5 - 467 | NPU
decoder | TFLITE | float | Snapdragon® 8 Elite Mobile | 7.222 | 5 - 573 | NPU
decoder | TFLITE | float | Qualcomm® SA8295P | 13.884 | 5 - 297 | NPU
decoder | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.839 | 4 - 574 | NPU
decoder | TFLITE | float | Qualcomm® SA7255P | 19.145 | 4 - 537 | NPU
decoder | TFLITE | float | Qualcomm® QCS9075 | 16.364 | 0 - 265 | NPU
decoder | TFLITE | float | Qualcomm® QCS8750 | 7.222 | 5 - 573 | NPU
encoder | ONNX | float | Snapdragon® X2 Elite | 60.047 | 211 - 211 | NPU
encoder | ONNX | float | Snapdragon® X Elite | 132.661 | 236 - 236 | NPU
encoder | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 97.408 | 127 - 1191 | NPU
encoder | ONNX | float | Snapdragon® 8 Gen 1 Mobile | 271.563 | 75 - 1001 | NPU
encoder | ONNX | float | Qualcomm® QCS8550 (Proxy) | 129.944 | 0 - 260 | NPU
encoder | ONNX | float | Qualcomm® QCS8450 | 271.563 | 75 - 1001 | NPU
encoder | ONNX | float | Snapdragon® 8 Elite Mobile | 71.361 | 81 - 784 | NPU
encoder | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 57.981 | 53 - 778 | NPU
encoder | ONNX | float | Qualcomm® QCS9075 | 169.233 | 78 - 124 | NPU
encoder | ONNX | float | Qualcomm® QCS8750 | 71.361 | 81 - 784 | NPU
encoder | ONNX | float | Qualcomm® QCS7181 | 132.661 | 236 - 236 | NPU
encoder | QNN_DLC | float | Snapdragon® X2 Elite | 60.536 | 1 - 1 | NPU
encoder | QNN_DLC | float | Snapdragon® X Elite | 139.974 | 1 - 1 | NPU
encoder | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 96.914 | 0 - 964 | NPU
encoder | QNN_DLC | float | Snapdragon® 8 Gen 1 Mobile | 266.25 | 1 - 824 | NPU
encoder | QNN_DLC | float | Qualcomm® QCS8275 | 437.854 | 1 - 695 | NPU
encoder | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 135.358 | 1 - 898 | NPU
encoder | QNN_DLC | float | Qualcomm® QCS8450 | 266.25 | 1 - 824 | NPU
encoder | QNN_DLC | float | Snapdragon® 8 Elite Mobile | 71.483 | 1 - 691 | NPU
encoder | QNN_DLC | float | Qualcomm® SA8295P | 193.074 | 1 - 611 | NPU
encoder | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 57.723 | 0 - 718 | NPU
encoder | QNN_DLC | float | Qualcomm® SA7255P | 437.854 | 1 - 695 | NPU
encoder | QNN_DLC | float | Qualcomm® QCS9075 | 170.907 | 1 - 39 | NPU
encoder | QNN_DLC | float | Qualcomm® QCS8750 | 71.483 | 1 - 691 | NPU
encoder | QNN_DLC | float | Qualcomm® QCS7181 | 139.974 | 1 - 1 | NPU
encoder | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 479.703 | 0 - 144 | GPU
encoder | TFLITE | float | Snapdragon® 8 Gen 1 Mobile | 841.152 | 41 - 195 | GPU
encoder | TFLITE | float | Qualcomm® QCS8275 | 3109.319 | 24 - 69 | GPU
encoder | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 652.021 | 0 - 291 | GPU
encoder | TFLITE | float | Qualcomm® SA8775P | 1310.069 | 15 - 59 | GPU
encoder | TFLITE | float | Qualcomm® SA8650P | 1310.069 | 15 - 59 | GPU
encoder | TFLITE | float | Qualcomm® SA8255P | 1310.069 | 15 - 59 | GPU
encoder | TFLITE | float | Qualcomm® QCS8450 | 841.152 | 41 - 195 | GPU
encoder | TFLITE | float | Snapdragon® 8 Elite Mobile | 403.588 | 41 - 81 | GPU
encoder | TFLITE | float | Qualcomm® SA8295P | 667.003 | 40 - 84 | GPU
encoder | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 370.741 | 40 - 79 | GPU
encoder | TFLITE | float | Qualcomm® SA7255P | 3109.319 | 24 - 69 | GPU
encoder | TFLITE | float | Qualcomm® QCS9075 | 1264.937 | 0 - 40 | GPU
encoder | TFLITE | float | Qualcomm® QCS8750 | 403.588 | 41 - 81 | GPU
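Because the model ships as a separate encoder and decoder, no single row of the table gives an end-to-end transcription time. A rough per-window upper bound can be sketched under two assumptions not confirmed by this card: the encoder runs once per 30-second window, and each decoder row measures one token-generation step, so the decoder runs at most 200 times (the "Max decoded sequence length"). The functions `encode` and `decode_step` below are hypothetical stand-ins, not the exported models' API, and the estimate ignores audio pre-processing and tokenizer post-processing. The usage lines plug in the Snapdragon® 8 Elite Mobile QNN_DLC rows (encoder 71.483 ms, decoder 7.32 ms).

```python
from typing import Callable, Sequence

MAX_TOKENS = 200  # "Max decoded sequence length" from the model card


def greedy_decode(
    encode: Callable[[Sequence[float]], object],
    decode_step: Callable[[object, list[int]], int],
    features: Sequence[float],
    start_token: int,
    end_token: int,
    max_tokens: int = MAX_TOKENS,
) -> tuple[list[int], int]:
    """Run the encoder once, then the decoder once per generated token.

    `encode` and `decode_step` are hypothetical stand-ins for the two models.
    Returns the generated tokens (without the start token) and the number
    of decoder calls made, including the call that produced the end token.
    """
    hidden = encode(features)          # one encoder pass per 30 s window
    tokens = [start_token]
    calls = 0
    while calls < max_tokens:
        next_token = decode_step(hidden, tokens)
        calls += 1
        if next_token == end_token:
            break
        tokens.append(next_token)
    return tokens[1:], calls


def estimate_latency_ms(encoder_ms: float, decoder_ms: float, decoder_calls: int) -> float:
    """Rough per-window latency: one encoder pass plus one decoder pass per call."""
    return encoder_ms + decoder_calls * decoder_ms


if __name__ == "__main__":
    script = iter([5, 6, 7, 0])  # fake decoder outputs; 0 plays the end token
    tokens, calls = greedy_decode(
        encode=lambda f: None,
        decode_step=lambda h, t: next(script),
        features=[0.0],
        start_token=1,
        end_token=0,
    )
    print(tokens, calls)  # [5, 6, 7] 4
    # Worst case: all 200 decoder steps on the Snapdragon 8 Elite Mobile QNN_DLC rows.
    print(round(estimate_latency_ms(71.483, 7.32, MAX_TOKENS), 3))  # 1535.483
```

Under these assumptions the decoder dominates for long transcripts: 200 steps at 7.32 ms cost about 1.46 s, roughly 20 times the single encoder pass, so short utterances that stop early at the end token finish much sooner than this bound.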

License

  • The license for the original implementation of Distil-Whisper can be found here.
