Automatic Speech Recognition
Transformers
PyTorch
TensorFlow
JAX
Safetensors
whisper
audio
hf-asr-leaderboard
Eval Results (legacy)
Instructions for using openai/whisper-base with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use openai/whisper-base with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="openai/whisper-base")

# Load model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("openai/whisper-base")
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-base")
```

- Notebooks
- Google Colab
- Kaggle
The size of tensor a (449) must match the size of tensor b (448) at non-singleton dimension 1
#18
by Mr1gh - opened
For WAV files longer than about 6 s this problem occurs. When I searched, I found that "Whisper decoder uses a learned position embedding which has the max length of 448 tokens. Therefore it cannot decode any transcription of more than 448 label ids." Does that mean Whisper can only be trained with a fixed maximum token length, and that it can't be changed?
Hi, did you find an answer?
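The 448-token limit discussed above comes from Whisper's learned decoder position embeddings (`model.config.max_target_positions` is 448 for the Whisper checkpoints), so it is fixed by the pretrained weights rather than being a configurable setting. A common workaround when fine-tuning is to drop (or re-segment) training examples whose tokenized transcription exceeds the limit. A minimal sketch of that filtering step (the `filter_long_labels` helper and the toy batch are illustrative, not part of Transformers):

```python
# Whisper's decoder uses a learned position-embedding table with a fixed
# maximum of 448 positions, so label sequences longer than 448 ids cannot
# be decoded and typically trigger the tensor-size mismatch above.
MAX_TARGET_POSITIONS = 448  # fixed by the pretrained position embeddings

def filter_long_labels(examples, max_label_length=MAX_TARGET_POSITIONS):
    """Keep only examples whose tokenized transcription fits the decoder."""
    return [ex for ex in examples if len(ex["labels"]) <= max_label_length]

# Hypothetical toy batch: the lists stand in for tokenized transcriptions.
batch = [
    {"labels": list(range(100))},  # fits
    {"labels": list(range(448))},  # exactly at the limit: still fits
    {"labels": list(range(600))},  # too long: would raise the size error
]

kept = filter_long_labels(batch)
print(len(kept))  # → 2
```

For plain inference on long audio (rather than training), the Transformers pipeline can instead chunk the input, e.g. `pipeline("automatic-speech-recognition", model="openai/whisper-base", chunk_length_s=30)`, so no single decoded segment exceeds the limit.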