🎙️ Whisper Small - French MLS (Fine-Tuned)
This model is a fine-tuned version of openai/whisper-small specifically optimized for the French language. It was trained on the French subset of theMultilingual LibriSpeech (MLS) dataset.
By using Low-Rank Adaptation (LoRA) and carefully finding the optimal training sweet spot (90k examples), this model achieves a 17% relative error reduction compared to the base Whisper Small model on the French MLS test set.
🚀 Usage
Since the LoRA adapter weights have been fully merged into the base model, you can use this model out-of-the-box with the standard transformers pipeline. No peft library is required!
from transformers import pipeline
# Load the fine-tuned model
transcriber = pipeline("automatic-speech-recognition", model="keypa/whisper-small-fr-mls")
# Transcribe your audio
result = transcriber("path_to_your_audio_file.wav")
print(result["text"])
Pro-tip: If you are transcribing long audio files (over 30 seconds), add chunk_length_s=30 to the pipeline parameters.
📊 Benchmarks & Performance
An extensive ablation study was conducted during training to find the optimal amount of data to prevent overfitting while maximizing generalization. The models were evaluated on the MLS French test split (2.43k unseen audio files).
| Model Version | Training Examples Seen | Word Error Rate (WER) ↓ |
|---|---|---|
| keypa/whisper-small-fr-mls (This Model) | 90,000 | 10.33 % 🏆 |
| Checkpoint 120k | 120,000 | 10.56 % |
| Checkpoint FULL | 258,000 | 10.64 % |
| Checkpoint 30k | 30,000 | 10.91 % |
| openai/whisper-small (Base) | 0 (Zero-Shot) | 12.42 % |
Key Takeaway: The optimal generalization point was reached at 90k steps. Training further (up to 258k) resulted in a slight degradation of performance on the test set, demonstrating the importance of checkpoint benchmarking.
⚠️ Note on Previous Versions (Disambiguation)
Please do not confuse this model with older experimental iterations such as keypa/whisper-3-mls-fr.
- The old
whisper-3-mls-frwas an early experimental run trained on a very small subset (~10k examples). While it showed an artificially low training loss, it suffered from severe overfitting and performed poorly on unseen data (WER > 12%). - This repository (
whisper-small-fr-mls) is the final, production-ready model that generalizes properly across diverse French voices.
⚙️ Training Details
The model was trained on a single NVIDIA Tesla T4 (16GB) using the following techniques:
- Architecture: PEFT / LoRA ( , ) targeting
q_projandv_projmodules. - Data Pipeline: Streaming mode with pre-computed Mel spectrograms to bypass CPU bottleneck.
- Optimization: 8-bit/16-bit mixed precision, Effective Batch Size of 32.
- Learning Rate Schedule: Dynamic decay from based on loss stabilization.
- Merge: The final adapter was merged into the base weights via
merge_and_unload().
Note for Researchers & Developers:
All intermediate LoRA adapters (from 30k to 258k steps) have been preserved and uploaded in the adapters/ folder of this repository. If you are interested in researching catastrophic forgetting, style dilution (e.g., number formatting behavior in early vs late checkpoints), or reproducing the ablation study, you can easily load them using PeftModel.from_pretrained().
- Downloads last month
- -
Dataset used to train keypa/whisper-small-fr-mls
Evaluation results
- Word Error Rate (WER) on Multilingual LibriSpeech (French Test Split)test set self-reported10.330