# 🎙️ Whisper Small - French MLS (Fine-Tuned)

This model is a fine-tuned version of `openai/whisper-small` specifically optimized for the French language. It was trained on the French subset of the Multilingual LibriSpeech (MLS) dataset.

By using Low-Rank Adaptation (LoRA) and carefully selecting the optimal training data budget (90k examples), this model achieves a 17 % relative WER reduction compared to the base Whisper Small model on the French MLS test set.

## 🚀 Usage

Since the LoRA adapter weights have been fully merged into the base model, you can use this model out of the box with the standard `transformers` pipeline. No `peft` library is required!

```python
from transformers import pipeline

# Load the fine-tuned model
transcriber = pipeline("automatic-speech-recognition", model="keypa/whisper-small-fr-mls")

# Transcribe your audio
result = transcriber("path_to_your_audio_file.wav")
print(result["text"])
```

**Pro-tip:** If you are transcribing long audio files (over 30 seconds), add `chunk_length_s=30` to the pipeline parameters, as shown below.
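
For example, a long-form call only needs the extra parameter (the file name here is a placeholder):

```python
from transformers import pipeline

# chunk_length_s splits long inputs into 30 s windows that Whisper can process
transcriber = pipeline(
    "automatic-speech-recognition",
    model="keypa/whisper-small-fr-mls",
    chunk_length_s=30,
)

result = transcriber("path_to_your_long_audio.wav")
print(result["text"])
```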

## 📊 Benchmarks & Performance

An extensive ablation study was conducted during training to find the optimal amount of data to prevent overfitting while maximizing generalization. The models were evaluated on the MLS French test split (2.43k unseen audio files).

| Model Version | Training Examples Seen | Word Error Rate (WER) ↓ |
| --- | --- | --- |
| **keypa/whisper-small-fr-mls (This Model)** | 90,000 | **10.33 %** 🏆 |
| Checkpoint 120k | 120,000 | 10.56 % |
| Checkpoint FULL | 258,000 | 10.64 % |
| Checkpoint 30k | 30,000 | 10.91 % |
| openai/whisper-small (Base) | 0 (Zero-Shot) | 12.42 % |

**Key Takeaway:** The optimal generalization point was reached at 90k training examples. Training further (up to 258k) resulted in a slight degradation of performance on the test set, demonstrating the importance of checkpoint benchmarking.
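
A minimal sketch of how such a WER evaluation could be reproduced, assuming the `datasets` and `evaluate` libraries and `facebook/multilingual_librispeech` as the MLS source (column names and the lowercase normalization are assumptions, not the exact evaluation script):

```python
from datasets import load_dataset
from transformers import pipeline
import evaluate

transcriber = pipeline("automatic-speech-recognition", model="keypa/whisper-small-fr-mls")
wer_metric = evaluate.load("wer")

# Stream the MLS French test split so the full dataset never has to be downloaded
dataset = load_dataset("facebook/multilingual_librispeech", "french", split="test", streaming=True)

predictions, references = [], []
for sample in dataset:
    output = transcriber(sample["audio"])
    predictions.append(output["text"].lower())
    references.append(sample["text"].lower())  # "text" is the assumed transcript column

print(f"WER: {100 * wer_metric.compute(predictions=predictions, references=references):.2f} %")
```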

## ⚠️ Note on Previous Versions (Disambiguation)

Please do not confuse this model with older experimental iterations such as `keypa/whisper-3-mls-fr`.

- The old `whisper-3-mls-fr` was an early experimental run trained on a very small subset (~10k examples). While it showed an artificially low training loss, it suffered from severe overfitting and performed poorly on unseen data (WER > 12 %).
- This repository (`whisper-small-fr-mls`) is the final, production-ready model that generalizes properly across diverse French voices.

## ⚙️ Training Details

The model was trained on a single NVIDIA Tesla T4 (16GB) using the following techniques:

- **Architecture:** PEFT / LoRA (r = 32, α = 32) targeting the `q_proj` and `v_proj` modules (see the sketch after this list).
- **Data Pipeline:** Streaming mode with pre-computed Mel spectrograms to bypass the CPU bottleneck.
- **Optimization:** 8-bit/16-bit mixed precision, effective batch size of 32.
- **Learning Rate Schedule:** Dynamic decay from 1e-3 → 5e-4 → 1e-4 based on loss stabilization.
- **Merge:** The final adapter was merged into the base weights via `merge_and_unload()`.
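
A minimal sketch of this setup with the `peft` API (only r, α, and the target modules come from the card; everything else is an assumption, not the author's original training script):

```python
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Base checkpoint that the adapter was trained on
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

lora_config = LoraConfig(
    r=32,                                 # LoRA rank, as stated above
    lora_alpha=32,                        # scaling factor α
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# ... training loop ...

# After training, fold the adapter back into the base weights so the
# resulting checkpoint loads without peft:
merged = model.merge_and_unload()
merged.save_pretrained("whisper-small-fr-mls-merged")
```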

**Note for Researchers & Developers:**

All intermediate LoRA adapters (from 30k to 258k examples) have been preserved and uploaded in the `adapters/` folder of this repository. If you are interested in researching catastrophic forgetting, style dilution (e.g., number formatting behavior in early vs. late checkpoints), or reproducing the ablation study, you can easily load them using `PeftModel.from_pretrained()`, as sketched below.
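
A minimal loading sketch; the subfolder name `adapters/checkpoint-30k` is an assumption about the repo layout, not a confirmed path:

```python
from transformers import WhisperForConditionalGeneration
from peft import PeftModel

# Start from the original base model, not the merged checkpoint
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Attach one of the preserved intermediate adapters
model = PeftModel.from_pretrained(
    base,
    "keypa/whisper-small-fr-mls",
    subfolder="adapters/checkpoint-30k",  # assumed folder name
)
```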
