daniel_whisper_finetune_large_v3_turbo_v2

This model is a fine-tuned version of openai/whisper-large-v3-turbo on a small personal dataset (described under Training and evaluation data). It achieves the following results on the evaluation set:

Loss: 0.2212 (validation loss at the final training step, 400)

Model description

This is a personal fine-tune of the Whisper large-v3-turbo model, trained on approximately 1 hour of audio featuring Daniel Rosehill's voice. The training data includes domain-specific vocabulary focused on:

Technology and software development terminology
A few Hebrew words and phrases

This model was created as a proof of concept for fine-tuning Whisper for personal use. The goal was to improve transcription accuracy on domain-specific content.

Training Infrastructure

Fine-tuning was performed on Modal's cloud GPU infrastructure.

Converted Formats

In addition to the standard SafeTensors format, this repository includes converted model formats in the converted/ directory:

GGML format (converted/ggml/): For use with whisper.cpp
- Cross-platform inference (desktop, mobile, edge devices)
- Optimized for CPU and CUDA (NVIDIA GPU) acceleration
- Compatible with iOS, Android, Raspberry Pi, and other platforms
CTranslate2 format (converted/ctranslate2/): For use with faster-whisper
- Highly optimized inference engine (the faster-whisper project reports up to 4x faster than the original OpenAI Whisper implementation at the same accuracy)
- Excellent CPU and GPU (CUDA) support
- Lower memory usage with 8-bit and 16-bit quantization
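Here is a minimal sketch of running the CTranslate2 export with faster-whisper. It assumes that converted/ctranslate2/ is a complete CTranslate2 Whisper export that faster-whisper can load from a local directory. It also assumes the audio is mainly English. Using float16 on CUDA and int8 on CPU is a common faster-whisper default, not a setting taken from this card. The helper names (`compute_type_for`, `ct2_model_dir`, `transcribe_ct2`) are hypothetical.

```python
from pathlib import Path

# Repository id from this model card; the subdirectory comes from "Converted Formats".
REPO_ID = "danielrosehill/daniel_whisper_finetune_large_v3_turbo_v2"
CT2_SUBDIR = Path("converted") / "ctranslate2"


def compute_type_for(device: str) -> str:
    """Pick a CTranslate2 compute type: float16 on CUDA, int8 on CPU (assumed default)."""
    return "float16" if device == "cuda" else "int8"


def ct2_model_dir(repo_root) -> Path:
    """Location of the CTranslate2 export inside a local copy of the repository."""
    return Path(repo_root) / CT2_SUBDIR


def transcribe_ct2(audio_path: str, device: str = "cpu") -> str:
    """Download the CTranslate2 files and transcribe one audio file with faster-whisper."""
    # Heavy dependencies are imported here so the helpers above work without them.
    from faster_whisper import WhisperModel
    from huggingface_hub import snapshot_download

    # Fetch only the CTranslate2 subdirectory of the Hub repository.
    repo_root = snapshot_download(REPO_ID, allow_patterns=["converted/ctranslate2/*"])
    model = WhisperModel(
        str(ct2_model_dir(repo_root)),
        device=device,
        compute_type=compute_type_for(device),
    )
    # Assumption: English is the primary language of the audio.
    segments, _info = model.transcribe(audio_path, language="en")
    # `segments` is a lazy generator; decoding happens while it is consumed.
    return " ".join(segment.text.strip() for segment in segments)
```

For example, `transcribe_ct2("recording.wav", device="cuda")` would return the joined transcript, where `recording.wav` stands in for your own file.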

Intended uses & limitations

This model is optimized for:

Transcribing Daniel Rosehill's voice
Technical and software development content
Mixed English with occasional Hebrew terms

Limitations:

Performance may degrade on voices significantly different from the training data
Improvements are expected mainly for the vocabulary and accent patterns present in the training set
Best suited for personal use rather than general-purpose transcription
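Here is a minimal sketch of loading the SafeTensors weights with the Hugging Face transformers speech-recognition pipeline. The repository id comes from this card. Pinning the language to English is an assumption based on the mostly-English training audio. The helper names are hypothetical. Decoding audio files through the pipeline requires ffmpeg to be installed.

```python
MODEL_ID = "danielrosehill/daniel_whisper_finetune_large_v3_turbo_v2"


def whisper_generate_kwargs(language: str = "english") -> dict:
    # Pinning the language skips Whisper's language detection (assumed English here).
    return {"language": language, "task": "transcribe"}


def load_asr_pipeline():
    # Imported lazily so the helper above can be used without torch/transformers.
    import torch
    from transformers import pipeline

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    return pipeline("automatic-speech-recognition", model=MODEL_ID, device=device)


def transcribe(asr, audio_path: str, language: str = "english") -> str:
    # return_timestamps=True lets Whisper handle clips longer than 30 seconds
    # with sequential long-form decoding.
    result = asr(
        audio_path,
        generate_kwargs=whisper_generate_kwargs(language),
        return_timestamps=True,
    )
    return result["text"].strip()
```

In use, `transcribe(load_asr_pipeline(), "recording.wav")` returns the text for a placeholder file `recording.wav`.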

Training and evaluation data

Training dataset consisted of approximately 1 hour of recorded audio featuring:

Technical discussions and software development content
Mixed English with occasional Hebrew vocabulary
Single speaker (Daniel Rosehill)

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: adamw_torch_fused (fused PyTorch AdamW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 50
training_steps: 400
mixed_precision_training: Native AMP
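The sketch below restates these hyperparameters as keyword arguments for transformers' `Seq2SeqTrainingArguments`. It is a reconstruction, not the author's training script. Three settings are assumptions:

- "Native AMP" is taken to mean `fp16=True` (bf16 is also possible).
- Evaluation is assumed to run every 50 steps, which matches the results table.
- A single device is assumed, which is consistent with 8 × 2 = 16.

`linear_lr` re-implements the linear warmup/decay formula used by transformers' `get_linear_schedule_with_warmup`. The helper names are hypothetical.

```python
import math

# Values from the hyperparameter list; keys are Seq2SeqTrainingArguments parameters.
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 50,
    "max_steps": 400,
    "fp16": True,  # assumption: "Native AMP" means fp16 autocast
    "eval_strategy": "steps",  # assumption: evaluate every 50 steps, as in the table
    "eval_steps": 50,
}


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Examples per optimizer step: per-device batch x accumulation x devices."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def linear_lr(step: int, peak_lr: float = 1e-5, warmup_steps: int = 50, total_steps: int = 400) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def build_training_args(output_dir: str):
    # Imported lazily so the helpers above run without transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Under these assumptions, the learning rate peaks at 1e-05 at step 50. It then falls linearly, reaching half its peak at step 225 and zero at step 400.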

Training results

Training Loss	Epoch	Step	Validation Loss
0.1955	1.3158	50	0.2107
0.0622	2.6316	100	0.1896
0.0332	3.9474	150	0.1602
0.0202	5.2632	200	0.1994
0.0063	6.5789	250	0.2209
0.0022	7.8947	300	0.2114
0.001	9.2105	350	0.2216
0.0015	10.5263	400	0.2212
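The table can be checked with a little arithmetic. Step / Epoch is about 38 in every row, so each epoch took 38 optimizer steps. At an effective batch of 16, that is at most 608 examples per epoch, assuming a single device. The lowest validation loss (0.1602) occurs at step 150. The reported loss of 0.2212 is the step-400 value. This short sketch reproduces both observations from the table values. The helper names are hypothetical.

```python
# (step, epoch, training loss, validation loss), copied from the table above.
RESULTS = [
    (50, 1.3158, 0.1955, 0.2107),
    (100, 2.6316, 0.0622, 0.1896),
    (150, 3.9474, 0.0332, 0.1602),
    (200, 5.2632, 0.0202, 0.1994),
    (250, 6.5789, 0.0063, 0.2209),
    (300, 7.8947, 0.0022, 0.2114),
    (350, 9.2105, 0.0010, 0.2216),
    (400, 10.5263, 0.0015, 0.2212),
]


def steps_per_epoch(results) -> int:
    """Optimizer steps per epoch, estimated from every (step, epoch) pair."""
    estimates = {round(step / epoch) for step, epoch, _train, _val in results}
    if len(estimates) != 1:
        raise ValueError(f"inconsistent step/epoch ratios: {sorted(estimates)}")
    return estimates.pop()


def best_checkpoint(results):
    """Row with the lowest validation loss."""
    return min(results, key=lambda row: row[3])


if __name__ == "__main__":
    spe = steps_per_epoch(RESULTS)
    step, epoch, _train, val = best_checkpoint(RESULTS)
    print(f"{spe} optimizer steps per epoch, up to {spe * 16} examples per epoch")
    print(f"lowest validation loss {val} at step {step} (epoch {epoch})")
```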

Framework versions

Transformers 4.57.1
PyTorch 2.9.1+cu128
Datasets 4.4.1
Tokenizers 0.22.1
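To compare a local environment against the versions listed above, a small check is sketched here. It assumes the standard PyPI distribution names, including `torch` for PyTorch. The comparison is an exact string match, so a build with a different local suffix (for example, one without `+cu128`) is also reported. The helper names are hypothetical.

```python
from importlib import metadata

# Versions listed under "Framework versions", keyed by PyPI distribution name.
CARD_VERSIONS = {
    "transformers": "4.57.1",
    "torch": "2.9.1+cu128",
    "datasets": "4.4.1",
    "tokenizers": "0.22.1",
}


def installed_version(package: str):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected: dict, lookup=installed_version) -> dict:
    """Map each package whose installed version differs to (expected, installed)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in version_mismatches(CARD_VERSIONS).items():
        print(f"{package}: card used {wanted}, installed {found or 'not installed'}")
```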


Safetensors

Model size: 0.8B params
Tensor type: F32

Model tree for danielrosehill/daniel_whisper_finetune_large_v3_turbo_v2

Base model: openai/whisper-large-v3
Fine-tuned: openai/whisper-large-v3-turbo
Fine-tuned: this model

Collection including danielrosehill/daniel_whisper_finetune_large_v3_turbo_v2

My Whisper Fine-Tunes (V2): Whisper fine-tunes for my voice and vocab (tech, Hebrew). About 1 hour of training data, so still very much POCs! 5 items, updated Nov 23, 2025.
