Whisper-Base Korean LoRA

A Whisper-base model fine-tuned with LoRA for Korean automatic speech recognition (ASR).

Model Details

Model Description

A Korean speech recognition model trained for a welfare counseling system serving elderly people living alone and other vulnerable groups.

  • Developed by: Jaehyeon
  • Model type: LoRA Adapter for Whisper
  • Language(s): Korean (한국어)
  • License: Apache 2.0
  • Finetuned from model: openai/whisper-base

Evaluation Results

Model           | Category              | WER    | CER
----------------|-----------------------|--------|-------
Baseline        | ALL                   | 0.4236 | 0.1588
LoRA fine-tuned | ALL                   | 0.2592 | 0.0584
Baseline        | Mental health welfare | 0.3540 | 0.1315
LoRA fine-tuned | Mental health welfare | 0.2280 | 0.0571

Performance Improvement

  • WER: 42.36% → 25.92% (38.8% relative improvement)
  • CER: 15.88% → 5.84% (63.2% relative improvement)
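The WER and CER figures above follow the standard edit-distance definitions: edit distance over words (WER) or characters (CER), divided by the reference length. A minimal pure-Python sketch (the card does not state which tool was used; a library such as jiwer would give the same numbers):

```python
def edit_distance(ref, hyp):
    # Levenshtein distance via a rolling one-row dynamic program.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,          # deletion
                       d[j - 1] + 1,      # insertion
                       prev + (r != h))   # substitution (or match)
            prev = cur
    return d[-1]

def wer(reference, hypothesis):
    # Word error rate: edit distance over whitespace-split tokens.
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    # Character error rate: edit distance over characters, spaces removed.
    ref_chars = list(reference.replace(" ", ""))
    return edit_distance(ref_chars, list(hypothesis.replace(" ", ""))) / len(ref_chars)

# Example: one substituted word out of three -> WER = 1/3.
print(wer("the cat sat", "the cat sit"))
```

The relative improvements quoted above are computed as (baseline − fine-tuned) / baseline, e.g. (0.4236 − 0.2592) / 0.4236 ≈ 38.8%.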

Uses

Direct Use

Converts Korean speech to text (ASR).

Downstream Use

  • Voice counseling systems for welfare call centers
  • Check-in call systems for elderly people living alone and vulnerable groups
  • Korean speech recognition applications

How to Get Started with the Model

from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
import torch
import librosa

# Load base model and processor
processor = WhisperProcessor.from_pretrained("openai/whisper-base")
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "jaehyeono/whisper-base-korean-lora")
model = model.merge_and_unload()  # Merge for faster inference
model.eval()

# Inference
audio, sr = librosa.load("audio.wav", sr=16000)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

with torch.no_grad():
    predicted_ids = model.generate(input_features, language="ko", task="transcribe")

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)

Training Details

Training Data

Dataset Description
Dataset       | Description
--------------|------------------------------------------
AIHub 186     | Korean speech data (everyday conversation)
Zeroth Korean | Public Korean speech dataset
AIHub 134     | Emotion/mental-health-related speech data

Training Hyperparameters

  • Training regime: bf16 mixed precision
  • LoRA r: 64
  • LoRA alpha: 128
  • Target modules: q_proj, v_proj, o_proj, k_proj
  • Total steps: 25,000
  • Batch size: 32
  • Learning rate: 1e-4
  • LR scheduler: Cosine
  • Warmup ratio: 0.03
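The LoRA settings above map directly onto a PEFT configuration. A sketch of how the adapter was likely attached, using the hyperparameters from the table; `lora_dropout` and `task_type` are not stated in the card and are assumptions here:

```python
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

# r, lora_alpha, and target_modules come from the table above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,  # assumed; not stated in the card
)

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```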

Bias, Risks, and Limitations

  • Performance may degrade in noisy environments
  • Dialects and unusual accents are only sparsely represented in the training data
  • Being based on whisper-base, the model has inherent performance limits compared to whisper-large

Technical Specifications

Model Architecture and Objective

A LoRA adapter applied to the Whisper-base model to improve Korean ASR performance.

Compute Infrastructure

Hardware

  • NVIDIA GPU with CUDA support

Software

  • Transformers
  • PEFT 0.18.1
  • PyTorch

Citation

@misc{whisper-korean-lora-2026,
  title={Whisper-Base Korean LoRA for Welfare Call Center},
  author={Jaehyeon},
  year={2026},
  publisher={HuggingFace}
}

Framework versions

  • PEFT 0.18.1
  • Transformers 4.35+
  • PyTorch 2.0+