Trustt-IndicVoices-Whisper-Large-v3

Model Description

Trustt-IndicVoices-Whisper-Large-v3 is a spoken-language identification model developed by Trustt.com. Given a short audio clip, it predicts which of 23 languages spoken across India is being used. The model is built on OpenAI's Whisper-Large-v3 architecture and fine-tuned with LoRA on the AI4Bharat IndicVoices dataset, and is intended for multilingual speech processing applications.

This model is part of Trustt.com's commitment to open-source innovation in speech technology, providing SaaS platforms and enterprises serving the Indian market with robust, production-ready language identification.

Supported Languages

The model classifies audio into 23 languages (22 Indian languages plus English):

label_list = [
    "assamese",
    "bengali",
    "bodo",
    "dogri",
    "english",
    "gujarati",
    "hindi",
    "kannada",
    "kashmiri",
    "konkani",
    "maithili",
    "malayalam",
    "manipuri",
    "marathi",
    "nepali",
    "odia",
    "punjabi",
    "sanskrit",
    "santali",
    "sindhi",
    "tamil",
    "telugu",
    "urdu"
]

Installation

Prerequisites

Python 3.8 or higher
PyTorch 1.9.0 or higher
CUDA-capable GPU (recommended for optimal performance)

Download Model

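# Download the model repository into a local directory
# (without --local-dir, the files go to the Hugging Face cache instead)
huggingface-cli download vignesh-trustt/whisper-v3-large-IndicVoices --local-dir Trustt-IndicVoices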

Setup

# Create a virtual environment
conda create -n trustt_indic_whisper python=3.8
conda activate trustt_indic_whisper

# Navigate to the project directory
cd Trustt-IndicVoices

# Install the package and dependencies
pip install -e .

Usage

Model Loading

import torch
import torch.nn.functional as F
from src.model.dialect.whisper_dialect import WhisperWrapper

# Initialize device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load Trustt-IndicVoices-Whisper-Large-v3 from Hugging Face
model = WhisperWrapper.from_pretrained("vignesh-trustt/whisper-v3-large-IndicVoices").to(device)
model.eval()  # Switch to evaluation mode (equivalent to model.train(False))

Language Classification

label_list = [
    "assamese", "bengali", "bodo", "dogri", "english",
    "gujarati", "hindi", "kannada", "kashmiri", "konkani",
    "maithili", "malayalam", "manipuri", "marathi", "nepali",
    "odia", "punjabi", "sanskrit", "santali", "sindhi",
    "tamil", "telugu", "urdu"
]

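# Prepare audio input: a float tensor of shape [1, num_samples], 16 kHz mono.
# The zeros below are a 1-second placeholder; substitute a real waveform (3-15 seconds recommended).
max_audio_length = 15 * 16000  # 15 seconds at 16 kHz
audio_data = torch.zeros([1, 16000], device=device)[:, :max_audio_length]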

# Perform inference
with torch.no_grad():
    logits, embeddings = model(audio_data, return_feature=True)

# Compute language probabilities (this example assumes a batch of one clip)
language_probs = F.softmax(logits, dim=1)
predicted_language_idx = torch.argmax(language_probs, dim=1).item()
predicted_language = label_list[predicted_language_idx]

print(f"Predicted language: {predicted_language}")
print(f"Confidence: {language_probs[0][predicted_language_idx]:.4f}")
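
When the top prediction is close to the runner-up (for example, between closely related languages), it can help to look at several candidates rather than a single label. The following is a minimal sketch in plain Python so that it runs without the model; the helper names `softmax` and `top_k_languages` and the example logits are illustrative assumptions, not part of the released package.

```python
import math

# Same ordering as the label_list shown in this card.
LABELS = [
    "assamese", "bengali", "bodo", "dogri", "english",
    "gujarati", "hindi", "kannada", "kashmiri", "konkani",
    "maithili", "malayalam", "manipuri", "marathi", "nepali",
    "odia", "punjabi", "sanskrit", "santali", "sindhi",
    "tamil", "telugu", "urdu",
]

def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_languages(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Made-up logits for illustration only: hindi highest, urdu second.
example_logits = [0.0] * len(LABELS)
example_logits[LABELS.index("hindi")] = 4.0
example_logits[LABELS.index("urdu")] = 3.0

top3 = top_k_languages(example_logits, LABELS, k=3)
for label, prob in top3:
    print(f"{label}: {prob:.3f}")
```

With the real model, passing `logits[0].tolist()` in place of `example_logits` should work, assuming the logits have shape `[1, 23]` as the indexing in the snippet above suggests.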

Audio Preprocessing Requirements

For optimal model performance, ensure your audio input meets the following specifications:

Sample Rate: 16kHz
Channels: Mono (single channel)
Duration: 3-15 seconds (recommended)
- Audio shorter than 3 seconds may yield unreliable predictions
- Audio longer than 15 seconds will be truncated to the first 15 seconds
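
The requirements above can be sketched as a small preprocessing helper. This is an illustration under stated assumptions, not the project's own pipeline: `prepare_audio` is a hypothetical name, the input is assumed to be a NumPy array of shape `(num_samples,)` or `(num_samples, channels)`, and the resampling uses plain linear interpolation, which is not band-limited. A dedicated resampler is preferable for real use.

```python
import numpy as np

TARGET_SR = 16_000   # sample rate the model expects
MAX_SECONDS = 15     # longer clips are cut to the first 15 seconds
MIN_SECONDS = 3      # shorter clips may yield unreliable predictions

def prepare_audio(samples, sample_rate):
    """Return (mono float32 waveform at 16 kHz, is_long_enough)."""
    audio = np.asarray(samples, dtype=np.float32)

    # Downmix (num_samples, channels) to mono by averaging the channels.
    if audio.ndim == 2:
        audio = audio.mean(axis=1)

    # Resample by linear interpolation (illustrative only).
    if sample_rate != TARGET_SR:
        duration = audio.shape[0] / sample_rate
        n_out = int(round(duration * TARGET_SR))
        old_t = np.arange(audio.shape[0]) / sample_rate
        new_t = np.arange(n_out) / TARGET_SR
        audio = np.interp(new_t, old_t, audio).astype(np.float32)

    # Keep only the first 15 seconds.
    audio = audio[: MAX_SECONDS * TARGET_SR]
    return audio, audio.shape[0] >= MIN_SECONDS * TARGET_SR

# 20 seconds of silent stereo audio at 44.1 kHz as stand-in input.
stereo = np.zeros((20 * 44_100, 2), dtype=np.float32)
waveform, long_enough = prepare_audio(stereo, 44_100)
print(waveform.shape, waveform.dtype, long_enough)
```

The returned waveform can then be wrapped as a `[1, num_samples]` tensor, for example with `torch.from_numpy(waveform).unsqueeze(0)`, before it is passed to the model.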

Model Architecture

This model uses LoRA (Low-Rank Adaptation) fine-tuning on top of Whisper Large v3:

Parameter	Value
Base Model	openai/whisper-large-v3
Fine-tune Method	LoRA
LoRA Rank	64
Output Classes	23
Hidden Dim	256
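
To give a sense of what rank 64 means in practice, the sketch below counts the trainable parameters LoRA adds to a single linear layer. It assumes a square 1280 x 1280 projection (1280 is the hidden width of Whisper large-v3); which layers are actually adapted in this model is not stated in this card, so the figures are per-layer arithmetic only. `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in linear weight.

    LoRA learns an update B @ A with A of shape (rank, d_in) and
    B of shape (d_out, rank); the original weight stays frozen.
    """
    return rank * d_in + d_out * rank

D_MODEL = 1280   # hidden width of Whisper large-v3
RANK = 64        # LoRA rank listed in the table

full = D_MODEL * D_MODEL
added = lora_param_count(D_MODEL, D_MODEL, RANK)
print(f"full weight: {full:,} params")
print(f"LoRA update: {added:,} params ({added / full:.1%} of the full weight)")
```

Under these assumptions, each adapted projection trains roughly a tenth as many parameters as full fine-tuning of that layer would.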

Model Performance

The model has been trained and validated on diverse datasets including:

AI4Bharat IndicVoices
Mozilla Common Voice 11.0

Training targeted classification accuracy across the 23 supported languages. This card does not report per-language figures.

Enterprise Integration

This model is designed for seamless integration into SaaS platforms and enterprise applications. For production deployments, consider:

Batch processing capabilities for high-throughput scenarios
GPU acceleration for real-time inference
Model quantization for resource-constrained environments
API wrapper implementation for microservices architecture
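
For the batch-processing point, one common approach is to pad variable-length clips to a shared length and process them in fixed-size groups. The sketch below shows only that padding and grouping step with NumPy. `pad_batch` and `iter_batches` are hypothetical helpers, and whether `WhisperWrapper` accepts padded batches or a separate lengths argument is an assumption that should be checked against the package source.

```python
import numpy as np

def pad_batch(clips, pad_value=0.0):
    """Stack 1-D waveforms into (batch, max_len) plus their true lengths."""
    lengths = np.array([len(c) for c in clips], dtype=np.int64)
    batch = np.full((len(clips), int(lengths.max())), pad_value, dtype=np.float32)
    for row, clip in enumerate(clips):
        batch[row, : len(clip)] = clip
    return batch, lengths

def iter_batches(clips, batch_size):
    """Yield padded batches of at most batch_size clips, in order."""
    for start in range(0, len(clips), batch_size):
        yield pad_batch(clips[start : start + batch_size])

# Three stand-in clips of 3, 5, and 4 seconds at 16 kHz.
clips = [np.ones(n * 16_000, dtype=np.float32) for n in (3, 5, 4)]
batches = list(iter_batches(clips, batch_size=2))
for batch, lengths in batches:
    print(batch.shape, lengths.tolist())
```

Keeping the true lengths alongside the padded array lets downstream code ignore the padding, and grouping clips of similar duration reduces the amount of padding per batch.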

Responsible Use

Users are expected to:

Respect privacy and consent of data subjects
Comply with applicable data protection laws and regulations
Use the model in accordance with ethical AI practices
Ensure appropriate data handling and security measures

License

This model is released under the OpenRAIL license, enabling both research and commercial use while maintaining responsible AI principles.

About Trustt.com

Trustt.com is a leading SaaS platform committed to advancing speech technology and multilingual AI solutions. This model represents our ongoing contribution to the open-source community and our dedication to making advanced language technologies accessible to developers and enterprises.

Support

For technical support, feature requests, or commercial licensing inquiries, please visit Trustt.com or contact our support team.

Contributing

We welcome contributions from the community. Please refer to our contribution guidelines for more information on how to participate in improving this model.
