Trustt-IndicVoices-Whisper-Large-v3
Model Description
Trustt-IndicVoices-Whisper-Large-v3 is an advanced language identification model developed by Trustt.com, designed to accurately classify regional languages spoken across India. Built on OpenAI's Whisper-Large-v3 architecture and fine-tuned using LoRA on the AI4Bharat IndicVoices dataset, this model delivers enterprise-grade language identification capabilities for multilingual speech processing applications.
This model is part of Trustt.com's commitment to open-source innovation in speech technology, providing SaaS platforms and enterprises serving the Indian market with robust, production-ready language identification.
Supported Languages
The model classifies speech into 23 language labels (22 Indian languages plus English):
label_list = [
"assamese",
"bengali",
"bodo",
"dogri",
"english",
"gujarati",
"hindi",
"kannada",
"kashmiri",
"konkani",
"maithili",
"malayalam",
"manipuri",
"marathi",
"nepali",
"odia",
"punjabi",
"sanskrit",
"santali",
"sindhi",
"tamil",
"telugu",
"urdu"
]
Installation
Prerequisites
- Python 3.8 or higher
- PyTorch 1.9.0 or higher
- CUDA-capable GPU (recommended for optimal performance)
Download Model
# Download the model repository
huggingface-cli download vignesh-trustt/whisper-v3-large-IndicVoices
Setup
# Create a virtual environment
conda create -n trustt_indic_whisper python=3.8
conda activate trustt_indic_whisper
# Navigate to the project directory
cd Trustt-IndicVoices
# Install the package and dependencies
pip install -e .
Usage
Model Loading
import torch
import torch.nn.functional as F
from src.model.dialect.whisper_dialect import WhisperWrapper
# Initialize device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load Trustt-IndicVoices-Whisper-Large-v3 from Hugging Face
model = WhisperWrapper.from_pretrained("vignesh-trustt/whisper-v3-large-IndicVoices").to(device)
model.eval()  # Set to inference mode (equivalent to model.train(False))
Language Classification
label_list = [
"assamese", "bengali", "bodo", "dogri", "english",
"gujarati", "hindi", "kannada", "kashmiri", "konkani",
"maithili", "malayalam", "manipuri", "marathi", "nepali",
"odia", "punjabi", "sanskrit", "santali", "sindhi",
"tamil", "telugu", "urdu"
]
# Prepare audio input (16 kHz, mono, 3-15 seconds recommended).
# The all-zeros tensor is a 1-second placeholder; replace it with a real waveform of shape [1, num_samples].
max_audio_length = 15 * 16000  # 15 seconds at 16 kHz
audio_data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
# Perform inference
with torch.no_grad():
    logits, embeddings = model(audio_data, return_feature=True)
# Compute language probabilities
language_probs = F.softmax(logits, dim=1)
# Take the argmax over the class dimension, then read the single clip in the batch
predicted_language_idx = torch.argmax(language_probs, dim=1)[0].item()
predicted_language = label_list[predicted_language_idx]
print(f"Predicted language: {predicted_language}")
print(f"Confidence: {language_probs[0][predicted_language_idx]:.4f}")
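A single top-1 label hides how close the runner-up languages are, which matters for closely related pairs. The following is a minimal sketch, assuming the logits for one clip have been converted to a plain Python list (for example with `logits[0].tolist()`). The helper `top_k_languages` and the sample logits are hypothetical and are not part of the Trustt package.

```python
import math

label_list = [
    "assamese", "bengali", "bodo", "dogri", "english",
    "gujarati", "hindi", "kannada", "kashmiri", "konkani",
    "maithili", "malayalam", "manipuri", "marathi", "nepali",
    "odia", "punjabi", "sanskrit", "santali", "sindhi",
    "tamil", "telugu", "urdu",
]


def top_k_languages(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs for one clip."""
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for illustration only: hindi scores highest, urdu second.
example_logits = [0.0] * len(label_list)
example_logits[label_list.index("hindi")] = 4.0
example_logits[label_list.index("urdu")] = 3.0

for label, prob in top_k_languages(example_logits, label_list):
    print(f"{label}: {prob:.4f}")
```

A small gap between the first and second probabilities is a reasonable signal to treat the prediction as uncertain.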
Audio Preprocessing Requirements
For optimal model performance, ensure your audio input meets the following specifications:
- Sample Rate: 16kHz
- Channels: Mono (single channel)
- Duration: 3-15 seconds (recommended)
  - Audio shorter than 3 seconds may yield unreliable predictions
  - Audio longer than 15 seconds will be truncated to the first 15 seconds
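The requirements above can be checked before a clip reaches the model. The following is a dependency-free sketch, assuming the audio has already been decoded into per-channel lists of float samples; `prepare_clip` is a hypothetical helper, not part of the Trustt package. Resampling is deliberately left out: audio at another rate should be resampled with an audio library first (for example `torchaudio.functional.resample`).

```python
TARGET_SAMPLE_RATE = 16_000  # Hz, as required above
MIN_SECONDS = 3              # shorter clips may yield unreliable predictions
MAX_SECONDS = 15             # longer clips are cut to the first 15 seconds


def prepare_clip(channels, sample_rate):
    """Downmix to mono, truncate to 15 s, and flag clips shorter than 3 s.

    channels: list of equal-length sample lists, one per audio channel.
    Returns (mono_samples, is_short).
    """
    if sample_rate != TARGET_SAMPLE_RATE:
        raise ValueError(f"expected {TARGET_SAMPLE_RATE} Hz audio, got {sample_rate} Hz")
    if not channels or any(len(ch) != len(channels[0]) for ch in channels):
        raise ValueError("channels must be non-empty and of equal length")

    max_samples = MAX_SECONDS * TARGET_SAMPLE_RATE
    trimmed = [ch[:max_samples] for ch in channels]
    # Average the channels frame by frame to get a mono signal.
    mono = [sum(frame) / len(trimmed) for frame in zip(*trimmed)]
    is_short = len(mono) < MIN_SECONDS * TARGET_SAMPLE_RATE
    return mono, is_short


# A 16-second stereo clip is cut to 15 seconds of mono audio.
stereo = [[1.0] * (16 * 16_000), [0.0] * (16 * 16_000)]
mono, is_short = prepare_clip(stereo, 16_000)
print(len(mono) / TARGET_SAMPLE_RATE, is_short)
```

The returned list can be wrapped as `torch.tensor([mono])` to obtain the `[1, num_samples]` shape used in the inference example.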
Model Architecture
This model uses LoRA (Low-Rank Adaptation) fine-tuning on top of Whisper Large v3:
| Parameter | Value |
|---|---|
| Base Model | openai/whisper-large-v3 |
| Fine-tune Method | LoRA |
| LoRA Rank | 64 |
| Output Classes | 23 |
| Hidden Dim | 256 |
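To give a sense of what rank 64 means: LoRA freezes an adapted weight matrix of shape d_out x d_in and learns a low-rank update B·A, where B has shape d_out x r and A has shape r x d_in, so each adapted matrix adds r·(d_in + d_out) trainable parameters. The sketch below works this out for a single 1280 x 1280 projection, 1280 being the hidden width of Whisper large models. Which matrices this model actually adapts is not stated in this card, so the layer choice is an assumption for illustration.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters added by one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * d_in + d_out * rank


D_MODEL = 1280  # hidden width of Whisper large models
RANK = 64       # LoRA rank from the table above

full = D_MODEL * D_MODEL
added = lora_trainable_params(D_MODEL, D_MODEL, RANK)

print(f"full matrix:  {full:,} weights")
print(f"LoRA adapter: {added:,} weights ({added / full:.1%} of the full matrix)")
```

For this square projection the adapter trains 163,840 weights against 1,638,400 in the frozen matrix, a tenth of the size.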
Model Performance
The model has been trained and validated on diverse datasets including:
- AI4Bharat IndicVoices
- Mozilla Common Voice 11.0
Training was tuned for classification accuracy across the supported languages.
Enterprise Integration
This model is designed for seamless integration into SaaS platforms and enterprise applications. For production deployments, consider:
- Batch processing capabilities for high-throughput scenarios
- GPU acceleration for real-time inference
- Model quantization for resource-constrained environments
- API wrapper implementation for microservices architecture
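For the batch-processing point, variable-length clips need to be grouped and padded to a common length before they can be stacked into one tensor. The following is a minimal sketch using plain lists; `make_padded_batches` is a hypothetical helper. Whether `WhisperWrapper` accepts zero-padded batches, or needs the true lengths passed alongside, is not documented in this card, so treat that as an assumption and compare batched predictions against single-clip predictions before relying on it.

```python
def make_padded_batches(clips, batch_size):
    """Yield (batch, lengths) pairs.

    Each batch holds up to batch_size clips, zero-padded on the right to the
    longest clip in that batch. lengths records the original sample counts.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(clips), batch_size):
        group = clips[start:start + batch_size]
        lengths = [len(clip) for clip in group]
        longest = max(lengths)
        batch = [list(clip) + [0.0] * (longest - len(clip)) for clip in group]
        yield batch, lengths


# Three toy "clips" of different lengths, processed two at a time.
clips = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6, 0.7, 0.8], [0.9, 1.0]]
for batch, lengths in make_padded_batches(clips, batch_size=2):
    print(len(batch), "clips, padded length", len(batch[0]), "original lengths", lengths)
```

With tensors, `torch.nn.utils.rnn.pad_sequence(..., batch_first=True)` performs the same right-padding. Sorting clips by length before batching keeps the amount of padding small.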
Responsible Use
Users are expected to:
- Respect privacy and consent of data subjects
- Comply with applicable data protection laws and regulations
- Use the model in accordance with ethical AI practices
- Ensure appropriate data handling and security measures
License
This model is released under the OpenRAIL license, enabling both research and commercial use while maintaining responsible AI principles.
About Trustt.com
Trustt.com is a leading SaaS platform committed to advancing speech technology and multilingual AI solutions. This model represents our ongoing contribution to the open-source community and our dedication to making advanced language technologies accessible to developers and enterprises.
Support
For technical support, feature requests, or commercial licensing inquiries, please visit Trustt.com or contact our support team.
Contributing
We welcome contributions from the community. Please refer to our contribution guidelines for more information on how to participate in improving this model.