Paper: Scaling Speech Technology to 1,000+ Languages (arXiv 2305.13516)
This model performs automatic speech recognition (ASR) for Turkmen written in the Latin script (language code tuk-script_latin).
It is based on facebook/mms-1b-all, Meta's Massively Multilingual Speech (MMS) model, which covers more than 1,000 languages.
Example usage:
import librosa
import torch
from transformers import Wav2Vec2ForCTC, AutoProcessor
processor = AutoProcessor.from_pretrained("derkar00/mms-tuk-latin-asr")
model = Wav2Vec2ForCTC.from_pretrained("derkar00/mms-tuk-latin-asr")
processor.tokenizer.set_target_lang("tuk-script_latin")  # Turkmen (Latin) vocabulary
model.load_adapter("tuk-script_latin")  # matching language adapter weights

# Load audio; librosa resamples it to 16 kHz mono, the rate the model expects
audio, _ = librosa.load("your_audio.wav", sr=16000, mono=True)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: most likely token per frame, then collapse with the tokenizer
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.decode(predicted_ids[0]))
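The last two lines perform greedy CTC decoding. As a rough illustration of what that involves, here is a minimal pure-Python sketch: take the best-scoring token for each frame, merge consecutive repeats, then drop the blank token. It is not the tokenizer's actual implementation, and the toy vocabulary, the blank id of 0, and the "|" word delimiter are assumptions chosen for the example rather than values read from this model's vocabulary.

```python
from itertools import groupby

# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank (assumed).
TOY_VOCAB = {0: "<pad>", 1: "s", 2: "a", 3: "l", 4: "m", 5: "|"}


def one_hot(token_id, size):
    """Build a fake per-frame score vector that favours a single token."""
    return [1.0 if j == token_id else 0.0 for j in range(size)]


def ctc_greedy_decode(frame_scores, id_to_token, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    # 1. Most likely token id per frame (the role torch.argmax plays above).
    best_ids = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # 2. Collapse runs of identical ids, e.g. [a, a, blank, a] -> [a, blank, a].
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Remove blanks; a blank between two identical ids keeps both letters.
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # 4. Turn the word-delimiter symbol into spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()


if __name__ == "__main__":
    frames = [one_hot(i, len(TOY_VOCAB)) for i in [1, 1, 0, 2, 2, 3, 0, 2, 4, 4, 0]]
    print(ctc_greedy_decode(frames, TOY_VOCAB))  # salam
```

Merging repeats before removing blanks is what lets CTC emit doubled letters: the frame sequence a, blank, a decodes to "aa", while a, a decodes to "a".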
Base model: facebook/mms-1b-all