This model translates Singlish (romanized Sinhala mixed with English) into Sinhala script. It is built on google/mt5-small.
from transformers import pipeline
translator = pipeline("translation", model="savinugunarathna/singlish-to-sinhala-mt5-small")
result = translator("translate Singlish to Sinhala: oyage nama mokakda")
print(result[0]["translation_text"])
# Output: ඔයාගේ නම මොකක්ද
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("savinugunarathna/singlish-to-sinhala-mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("savinugunarathna/singlish-to-sinhala-mt5-small")
input_text = "translate Singlish to Sinhala: mama pasal yanawa"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=80, num_beams=5)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
# Output: මම පාසල යනවා
texts = [
"translate Singlish to Sinhala: kohomada",
"translate Singlish to Sinhala: mama hodata innawa",
"translate Singlish to Sinhala: api yamu"
]
inputs = tokenizer(texts, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_length=80, num_beams=5)
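for text, output in zip(texts, outputs):
    # Strip the task prefix so only the source text is printed next to its translation.
    source = text.split(": ", 1)[1]
    print(f"{source} → {tokenizer.decode(output, skip_special_tokens=True)}")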
| Singlish Input | Sinhala Output |
|---|---|
| oyage nama mokakda | ඔයාගේ නම මොකක්ද |
| api koheda yanne | අපි කොහෙද යන්නේ |
| kohomada | කොහොමද |
| mama hodata innawa | මම හොඳට ඉන්නවා |
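The rows of this table can double as a quick sanity check after loading the model. The sketch below is an illustration only: `EXAMPLES`, `normalize`, and `check_examples` are hypothetical names, and `translate` stands for any function that maps one Singlish string to one Sinhala string (for instance, a wrapper around the pipeline that adds the task prefix). Comparing after Unicode NFC normalization is an assumption about what should count as a match.

```python
import unicodedata
from typing import Callable, Dict, List, Tuple

# Input/output pairs copied from the table above.
EXAMPLES: Dict[str, str] = {
    "oyage nama mokakda": "ඔයාගේ නම මොකක්ද",
    "api koheda yanne": "අපි කොහෙද යන්නේ",
    "kohomada": "කොහොමද",
    "mama hodata innawa": "මම හොඳට ඉන්නවා",
}


def normalize(text: str) -> str:
    """Apply NFC normalization and trim surrounding whitespace before comparing."""
    return unicodedata.normalize("NFC", text).strip()


def check_examples(
    translate: Callable[[str], str],
    examples: Dict[str, str] = EXAMPLES,
) -> List[Tuple[str, str, str]]:
    """Return (input, expected, actual) for every example that does not match."""
    failures = []
    for source, expected in examples.items():
        actual = translate(source)
        if normalize(actual) != normalize(expected):
            failures.append((source, expected, actual))
    return failures
```

An empty list means every example matched; any returned tuples show which inputs diverged and how.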
✅ Handles phonetic romanization (standard Latin script)
✅ Understands informal Singlish (conversational variations)
✅ Subword tokenization (efficient processing with mT5)
✅ Prefix-based translation (requires "translate Singlish to Sinhala:" prefix)
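Because the model expects the "translate Singlish to Sinhala:" prefix, it can help to add it in one place and not in every call site. The helper below is a minimal sketch under that assumption; `PREFIX` and `with_prefix` are hypothetical names and are not part of the model or the `transformers` API.

```python
PREFIX = "translate Singlish to Sinhala: "


def with_prefix(text: str) -> str:
    """Return `text` with the task prefix, adding it only when it is missing."""
    cleaned = text.strip()
    # Case-insensitive check so already-prefixed input is not prefixed twice.
    if cleaned.lower().startswith(PREFIX.strip().lower()):
        return cleaned
    return PREFIX + cleaned
```

For example, `with_prefix("kohomada")` produces the same string that the usage examples pass to the model.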
If you use this model, please cite:
@misc{singlish-sinhala-mt5-20260116,
author = {savinugunarathna},
title = {Singlish to Sinhala Translation Model (mT5-Small)},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/savinugunarathna/singlish-to-sinhala-mt5-small}}
}
This model uses data from the Swa-bhasha Resource Hub:
@article{sumanathilaka2025swa,
title={Swa-bhasha Resource Hub: Romanized Sinhala to Sinhala Transliteration Systems and Data Resources},
author={Sumanathilaka, Deshan and Perera, Sameera and Dharmasiri, Sachithya and Athukorala, Maneesha and Herath, Anuja Dilrukshi and Dias, Rukshan and Gamage, Pasindu and Weerasinghe, Ruvan and Priyadarshana, YHPP},
journal={arXiv preprint arXiv:2507.09245},
year={2025}
}
License: Apache 2.0
For questions or issues, please open an issue in the model repository.