Singlish to Sinhala Translation Model (mT5-Small)

This model translates Singlish (romanized Sinhala mixed with English) into Sinhala script. It is fine-tuned from google/mt5-small.

Model Description

  • Base Model: google/mt5-small
  • Task: Translation (Singlish → Sinhala)
  • Languages: Singlish (romanized Sinhala) → Sinhala (සිංහල)
  • Training Date: 2026-01-16
  • Architecture: Multilingual T5 (mT5) encoder-decoder with SentencePiece subword tokenization

Training Details

  • Dataset Size: ~490,000 translation pairs
  • Data Source: Phonetic transcriptions + ad-hoc Singlish variants from Swa-bhasha Resource Hub
  • Hardware: Tesla P100 GPU
  • Framework: Hugging Face Transformers

Usage

Using Transformers Pipeline

from transformers import pipeline

translator = pipeline("translation", model="savinugunarathna/singlish-to-sinhala-mt5-small")
result = translator("translate Singlish to Sinhala: oyage nama mokakda")
print(result[0]["translation_text"])
# Output: ඔයාගේ නම මොකක්ද

Manual Loading

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("savinugunarathna/singlish-to-sinhala-mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("savinugunarathna/singlish-to-sinhala-mt5-small")

input_text = "translate Singlish to Sinhala: mama pasal yanawa"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=80, num_beams=5)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
# Output: මම පාසල යනවා

Batch Translation

# Reuses the tokenizer and model loaded in the Manual Loading example
texts = [
    "translate Singlish to Sinhala: kohomada",
    "translate Singlish to Sinhala: mama hodata innawa",
    "translate Singlish to Sinhala: api yamu"
]

inputs = tokenizer(texts, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_length=80, num_beams=5)

for i, output in enumerate(outputs):
    # Print the source text (prefix stripped) and its translation, separated by an arrow
    print(f"{texts[i].split(': ', 1)[1]} → {tokenizer.decode(output, skip_special_tokens=True)}")
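For input lists too large to tokenize and generate in one call, the batch example above can be split into smaller chunks. The following is a minimal sketch, not part of the original model card: the names `chunked`, `translate_all`, `make_translate_batch` and the default `batch_size=16` are illustrative assumptions, and the generation settings simply mirror the ones used above.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate_all(
    texts: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,  # hypothetical default; tune to available memory
) -> List[str]:
    """Translate `texts` chunk by chunk, preserving the input order."""
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        results.extend(translate_batch(batch))
    return results


def make_translate_batch(tokenizer, model, max_length: int = 80, num_beams: int = 5):
    """Wrap a loaded tokenizer and model into a function that translates one batch."""
    def translate_batch(batch: List[str]) -> List[str]:
        inputs = tokenizer(batch, return_tensors="pt", padding=True)
        outputs = model.generate(**inputs, max_length=max_length, num_beams=num_beams)
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return translate_batch
```

With the tokenizer and model from Manual Loading, `translate_all(texts, make_translate_batch(tokenizer, model))` would return the translations in input order.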

Example Translations

Singlish Input | Sinhala Output
oyage nama mokakda | ඔයාගේ නම මොකක්ද
api koheda yanne | අපි කොහෙද යන්නේ
kohomada | කොහොමද
mama hodata innawa | මම හොඳට ඉන්නවා
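The pairs listed above can double as a quick smoke test after loading the model. This is a sketch under assumptions: `find_mismatches` is a hypothetical helper, and `translate` stands for any callable that takes one prefixed string and returns the decoded Sinhala output (for example, a thin wrapper around the Manual Loading code). Exact string matching is strict, so a mismatch does not necessarily mean the translation is wrong.

```python
from typing import Callable, List, Tuple

# Task prefix and example pairs as documented in this model card
PREFIX = "translate Singlish to Sinhala: "
EXAMPLES = {
    "oyage nama mokakda": "ඔයාගේ නම මොකක්ද",
    "api koheda yanne": "අපි කොහෙද යන්නේ",
    "kohomada": "කොහොමද",
    "mama hodata innawa": "මම හොඳට ඉන්නවා",
}


def find_mismatches(translate: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    """Return (source, expected, actual) for every example whose output differs."""
    mismatches = []
    for source, expected in EXAMPLES.items():
        actual = translate(PREFIX + source).strip()
        if actual != expected:
            mismatches.append((source, expected, actual))
    return mismatches
```

An empty list means every documented example was reproduced exactly.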

Model Capabilities

  • Handles phonetic romanization (standard Latin script)
  • Understands informal Singlish (conversational variations)
  • Subword tokenization (efficient processing with mT5)
  • Prefix-based translation (requires "translate Singlish to Sinhala:" prefix)

Limitations

  • Performance may vary with non-standard Singlish spellings
  • Best suited for conversational Singlish text
  • Requires the prefix "translate Singlish to Sinhala:" for optimal results
  • May struggle with very informal or heavily code-mixed text
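Because the model expects the task prefix, it can help to add it in one place rather than by hand on every input. A minimal sketch, assuming a hypothetical helper named `with_prefix`. The whitespace cleanup is an illustrative choice, not something the model card prescribes.

```python
# Task prefix as documented in this model card
PREFIX = "translate Singlish to Sinhala: "


def with_prefix(text: str) -> str:
    """Return `text` with the task prefix, adding it only when it is missing."""
    cleaned = " ".join(text.split())  # collapse repeated and surrounding whitespace
    if cleaned.startswith(PREFIX.strip()):
        return cleaned  # already prefixed, leave unchanged
    return PREFIX + cleaned


print(with_prefix("  oyage nama   mokakda "))
# prints: translate Singlish to Sinhala: oyage nama mokakda
```

The returned string can be passed directly to the pipeline or to the tokenizer in the usage examples.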

Citations

If you use this model, please cite:

@misc{singlish-sinhala-mt5-20260116,
  author = {savinugunarathna},
  title = {Singlish to Sinhala Translation Model (mT5-Small)},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/savinugunarathna/singlish-to-sinhala-mt5-small}}
}

Data Source Citation

This model uses data from the Swa-bhasha Resource Hub:

@article{sumanathilaka2025swa,
  title={Swa-bhasha Resource Hub: Romanized Sinhala to Sinhala Transliteration Systems and Data Resources},
  author={Sumanathilaka, Deshan and Perera, Sameera and Dharmasiri, Sachithya and Athukorala, Maneesha and Herath, Anuja Dilrukshi and Dias, Rukshan and Gamage, Pasindu and Weerasinghe, Ruvan and Priyadarshana, YHPP},
  journal={arXiv preprint arXiv:2507.09245},
  year={2025}
}

License

Apache 2.0

Acknowledgments

  • Base model: google/mt5-small
  • Training data: Swa-bhasha Resource Hub (Sumanathilaka et al., 2025)
  • Training framework: Hugging Face Transformers
  • Compute: Tesla P100 GPU

Model Card Contact

For questions or issues, please open an issue in the model repository.
