Swahili-English Translation Model (General Domain Expansion)

This model is a fine-tuned version of Helsinki-NLP/opus-mt-mul-en, trained on a large corpus of general-domain Swahili-English translations while preserving translation quality on helpline conversations.

Model Details

  • Base Model: Helsinki-NLP/opus-mt-mul-en
  • Language Pair: Swahili (sw) → English (en)
  • Model Size: 77.1M parameters (float32, Safetensors)
  • Training Data:
    • CCAligned general-domain corpus (~200k samples)
    • Helpline conversation data (oversampled 5x for domain retention)
  • Special Features:
    • Domain-aware via <HELPLINE> and <GENERAL> source-side tags (registration sketched below)
    • Optimized for both general and helpline translations
    • Knowledge distillation from a helpline-specialized teacher model (see Training Procedure)
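
The tags are ordinary source-side tokens prepended to each input. A minimal sketch of how they can be registered on the base model so the tokenizer keeps them intact (an illustrative assumption; the exact procedure used for this checkpoint is not documented here):

from transformers import MarianMTModel, MarianTokenizer

base = "Helsinki-NLP/opus-mt-mul-en"
tokenizer = MarianTokenizer.from_pretrained(base)
model = MarianMTModel.from_pretrained(base)

# Register the domain tags as special tokens so the tokenizer never
# splits them into subwords, then grow the embedding matrix to match.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<HELPLINE>", "<GENERAL>"]}
)
model.resize_token_embeddings(len(tokenizer))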

Training Procedure

Memory Optimizations

  • CPU offloading of the distillation teacher (see the sketch below)
  • Gradient checkpointing
  • Batch size 8 with gradient accumulation over 16 steps (effective batch size 128)
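
A minimal sketch of how these optimizations can combine in one distillation step. The teacher checkpoint name, temperature, and loss weighting are illustrative assumptions; the batch size, accumulation steps, and the optimizations themselves come from this card:

import torch
import torch.nn.functional as F
from transformers import MarianMTModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Student trains on GPU; gradient checkpointing trades compute for memory.
student = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-mul-en").to(device)
student.gradient_checkpointing_enable()

# Teacher stays on CPU ("CPU teacher offloading"); only its logits are
# moved to the GPU. "helpline-teacher" is a placeholder, not a published checkpoint.
teacher = MarianMTModel.from_pretrained("helpline-teacher").eval()

def distillation_step(batch, temperature=2.0, alpha=0.5):
    # batch: CPU tensors with input_ids, attention_mask, and labels.
    out = student(**{k: v.to(device) for k, v in batch.items()})
    with torch.no_grad():
        t_logits = teacher(**batch).logits.to(device)
    # Soft targets: KL between temperature-scaled distributions.
    kl = F.kl_div(
        F.log_softmax(out.logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Blend hard-label cross-entropy with the distillation loss, then
    # scale for accumulation over 16 micro-batches of size 8.
    loss = (alpha * out.loss + (1 - alpha) * kl) / 16
    loss.backward()
    return loss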

Training Hyperparameters

  • Learning rate: 1.5e-5
  • Epochs: 1
  • Optimizer: AdamW
  • LR Scheduler: Cosine with warmup (full configuration sketched below)
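
Expressed as a Seq2SeqTrainingArguments configuration, the settings above might look roughly like this; output_dir and warmup_ratio are illustrative assumptions, the rest restates values from this card:

from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="sw-en-opus-finetuned",  # assumed name
    learning_rate=1.5e-5,
    num_train_epochs=1,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                   # assumed; warmup length not reported
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
    save_strategy="steps",
    save_steps=500,                     # checkpoint every 500 steps (see Training Details)
)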

Performance

Domain     BLEU   chrF
Helpline   X.XX   XX.X
General    X.XX   XX.X

(Replace with actual metrics from training)

Usage

from transformers import MarianMTModel, MarianTokenizer

# Load model and tokenizer
model_name = "marlonbino/sw-en-opus-finetuned"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# For general translations
text = "<GENERAL> Habari za asubuhi"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # "Good morning"

# For helpline translations
text = "<HELPLINE> Ninahitaji msaada wa haraka"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # "I need urgent help"

Limitations

  • Swahili → English only (not bidirectional)
  • Best performance when inputs carry a domain tag (<HELPLINE> or <GENERAL>)
  • May struggle with highly technical or specialized vocabulary outside the training domains

Training Details

  • Framework: Transformers + PyTorch
  • Hardware: Single GPU training
  • Training Time: ~X hours
  • Checkpoint Strategy: Every 500 steps, for recovery after power failures (resume sketch below)
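
With checkpoints written every 500 steps, an interrupted run can resume from the most recent one. A minimal sketch, assuming the args and student from the configuration sketches above and a tokenized train_data dataset (hypothetical names):

from transformers import Seq2SeqTrainer

# Assumes `args`, `student`, and `train_data` are set up as sketched earlier.
trainer = Seq2SeqTrainer(model=student, args=args, train_dataset=train_data)

# After a power failure, pick up from the latest checkpoint in output_dir.
trainer.train(resume_from_checkpoint=True)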

Citation

If you use this model, please cite:

@misc{sw-en-general-expanded,
  author = {Your Name/Organization},
  title = {Swahili-English General Domain Translation Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/marlonbino/sw-en-opus-finetuned}
}

License

This model inherits the license from Helsinki-NLP/opus-mt-mul-en.

Contact

For questions or issues, please open an issue on the model repository.
