Swahili-English Translation Model (General Domain Expansion)

This model is a fine-tuned version of Helsinki-NLP/opus-mt-mul-en, trained on a large corpus of general-domain Swahili-English translations while preserving translation quality on helpline conversations.

Model Details

  • Base Model: Helsinki-NLP/opus-mt-mul-en
  • Language Pair: Swahili (sw) → English (en)
  • Model Size: 77.1M parameters (float32, Safetensors)
  • Training Data:
    • CCAligned general-domain corpus (~200k samples)
    • Helpline conversation data (oversampled 5x for domain retention)
  • Special Features:
    • Domain-aware via <HELPLINE> and <GENERAL> source-side tags (registration sketched below)
    • Optimized for both general and helpline translations
    • Knowledge distillation from a helpline-specialized teacher model (see Training Procedure)
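
The tags are ordinary source-side tokens prepended to each input. A minimal sketch of how they can be registered on the base model so the tokenizer keeps them intact (an illustrative assumption; the exact procedure used for this checkpoint is not documented here):

from transformers import MarianMTModel, MarianTokenizer

base = "Helsinki-NLP/opus-mt-mul-en"
tokenizer = MarianTokenizer.from_pretrained(base)
model = MarianMTModel.from_pretrained(base)

# Register the domain tags as special tokens so the tokenizer never
# splits them into subwords, then grow the embedding matrix to match.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<HELPLINE>", "<GENERAL>"]}
)
model.resize_token_embeddings(len(tokenizer))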

Training Procedure

Memory Optimizations

  • CPU offloading of the distillation teacher (see the sketch below)
  • Gradient checkpointing
  • Batch size 8 with gradient accumulation over 16 steps (effective batch size 128)
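
A minimal sketch of how these optimizations can combine in one distillation step. The teacher checkpoint name, temperature, and loss weighting are illustrative assumptions; the batch size, accumulation steps, and the optimizations themselves come from this card:

import torch
import torch.nn.functional as F
from transformers import MarianMTModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Student trains on GPU; gradient checkpointing trades compute for memory.
student = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-mul-en").to(device)
student.gradient_checkpointing_enable()

# Teacher stays on CPU ("CPU teacher offloading"); only its logits are
# moved to the GPU. "helpline-teacher" is a placeholder, not a published checkpoint.
teacher = MarianMTModel.from_pretrained("helpline-teacher").eval()

def distillation_step(batch, temperature=2.0, alpha=0.5):
    # batch: CPU tensors with input_ids, attention_mask, and labels.
    out = student(**{k: v.to(device) for k, v in batch.items()})
    with torch.no_grad():
        t_logits = teacher(**batch).logits.to(device)
    # Soft targets: KL between temperature-scaled distributions.
    kl = F.kl_div(
        F.log_softmax(out.logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Blend hard-label cross-entropy with the distillation loss, then
    # scale for accumulation over 16 micro-batches of size 8.
    loss = (alpha * out.loss + (1 - alpha) * kl) / 16
    loss.backward()
    return loss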

Training Hyperparameters

  • Learning rate: 1.5e-5
  • Epochs: 1
  • Optimizer: AdamW
  • LR Scheduler: Cosine with warmup (full configuration sketched below)
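
Expressed as a Seq2SeqTrainingArguments configuration, the settings above might look roughly like this; output_dir and warmup_ratio are illustrative assumptions, the rest restates values from this card:

from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="sw-en-opus-finetuned",  # assumed name
    learning_rate=1.5e-5,
    num_train_epochs=1,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                   # assumed; warmup length not reported
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
    save_strategy="steps",
    save_steps=500,                     # checkpoint every 500 steps (see Training Details)
)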

Performance

Domain     BLEU   chrF
Helpline   X.XX   XX.X
General    X.XX   XX.X

(Replace with actual metrics from training)

Usage

from transformers import MarianMTModel, MarianTokenizer

# Load model and tokenizer
model_name = "marlonbino/sw-en-opus-finetuned"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# For general translations
text = "<GENERAL> Habari za asubuhi"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # "Good morning"

# For helpline translations
text = "<HELPLINE> Ninahitaji msaada wa haraka"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # "I need urgent help"

Limitations

  • Swahili → English only (not bidirectional)
  • Best performance when inputs carry a domain tag (<HELPLINE> or <GENERAL>)
  • May struggle with highly technical or specialized vocabulary outside the training domains

Training Details

  • Framework: Transformers + PyTorch
  • Hardware: Single GPU training
  • Training Time: ~X hours
  • Checkpoint Strategy: Every 500 steps, for recovery after power failures (resume sketch below)
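
With checkpoints written every 500 steps, an interrupted run can resume from the most recent one. A minimal sketch, assuming the args and student from the configuration sketches above and a tokenized train_data dataset (hypothetical names):

from transformers import Seq2SeqTrainer

# Assumes `args`, `student`, and `train_data` are set up as sketched earlier.
trainer = Seq2SeqTrainer(model=student, args=args, train_dataset=train_data)

# After a power failure, pick up from the latest checkpoint in output_dir.
trainer.train(resume_from_checkpoint=True)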

Citation

If you use this model, please cite:

@misc{sw-en-general-expanded,
  author = {Your Name/Organization},
  title = {Swahili-English General Domain Translation Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/marlonbino/sw-en-opus-finetuned}
}

License

This model inherits the license from Helsinki-NLP/opus-mt-mul-en.

Contact

For questions or issues, please open an issue on the model repository.
