Model Overview
Turaco-mt-fr-gh is a specialized neural machine translation model fine-tuned for high-quality translation from French to Ghomálá.
This model is part of the Turaco family, an initiative focused on advancing translation capabilities for low-resource and underrepresented African languages. While large-scale multilingual models provide strong general foundations, they often lack depth and fluency when applied to specific low-resource languages. This project addresses that gap through targeted fine-tuning on curated parallel data.
Built on top of NLLB-200, Turaco-mt-fr-gh leverages multilingual transfer learning to produce more accurate, fluent, and context-aware translations into Ghomálá.
Model Details
- Developed by: fotiecodes
- Model type: Sequence-to-Sequence Transformer (Multilingual NMT)
- License: Apache-2.0
- Base model: facebook/nllb-200-distilled-600M
- Task: Machine Translation (French → Ghomálá)
- Language(s): French (fr), Ghomálá (gh)
Intended Use
This model is designed for:
- Translating French text into Ghomálá
- Supporting localization for Cameroonian and regional applications
- Experimentation with low-resource language translation
- Research on multilingual transfer learning and adaptation
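A minimal inference sketch with the Transformers library follows. The checkpoint name is taken from the citation URL below; the source code `fra_Latn` is NLLB's standard code for French, but the target language code here is an assumption (`bbj` is the ISO 639-3 code for Ghomálá', while this card lists `gh`), so adjust it to whatever token this checkpoint actually uses.

```python
def translate_fr_to_ghomala(text: str,
                            model_name: str = "fotiecodes/Turaco-mt-fr-gh",
                            tgt_lang: str = "bbj_Latn") -> str:
    """Translate one French sentence into Ghomálá (sketch, not a verified recipe).

    NOTE: tgt_lang is an assumption; check the checkpoint's tokenizer for the
    actual target language token.
    """
    # Heavy imports are kept local so the sketch only needs `transformers`
    # (and a model download) when it is actually called.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="fra_Latn")
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    inputs = tokenizer(text, return_tensors="pt")
    # NLLB-style models need the target language token forced as the first
    # decoder token.
    generated = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_length=128,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```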
Training Data
The model was fine-tuned on a parallel dataset of French–Ghomálá sentence pairs.
Key characteristics:
- High-quality aligned sentence pairs
- Focus on conversational and general-purpose language
- Cleaned and normalized text to reduce noise
- Balanced examples to improve consistency in output
Given the low-resource nature of Ghomálá, dataset quality and consistency were prioritized over sheer size.
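The cleaning and deduplication steps described above can be sketched as a small Python routine. The function name and exact rules (NFC normalization, trimming, dropping empty sides and exact duplicates) are illustrative assumptions, not the project's actual pipeline.

```python
import unicodedata


def clean_pairs(pairs):
    """Normalize, trim, and deduplicate French-Ghomálá sentence pairs.

    Illustrative sketch: the real preprocessing pipeline is not published.
    """
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        # Unicode NFC normalization matters for accented French and
        # Ghomálá orthography (precomposed vs. combining characters).
        src = unicodedata.normalize("NFC", src).strip()
        tgt = unicodedata.normalize("NFC", tgt).strip()
        if not src or not tgt:
            continue  # drop pairs with an empty side
        key = (src, tgt)
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        cleaned.append((src, tgt))
    return cleaned
```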
Training Procedure
The model was fine-tuned using supervised learning on parallel translation data.
Key aspects:
- Initialized from NLLB-200
- Standard sequence-to-sequence training with source-target pairs
- Tokenization handled using the pretrained NLLB tokenizer
- Optimization focused on adapting the model to:
  - Ghomálá vocabulary and structure
  - French → Ghomálá alignment
  - Improved fluency and coherence
The training process leverages NLLB’s multilingual representations, allowing the model to generalize better despite limited data.
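With the Transformers library, preparing source-target pairs for this kind of supervised seq2seq training can be sketched as below. The helper name and the fixed `max_length` are illustrative assumptions; `text_target` is the standard tokenizer argument for producing decoder labels in one call.

```python
def build_features(tokenizer, pairs, max_length=128):
    """Tokenize French-Ghomálá pairs into model inputs plus labels.

    Sketch only: assumes an NLLB-style tokenizer whose `text_target`
    argument encodes the target side as training labels.
    """
    features = []
    for src, tgt in pairs:
        enc = tokenizer(
            src,
            text_target=tgt,      # target side becomes the `labels` field
            max_length=max_length,
            truncation=True,
        )
        features.append(enc)
    return features
```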
Evaluation
Evaluation was primarily qualitative, focusing on:
- Fluency in Ghomálá
- Semantic correctness of translations
- Consistency in maintaining the target language
Preliminary results:
- French → Ghomálá: BLEU = 4.9 | chrF2 = 19.1
- Ghomálá → French: BLEU = 10.8 | chrF2 = 30.5
Note:
The model performs better when translating from Ghomálá to French than in the reverse direction. However, the overall scores (BLEU and chrF2) indicate that translation quality is still limited, especially for French → Ghomálá. These results suggest the model is better at understanding Ghomálá than at generating it, and further training data or fine-tuning would be needed for production-level performance.
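For reference, chrF2 is a character n-gram F-score with recall weighted twice as heavily as precision (β = 2). A simplified, self-contained sentence-level sketch of the metric (whitespace ignored, no word n-grams; the full sacreBLEU implementation differs in details):

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (as chrF does by default)."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chrf(hyp: str, ref: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (beta=2 gives chrF2)."""
    scores = []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if not h and not r:
            continue  # no n-grams of this order in either string
        overlap = sum((h & r).values())  # clipped n-gram matches
        prec = overlap / max(sum(h.values()), 1)
        rec = overlap / max(sum(r.values()), 1)
        if prec + rec == 0:
            scores.append(0.0)
        else:
            # F-beta: beta > 1 weights recall more heavily than precision.
            scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return 100.0 * sum(scores) / len(scores) if scores else 0.0
```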
Limitations
- Performance depends heavily on dataset size and diversity
- May struggle with:
  - Technical or domain-specific vocabulary
  - Rare linguistic constructions
- Not optimized for reverse translation (Ghomálá → French)
- As with most neural MT systems, outputs may occasionally:
  - Be inconsistent
  - Contain minor hallucinations or approximations
Future Work
- Expand the French–Ghomálá dataset with more diverse domains
- Explore parameter-efficient fine-tuning (LoRA, adapters)
- Benchmark against other multilingual MT systems
- Incorporate human evaluation from native speakers
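The parameter-efficient fine-tuning direction mentioned above could look like the following LoRA sketch using the PEFT library. This is a hypothetical configuration, not part of this model's training: the rank, alpha, and target modules are common defaults for NLLB-style encoder-decoder attention, and would need tuning.

```python
def add_lora_adapters(model):
    """Wrap a seq2seq model with LoRA adapters (hypothetical sketch).

    Assumes the `peft` package; hyperparameters are illustrative defaults.
    """
    # Local import so the sketch only requires `peft` when called.
    from peft import LoraConfig, get_peft_model

    config = LoraConfig(
        r=16,                                  # adapter rank
        lora_alpha=32,                         # scaling factor
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],   # attention projections
        task_type="SEQ_2_SEQ_LM",
    )
    return get_peft_model(model, config)
```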
Ethical Considerations
This model contributes to improving representation of under-resourced African languages in AI.
Care should be taken to:
- Respect linguistic and cultural nuances of Ghomálá
- Validate outputs in sensitive or critical contexts
- Involve native speakers in evaluation and feedback loops
- Avoid over-reliance in high-stakes applications without verification
Citation
If you use this model, please cite:
@misc{turaco_mt_fr_gh,
  author    = {fotiecodes},
  title     = {Turaco-mt-fr-gh},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/fotiecodes/Turaco-mt-fr-gh}
}
Model tree for fotiecodes/Turaco-mt-fr-gh
Base model
facebook/nllb-200-distilled-600M