# Swahili GPT
A character-level GPT language model trained on Swahili news articles.
## Model Details
- Architecture: 6-layer decoder-only Transformer
- Parameters: 11M
- Training data: 29,544 Swahili news articles (65M characters)
- Tokenization: Character-level (vocab size: 464)
- Context window: 256 characters
- Final loss: 1.15
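The 11M figure is consistent with a nanoGPT-style configuration. A rough back-of-the-envelope check, assuming an embedding width of 384 (the width is an assumption, not stated above; layer count, vocabulary size, and context window are from the list):

```python
# Rough parameter count for a 6-layer decoder-only Transformer.
# n_embd=384 is assumed (nanoGPT-style); n_layer, vocab_size, and
# block_size come from the model card.
n_layer, n_embd = 6, 384
vocab_size, block_size = 464, 256

per_layer = (
    4 * n_embd * n_embd      # attention: Q, K, V, and output projections
    + 8 * n_embd * n_embd    # MLP: two linear layers with 4x expansion
    + 4 * n_embd             # two LayerNorms (weight + bias each)
)
embeddings = vocab_size * n_embd + block_size * n_embd  # token + positional
total = n_layer * per_layer + embeddings
print(f"{total / 1e6:.1f}M parameters")  # roughly 10.9M, i.e. ~11M
```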
## Usage

```bash
# Clone the repo and run locally
git clone https://github.com/RamadhanAdam/swahili-gpt
cd swahili-gpt
pip install -r requirements.txt

# Download the checkpoint from GitHub Releases, then generate interactively
python generate.py --prompt "Rais wa Tanzania" --tokens 400
```
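At each step, generation samples the next character from a softmax over vocabulary logits. A minimal sketch of that sampling step (the actual internals of `generate.py` may differ; the logits here are toy values):

```python
import math
import random

def sample_next(logits, temperature=1.0):
    """Sample an index from a list of logits via softmax with temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Example: three-character vocabulary; lower temperature sharpens the choice
idx = sample_next([2.0, 0.1, -1.0], temperature=0.8)
```

Lower temperatures concentrate probability mass on the most likely character; higher temperatures make output more varied but less coherent.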
## Training

Trained for 5,000 steps with AdamW (lr = 3e-4) and dropout 0.2 on a Google Colab T4 GPU.
| Step | Train Loss | Val Loss |
|---|---|---|
| 0 | 6.2798 | 6.2793 |
| 2500 | 1.2566 | 1.2548 |
| 4500 | 1.1350 | 1.1387 |
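The losses above are cross-entropy in nats per character; dividing by ln 2 converts them to bits per character, a more standard figure for character-level models:

```python
import math

# Validation loss at step 4500, from the table above
val_loss_nats = 1.1387
bpc = val_loss_nats / math.log(2)
print(f"{bpc:.2f} bits per character")  # about 1.64 bpc
```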
## Training Curves

Train and validation loss curves are available in the repository's training notebook.
## Dataset
Trained on the swahili_news dataset — 29,544 news articles from Tanzanian online platforms. License: CC BY 4.0.
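With character-level tokenization, the vocabulary is simply the set of distinct characters in the corpus (464 here, once all Unicode characters in the news text are counted). A minimal sketch of how such a tokenizer is typically built, on a toy corpus:

```python
# Build a character-level tokenizer from a corpus (toy example; the real
# 65M-character corpus yields 464 distinct characters).
corpus = "Rais wa Tanzania amesema habari njema"
chars = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> char

def encode(s: str) -> list[int]:
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

assert decode(encode("habari")) == "habari"    # lossless round trip
```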
## Limitations
- Character-level generation is slow and limited — not suitable for production NLP tasks
- Small training corpus relative to modern language models
- Generates plausible Swahili morphology but not fully coherent text
## Code & Details
Full code, training notebook, and results: github.com/RamadhanAdam/swahili-gpt
## Acknowledgements
Inspired by Andrej Karpathy's work on character-level language models.

