Swahili GPT

A character-level GPT language model trained on Swahili news articles.

Model Details

  • Architecture: 6-layer decoder-only Transformer
  • Parameters: 11M
  • Training data: 29,544 Swahili news articles (65M characters)
  • Tokenization: Character-level (vocab size: 464)
  • Context window: 256 characters
  • Final loss: 1.15
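Character-level tokenization simply maps each distinct character in the corpus to an integer id, so the vocabulary is the set of characters seen in training. A minimal sketch of such an encoder/decoder (the repo's actual implementation and its 464-character vocabulary may differ; the sample corpus below is illustrative):

```python
class CharTokenizer:
    """Minimal character-level tokenizer sketch."""

    def __init__(self, corpus: str):
        chars = sorted(set(corpus))                    # unique characters form the vocab
        self.stoi = {c: i for i, c in enumerate(chars)}  # char -> id
        self.itos = {i: c for c, i in self.stoi.items()} # id -> char

    @property
    def vocab_size(self) -> int:
        return len(self.stoi)

    def encode(self, text: str) -> list[int]:
        return [self.stoi[c] for c in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)

# Toy corpus for illustration only
tok = CharTokenizer("habari za asubuhi")
ids = tok.encode("habari")
```

Encoding followed by decoding round-trips any text whose characters appeared in the corpus; unseen characters would raise a `KeyError`, which is why the real vocabulary is built from the full training set.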

Usage

# Clone the repo and run locally
git clone https://github.com/RamadhanAdam/swahili-gpt
cd swahili-gpt
pip install -r requirements.txt

# Download checkpoint from GitHub Releases
# Then generate interactively
python generate.py --prompt "Rais wa Tanzania" --tokens 400

Training

Trained for 5,000 steps on a Google Colab T4 GPU using AdamW (lr = 3e-4) with dropout of 0.2.
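AdamW differs from plain Adam in that weight decay is applied directly to the parameter rather than folded into the gradient. A scalar sketch of one update step, using the run's stated learning rate and otherwise standard default hyperparameters (assumed, not taken from the repo):

```python
import math

def adamw_step(w, g, m, v, t, lr=3e-4, b1=0.9, b2=0.999,
               eps=1e-8, wd=0.01):
    """One scalar AdamW update with decoupled weight decay."""
    m = b1 * m + (1 - b1) * g          # first-moment EMA of the gradient
    v = b2 * v + (1 - b2) * g * g      # second-moment EMA
    m_hat = m / (1 - b1 ** t)          # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    # decay term wd * w is added outside the adaptive ratio
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)
    return w, m, v
```

In the real run this update is applied elementwise to every tensor in the 11M-parameter model by `torch.optim.AdamW`.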

| Step | Train Loss | Val Loss |
|------|------------|----------|
| 0    | 6.2798     | 6.2793   |
| 2500 | 1.2566     | 1.2548   |
| 4500 | 1.1350     | 1.1387   |

Training Curves

[Figure: training-loss curves for the bigram baseline and the GPT model]

Dataset

Trained on the swahili_news dataset — 29,544 news articles from Tanzanian online platforms. License: CC BY 4.0.

Limitations

  • Character-level generation — not suitable for production NLP tasks
  • Small training corpus relative to modern language models
  • Generates plausible Swahili morphology but not fully coherent text

Code & Details

Full code, training notebook, and results: github.com/RamadhanAdam/swahili-gpt

Acknowledgements

Inspired by Andrej Karpathy's work on character-level language models.
