Swahili GPT

A character-level GPT language model trained on Swahili news articles.

Model Details

  • Architecture: 6-layer decoder-only Transformer
  • Parameters: 11M
  • Training data: 29,544 Swahili news articles (65M characters)
  • Tokenization: Character-level (vocab size: 464)
  • Context window: 256 characters
  • Final loss: 1.15
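Character-level tokenization simply maps each distinct character in the corpus to an integer id, so the vocabulary is the set of characters seen in training. A minimal sketch of such an encoder/decoder (the repo's actual implementation and its 464-character vocabulary may differ; the sample corpus below is illustrative):

```python
class CharTokenizer:
    """Minimal character-level tokenizer sketch."""

    def __init__(self, corpus: str):
        chars = sorted(set(corpus))                    # unique characters form the vocab
        self.stoi = {c: i for i, c in enumerate(chars)}  # char -> id
        self.itos = {i: c for c, i in self.stoi.items()} # id -> char

    @property
    def vocab_size(self) -> int:
        return len(self.stoi)

    def encode(self, text: str) -> list[int]:
        return [self.stoi[c] for c in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)

# Toy corpus for illustration only
tok = CharTokenizer("habari za asubuhi")
ids = tok.encode("habari")
```

Encoding followed by decoding round-trips any text whose characters appeared in the corpus; unseen characters would raise a `KeyError`, which is why the real vocabulary is built from the full training set.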

Usage

# Clone the repo and run locally
git clone https://github.com/RamadhanAdam/swahili-gpt
cd swahili-gpt
pip install -r requirements.txt

# Download checkpoint from GitHub Releases
# Then generate interactively
python generate.py --prompt "Rais wa Tanzania" --tokens 400

Training

Trained for 5,000 steps on a Google Colab T4 GPU using AdamW (lr = 3e-4) with dropout of 0.2.
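AdamW differs from plain Adam in that weight decay is applied directly to the parameter rather than folded into the gradient. A scalar sketch of one update step, using the run's stated learning rate and otherwise standard default hyperparameters (assumed, not taken from the repo):

```python
import math

def adamw_step(w, g, m, v, t, lr=3e-4, b1=0.9, b2=0.999,
               eps=1e-8, wd=0.01):
    """One scalar AdamW update with decoupled weight decay."""
    m = b1 * m + (1 - b1) * g          # first-moment EMA of the gradient
    v = b2 * v + (1 - b2) * g * g      # second-moment EMA
    m_hat = m / (1 - b1 ** t)          # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    # decay term wd * w is added outside the adaptive ratio
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)
    return w, m, v
```

In the real run this update is applied elementwise to every tensor in the 11M-parameter model by `torch.optim.AdamW`.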

| Step | Train Loss | Val Loss |
|------|------------|----------|
| 0    | 6.2798     | 6.2793   |
| 2500 | 1.2566     | 1.2548   |
| 4500 | 1.1350     | 1.1387   |

Training Curves

[Figure: training-loss curves for the bigram baseline and the GPT model]

Dataset

Trained on the swahili_news dataset — 29,544 news articles from Tanzanian online platforms. License: CC BY 4.0.

Limitations

  • Character-level generation — not suitable for production NLP tasks
  • Small training corpus relative to modern language models
  • Generates plausible Swahili morphology but not fully coherent text

Code & Details

Full code, training notebook, and results: github.com/RamadhanAdam/swahili-gpt

Acknowledgements

Inspired by Andrej Karpathy's work on character-level language models.
