A micro GPT implementation and training pipeline in PyTorch. I built this to understand a few things:
- Basics of attention and rotary position embeddings (RoPE)
- Training a GPT-like model on multiple GPUs, including checkpointing and the other considerations needed to make the run successful
- Multi-phase training, including combining/souping the model weights for the second stage across 3 runs on smaller amounts of high-quality data
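The souping step in the last bullet amounts to uniformly averaging the parameters of several runs. A minimal sketch of that averaging, using plain Python dicts of floats in place of PyTorch `state_dict` tensors (the function name `soup` and the example values are illustrative, not from this repo):

```python
def soup(state_dicts):
    """Uniformly average parameters across runs (model souping).

    Each dict maps a parameter name to a list of floats; real code would
    average the tensors from each run's `model.state_dict()` instead.
    """
    n = len(state_dicts)
    return {
        name: [sum(vals) / n for vals in zip(*(sd[name] for sd in state_dicts))]
        for name in state_dicts[0]
    }

# Three hypothetical second-stage runs of the same model.
runs = [
    {"w": [1.0, 2.0]},
    {"w": [3.0, 4.0]},
    {"w": [5.0, 6.0]},
]
print(soup(runs))  # {'w': [3.0, 4.0]}
```

With real checkpoints, the same idea applies tensor-by-tensor, after which the averaged state dict is loaded back into the model for the next phase.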
This repository contains the pretrained model and tokenizer.
See gpahal/microgpt for instructions on how to use the model.