A micro GPT implementation and training pipeline in PyTorch. I built this to understand a few things:

  • The basics of attention and RoPE (rotary position embeddings)
  • Training a GPT-style model on multiple GPUs, including checkpointing and the other considerations needed to make the run successful
  • Multi-phase training, including combining ("souping") the model weights for the second stage from 3 runs on smaller amounts of high-quality data
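The RoPE mentioned in the first bullet rotates each pair of channels in a query/key vector by an angle proportional to the token's position, so relative offsets fall out of the dot product. A minimal sketch of the idea (the function name and pairing convention here are illustrative, not this repo's actual implementation):

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Channel pair (2i, 2i+1) at position p is rotated by angle
    p * theta_i, where theta_i = base ** (-2i / dim).
    """
    seq_len, dim = x.shape
    half = dim // 2
    theta = base ** (-torch.arange(half, dtype=torch.float32) * 2 / dim)
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), theta)
    cos, sin = angles.cos(), angles.sin()            # each (seq_len, half)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # even/odd channel pairs
    # 2D rotation applied to each channel pair, then re-interleaved
    return torch.stack([x1 * cos - x2 * sin,
                        x1 * sin + x2 * cos], dim=-1).reshape(seq_len, dim)
```

Because each pair undergoes a pure rotation, vector norms are preserved and position 0 is left unchanged.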
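The "souping" in the last bullet refers to averaging the weights of several trained checkpoints into one model. A minimal sketch, assuming a uniform average over state dicts (the function name and the uniform weighting are my assumptions, not necessarily what this repo does):

```python
import torch

def soup(state_dicts: list[dict]) -> dict:
    """Uniformly average matching parameters across checkpoint state dicts."""
    averaged = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        averaged[key] = stacked.mean(dim=0)
    return averaged
```

The averaged dict can then be loaded back with `model.load_state_dict(...)` before continuing training or running inference.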

This repository contains the pretrained model and tokenizer.

See gpahal/microgpt for instructions on how to use the model.
