A micro GPT implementation and training pipeline in PyTorch. I built this to understand a few things:

  • The basics of attention and RoPE (rotary position embeddings)
  • Training a GPT-style model on multiple GPUs, including checkpointing and the other considerations needed to make the run successful
  • Multi-phase training, including combining ("souping") the model weights for the second stage from 3 runs on smaller amounts of high-quality data
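The RoPE mentioned in the first bullet rotates each pair of channels in a query/key vector by an angle proportional to the token's position, so relative offsets fall out of the dot product. A minimal sketch of the idea (the function name and pairing convention here are illustrative, not this repo's actual implementation):

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Channel pair (2i, 2i+1) at position p is rotated by angle
    p * theta_i, where theta_i = base ** (-2i / dim).
    """
    seq_len, dim = x.shape
    half = dim // 2
    theta = base ** (-torch.arange(half, dtype=torch.float32) * 2 / dim)
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), theta)
    cos, sin = angles.cos(), angles.sin()            # each (seq_len, half)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # even/odd channel pairs
    # 2D rotation applied to each channel pair, then re-interleaved
    return torch.stack([x1 * cos - x2 * sin,
                        x1 * sin + x2 * cos], dim=-1).reshape(seq_len, dim)
```

Because each pair undergoes a pure rotation, vector norms are preserved and position 0 is left unchanged.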
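The "souping" in the last bullet refers to averaging the weights of several trained checkpoints into one model. A minimal sketch, assuming a uniform average over state dicts (the function name and the uniform weighting are my assumptions, not necessarily what this repo does):

```python
import torch

def soup(state_dicts: list[dict]) -> dict:
    """Uniformly average matching parameters across checkpoint state dicts."""
    averaged = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        averaged[key] = stacked.mean(dim=0)
    return averaged
```

The averaged dict can then be loaded back with `model.load_state_dict(...)` before continuing training or running inference.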

This repository contains the pretrained model and tokenizer.

See gpahal/microgpt for instructions on how to use the model.
