HebrewGPT-1B-AdamW 🇮🇱 (Ablation)
HebrewGPT-1B-AdamW is an ablation variant of HebrewGPT-1B trained with the AdamW optimizer instead of Muon. All other training conditions (architecture, data, hardware, and hyperparameters) are identical. This model demonstrates the value of Muon at the 1B parameter scale: the Muon-trained model reaches 8.5% lower best-checkpoint validation BPB than this AdamW variant (25.89 vs 28.09).
This model is provided for research comparison purposes. For the best-performing Hebrew language model, use HebrewGPT-1B.
- Paper: Hebrew Language Model Research via Agentic AI
- GitHub: AgenticResearcher
- Primary model: HebrewGPT-1B (Muon optimizer)
Model Description
This model has the exact same architecture as HebrewGPT-1B:
| Parameter | Value |
|---|---|
| Parameters | 1.08B |
| Hidden size (WIDTH) | 2048 |
| Layers (DEPTH) | 20 |
| Attention heads | 16 |
| Head dimension | 128 |
| MLP type | SwiGLU (intermediate_size=5504) |
| Positional encoding | RoPE (interleaved, θ=10000) |
| Normalization | RMSNorm |
| Vocabulary | 32,000 (Hebrew-native SentencePiece BPE) |
| Context length | 2,048 tokens |
| Weight tying | Yes |
| Precision | bfloat16 |
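The parameter count in the table can be sanity-checked from these dimensions. A quick back-of-the-envelope tally (counting the tied embedding once and ignoring the small RMSNorm parameters):

```python
# Approximate parameter count from the architecture table above.
vocab, width, depth, inter = 32000, 2048, 20, 5504

embedding = vocab * width    # tied with the output head, so counted once
attention = 4 * width * width  # Q, K, V, O projections (16 heads x 128 dims = width)
swiglu = 3 * width * inter     # gate, up, and down projections
per_layer = attention + swiglu

total = embedding + depth * per_layer
print(f"{total / 1e9:.2f}B parameters")  # ~1.08B, matching the table
```

The RMSNorm weights add only about 84K parameters, which does not change the rounded total.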
Training Details
What's Different
- Optimizer: AdamW (replacing Muon)
- Everything else is identical to HebrewGPT-1B
Training
- Optimizer: AdamW + Lookahead(k=5, α=0.6) + SWA + 4 cosine cycles
- Dropout: 0.1
- Data: 2.48B tokens from 12 Hebrew datasets (same as primary model)
- Hardware: 8× NVIDIA H100 80GB GPUs
- Training time: ~8 hours
- Steps: 11,904
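A minimal PyTorch sketch of this optimizer stack. The learning rate and weight decay are illustrative assumptions, not the run's actual values, and the Lookahead update is hand-rolled here since core PyTorch does not ship one:

```python
import torch
from torch.optim.swa_utils import AveragedModel

# Stand-in module; the real run trains the 1.08B-parameter HebrewGPT.
model = torch.nn.Linear(8, 8)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

# 4 cosine cycles across 11,904 steps -> a restart every 2,976 steps.
sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=2976)

# Minimal Lookahead(k=5, alpha=0.6): every k steps the slow weights move
# toward the fast weights, and the fast weights are reset to them.
slow = [p.detach().clone() for p in model.parameters()]
def lookahead_sync(step, k=5, alpha=0.6):
    if step % k == 0:
        for s, p in zip(slow, model.parameters()):
            s.add_(alpha * (p.detach() - s))
            p.data.copy_(s)

swa_model = AveragedModel(model)  # SWA: running average of the weights

for step in range(1, 11):  # illustrative loop; the real run takes 11,904 steps
    loss = model(torch.randn(4, 8)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()
    lookahead_sync(step)
    swa_model.update_parameters(model)
```

At evaluation time the SWA weights (`swa_model`) are scored separately from the best raw checkpoint, which is why the results table below reports both.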
Evaluation Results
Comparison: Muon vs AdamW
| Metric | HebrewGPT-1B (Muon) | HebrewGPT-1B-AdamW (this) | Δ |
|---|---|---|---|
| Validation BPB (best ckpt) | 25.89 | 28.09 | +8.5% worse |
| Validation BPB (snapshot) | – | 31.29 | – |
| Validation BPB (SWA) | 25.89 | 31.73 | +22.6% worse |
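The Δ column follows directly from the BPB values, as relative change versus the Muon baseline:

```python
# Relative BPB degradation of AdamW vs the Muon baseline (lower BPB is better).
muon_best, adamw_best = 25.89, 28.09
muon_swa, adamw_swa = 25.89, 31.73

delta_best = (adamw_best - muon_best) / muon_best * 100
delta_swa = (adamw_swa - muon_swa) / muon_swa * 100

print(f"best checkpoint: +{delta_best:.1f}% worse")  # +8.5%
print(f"SWA:             +{delta_swa:.1f}% worse")   # +22.6%
```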
Key Finding
Muon achieves 8.5% lower best-checkpoint validation BPB than AdamW at 1B scale (25.89 vs 28.09). The gap widens further with SWA (+22.6%), suggesting Muon finds flatter, more SWA-compatible minima.
This is a notable result for the optimizer community: Muon, originally designed for smaller models, scales effectively to 1B parameters and outperforms the established AdamW optimizer on Hebrew language modeling.
Usage
⚠️ Custom Architecture: This model uses the same custom architecture as HebrewGPT-1B. See the primary model repo for the full model class definition.
```python
import torch
import sentencepiece as spm

# Use the same generate.py from HebrewGPT-1B
from generate import HebrewGPT, ModelConfig

config = ModelConfig(
    vocab_size=32000, width=2048, depth=20,
    n_heads=16, head_dim=128, max_seq_len=2048, dropout=0.0,  # no dropout at inference
)
model = HebrewGPT(config)

# The checkpoint may store the weights under a "model" key
state_dict = torch.load("best.pt", map_location="cpu", weights_only=True)
if isinstance(state_dict, dict) and "model" in state_dict:
    state_dict = state_dict["model"]
model.load_state_dict(state_dict)
model.eval()

# Hebrew-native SentencePiece tokenizer (32K vocabulary)
sp = spm.SentencePieceProcessor()
sp.Load("tokenizer.model")

prompt = "בראשית ברא אלוהים את"
input_ids = torch.tensor([sp.Encode(prompt)])
output = model.generate(input_ids, max_new_tokens=100)
print(sp.Decode(output[0].tolist()))
```
Limitations
- Same limitations as HebrewGPT-1B
- Lower quality than the primary Muon-trained model
- Provided for ablation/research purposes only
Citation
```bibtex
@article{slasky2025hebrewgpt,
  title={Hebrew Language Model Research via Agentic AI: Training HebrewGPT from Scratch},
  author={Slasky, Ronnen},
  year={2025},
  url={https://d11k83yu06biio.cloudfront.net/paper/hebrew-autoresearch.html}
}
```
Acknowledgments
- Loki, AI research assistant (Amazon Bedrock on OpenClaw)
- Andrej Karpathy, for the autoresearch framework
Contact
- Author: Ronnen Slasky (ronnen@slasky.com)
- GitHub: fatherRonnen/AgenticResearcher