Raising Bars, Not Parameters: LilMoo Compact Language Model for Hindi
Paper • 2603.03508 • Published
A 0.6-billion-parameter Hindi language model trained entirely from scratch.
Note: Base model pretrained only with Hindi text.
Note: Base model pretrained with a Hindi + English mixture.
Note: Pretraining dataset.
Note: Annotations to train classifiers/filters (Educational).
Note: Annotations to train classifiers/filters (Toxicity).
Note: Quality filter (Educational).
Note: Quality filter (Toxicity).
Note: Data used to train the LilMoo tokenizer.