BanglaLLM-7B
The first native Bengali Large Language Model trained from scratch by undergraduate researchers at AIUB (American International University-Bangladesh).
Model Details
- Parameters: 5,933,981,248 (5.93B)
- Architecture: LLaMA-3-style decoder-only transformer
  - 32 layers, 4096 hidden dim
  - GQA: 32 query heads / 8 KV heads (4:1 ratio)
  - SwiGLU activation, RMSNorm, RoPE
  - Flash Attention 2
  - Weight tying (embedding/LM head)
- Context length: 4096 tokens
- Vocabulary: 64,000 (custom Bengali BPE tokenizer)
- Training data: ~1B Bengali tokens (Wikipedia + IndicCorpV2 + textbooks)
- Training compute: NVIDIA H100 80GB, bfloat16, gradient checkpointing
- Best training loss: 1.8665
- Held-out perplexity: 10.21 (Bengali Wikipedia held-out)
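As a sanity check, the reported parameter count follows from the architecture numbers above. The sketch below assumes a LLaMA-7B-style SwiGLU width of 11008 and a head dimension of 128 (neither is stated on this card); under those assumptions it lands within ~0.02% of the reported 5,933,981,248.

```python
# Back-of-the-envelope parameter count from the architecture listed above.
# Assumptions (not stated on the card): FFN width 11008, head dim 128.
d, layers, vocab = 4096, 32, 64000
n_q, n_kv, head_dim = 32, 8, 128
ffn = 11008  # assumed LLaMA-7B-style SwiGLU width

embed = vocab * d                                # tied with LM head, counted once
attn = d * (n_q * head_dim) * 2 \
     + d * (n_kv * head_dim) * 2                 # Wq + Wo, Wk + Wv (GQA)
mlp = 3 * d * ffn                                # gate, up, down projections
norms = 2 * d                                    # two RMSNorms per layer
total = embed + layers * (attn + mlp + norms) + d  # + final RMSNorm

print(f"{total:,}")  # ≈ 5.93B, within ~0.02% of the reported count
```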
Limitations
This model is significantly undertrained by Chinchilla scaling-law standards: it was trained on ~1B tokens, versus the ~120B that would be compute-optimal for 5.93B parameters. It has learned Bengali grammar, vocabulary, and syntactic patterns, but it lacks deep factual world knowledge. Future work: scale training to 50B+ tokens.
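The ~120B figure follows from the common Chinchilla rule of thumb of roughly 20 training tokens per parameter:

```python
# Chinchilla rule of thumb: compute-optimal token budget ≈ 20 × parameters
params = 5_933_981_248
optimal_tokens = 20 * params
print(f"{optimal_tokens / 1e9:.0f}B tokens")  # ≈ 119B, vs the ~1B actually used
```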
Authors
- Avoy Mollick (23-50066-1)
- Apurba (23-50067-1)
- Arpon (23-50068-1)
Course: NLP CSC4233, AIUB, Spring 2025-2026
Supervisor: Dr. MD. Saef Ullah Miah
GitHub: github.com/avoymollick/BanglaLLM-7B
Usage
import torch
from bangla_llm import BanglaLLM, Config
from tokenizers import Tokenizer

# Build the 7B config and load the checkpoint in bfloat16 on GPU
cfg = Config("7b")
model = BanglaLLM(cfg).to("cuda").to(torch.bfloat16)
ck = torch.load("base_checkpoint.pt", map_location="cuda", weights_only=False)
model.load_state_dict(ck["model"])
model.eval()

# Load the custom Bengali BPE tokenizer
tok = Tokenizer.from_file("tokenizer/tokenizer.json")

# Prepend the BOS token (id 1); EOS is id 2
prompt = "বাংলাদেশ একটি"  # "Bangladesh is a"
ids = torch.tensor([[1] + tok.encode(prompt, add_special_tokens=False).ids]).to("cuda")
out = model.generate(ids, max_new=100, temp=0.3, top_p=0.9, eos=2)
print(tok.decode(out[0].tolist(), skip_special_tokens=True))
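The `generate` call above is this repo's own helper; its `temp` and `top_p` arguments correspond to standard temperature scaling plus nucleus (top-p) sampling. A generic sketch of one such decoding step (an illustration of the technique, not the repo's actual implementation):

```python
import torch

def sample_top_p(logits: torch.Tensor, temp: float = 0.3, top_p: float = 0.9) -> int:
    # Temperature scaling: temp < 1 sharpens the distribution
    probs = torch.softmax(logits / temp, dim=-1)
    # Sort descending; keep the smallest prefix whose cumulative mass covers top_p
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    mask = cumulative - sorted_probs > top_p  # tokens outside the nucleus
    sorted_probs[mask] = 0.0
    sorted_probs /= sorted_probs.sum()        # renormalize over the nucleus
    # Sample one token id from the truncated distribution
    idx = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_ids[idx].item()

# Example: with a sharply peaked fake distribution, token 2 dominates the nucleus
logits = torch.tensor([0.0, 1.0, 5.0, 0.5])
next_id = sample_top_p(logits)
```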
Citation
@misc{banglallm7b2026,
title={BanglaLLM-7B: First Native Bengali Language Model Trained From Scratch},
author={Mollick, Avoy and Apurba and Arpon},
year={2026},
publisher={HuggingFace},
howpublished={\url{https://huggingface.co/avoymollick/BanglaLLM-7B-base}}
}