| --- |
| language: |
| - en |
| - hi |
| - kn |
| license: mit |
| tags: |
| - causal-lm |
| - multilingual |
| - indic |
| - hindi |
| - kannada |
| - instruction-tuned |
| - dpo |
| - preference-alignment |
| pipeline_tag: text-generation |
| base_model: ace-1/mgpt2-sft |
| --- |
| |
| # mgpt2-dpo — Multilingual GPT-2 (Preference-Aligned) |
|
|
| **Recommended model from this project.** This is `mgpt2-sft` further aligned with
| Direct Preference Optimization (DPO, β=0.1) on 13,500 preference pairs built from
| toxic prompts in ai4bharat/indic-align (HHRLHF-T config). Chosen responses are
| Llama2-70B-Chat safety refusals; rejected responses are raw continuations from the
| pretrained mgpt2 model.
|
|
| Relative to the SFT checkpoint, DPO raised the log-probability of the chosen
| safety-refusal responses by **6.1%** and lowered the log-probability of the
| rejected responses by **16.8%**. Generation comparisons show that the SFT model
| attempts to comply with harmful prompts, whereas the DPO model redirects.
| See the [project report](https://github.com) for full analysis.
|
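For readers new to the objective, the per-pair DPO loss can be sketched as below. This is a minimal illustration in plain Python, not the project's training code: the function name `dpo_loss` and the example log-probabilities are made up, and the frozen reference model is assumed to be the SFT checkpoint. Only β=0.1 comes from this card.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss from summed sequence log-probabilities.

    Returns (loss, margin), where margin is the implicit reward gap
    beta * [(policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)].
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), in a numerically stable form.
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return loss, margin

# Hypothetical numbers: before training the policy equals the reference.
loss_start, margin_start = dpo_loss(-50.0, -80.0, -50.0, -80.0)
# After training: chosen log-prob went up, rejected log-prob went down.
loss_later, margin_later = dpo_loss(-47.0, -93.0, -50.0, -80.0)
print(loss_start, loss_later)
```

When the policy matches the reference the margin is zero and the loss is log 2. The loss falls as the chosen response gains probability relative to the rejected one, which is the direction of the two shifts reported above.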
|
| ## Quick start |
|
|
| ```python |
| import sys, torch |
| import torch.nn.functional as F |
| from huggingface_hub import snapshot_download |
| |
| local = snapshot_download("ace-1/mgpt2-dpo") |
| sys.path.insert(0, local) |
| from model import GPT |
| from tokenizer.regex_tokenizer import RegexTokenizer |
| |
| # weights_only=False runs the full unpickler, so only load checkpoints you trust. |
| ckpt = torch.load(f"{local}/pytorch_model.pt", weights_only=False, map_location="cpu") |
| model = GPT(ckpt["config"]) |
| model.load_state_dict(ckpt["model"]) |
| model.eval() |
| |
| enc = RegexTokenizer() |
| enc.load(f"{local}/tokenizer/artifacts/mgpt2.model") |
| |
| prompts = [ |
| "Explain what photosynthesis is.", # English |
| "प्रकाश संश्लेषण क्या है?", # Hindi (Devanagari) |
| "ದ್ಯುತಿಸಂಶ್ಲೇಷಣೆ ಎಂದರೇನು?", # Kannada script |
| ] |
| |
| for prompt in prompts: |
|     ids = enc.encode(prompt) |
|     x = torch.tensor(ids, dtype=torch.long).unsqueeze(0) |
|     with torch.no_grad(): |
|         for _ in range(120):  # generate at most 120 new tokens |
|             logits, _ = model(x[:, -1024:])  # crop to the 1,024-token context window |
|             probs = F.softmax(logits[:, -1, :] / 0.7, dim=-1)  # temperature 0.7 |
|             next_id = torch.multinomial(probs, num_samples=1) |
|             if next_id.item() == 50256:  # stop at id 50256 (presumably end-of-text) |
|                 break |
|             x = torch.cat([x, next_id], dim=1) |
|     print(f"Prompt : {prompt}") |
|     print(f"Response: {enc.decode(x[0, len(ids):].tolist())}") |
|     print() |
| ``` |
|
|
| ## Intended use |
|
|
| **Good for:** |
| - Multilingual instruction following with light safety alignment (en/hi/kn) |
| - Research: DPO alignment dynamics at 124M scale |
| - Demo of end-to-end LLM pipeline: pretrain → SFT → DPO |
|
|
| **Not for:** Production or safety-critical applications. The alignment here is
| format-preference alignment (coherent refusals vs incoherent noise), not full safety
| alignment. At 124M parameters the pretrained model could not generate coherent harmful
| content, so the DPO preference signal is weaker than in production RLHF setups.
|
|
| ## Model details |
|
|
| | Property | Value | |
| |---|---| |
| | Architecture | GPT-2 (12 layers / 12 heads / 768d) | |
| | Parameters | ~124M | |
| | Vocabulary | 50,257 (mgpt2 BPE), padded to 50,304 | |
| | Context length | 1,024 tokens | |
| | Training stage | DPO (preference-aligned) | |
| | Git commit | `e463752bb14b` | |
|
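The vocabulary and parameter figures in the table can be cross-checked with a little arithmetic. The sketch below assumes the standard GPT-2 layout (biases everywhere, learned position embeddings, output head tied to the token embeddings) and assumes the padding rounds up to a multiple of 128. Neither assumption is confirmed by this card, and the repository's `model.py` may differ. The helper names `pad_vocab` and `gpt2_param_count` are hypothetical.

```python
def pad_vocab(vocab_size: int, multiple: int = 128) -> int:
    """Round vocab_size up to the next multiple (128 is an assumption)."""
    return -(-vocab_size // multiple) * multiple

def gpt2_param_count(n_layer=12, n_embd=768, block_size=1024, vocab_size=50304):
    """Count parameters for a GPT-2 style stack with biases and tied embeddings."""
    wte = vocab_size * n_embd          # token embeddings (shared with the LM head)
    wpe = block_size * n_embd          # learned position embeddings
    # Attention: fused QKV projection plus output projection, each with bias.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: 4x expansion and projection back, each with bias.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    norms = 2 * (2 * n_embd)           # two LayerNorms per block (weight + bias)
    final_norm = 2 * n_embd
    return wte + wpe + n_layer * (attn + mlp + norms) + final_norm

padded = pad_vocab(50257)
total = gpt2_param_count(vocab_size=padded)
print(padded, total)
```

Under these assumptions the padded vocabulary comes out at 50,304 and the total at roughly 124.5M parameters, in line with the "~124M" row.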
|
| ## Training configuration |
|
|
| | Parameter | Value | |
| |---|---| |
| | `seed` | `1337` | |
| | `batch_size` | `32` | |
| | `micro_batch_size` | `4` | |
| | `beta` | `0.1` | |
| | `max_lr` | `1e-06` | |
| | `min_lr_ratio` | `0.1` | |
| | `warmup_steps` | `20` | |
| | `epochs` | `1` | |
| | `weight_decay` | `0.1` | |
| | `eval_interval` | `50` | |
|
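To show how these values fit together, here is a sketch of the step counts and a learning-rate schedule they could describe. The constants come from the table and the 13,500 training pairs. Everything else is an assumption for illustration: a single device, a dropped final partial batch, linear warmup, and cosine decay (the card does not name the decay shape). The function `get_lr` is hypothetical.

```python
import math

MAX_LR = 1e-6
MIN_LR = MAX_LR * 0.1            # min_lr_ratio = 0.1
WARMUP_STEPS = 20
BATCH_SIZE, MICRO_BATCH_SIZE = 32, 4
TRAIN_PAIRS = 13_500

# Micro-batches accumulated per optimizer step (single device assumed).
grad_accum_steps = BATCH_SIZE // MICRO_BATCH_SIZE
# Optimizer steps in one epoch, assuming the last partial batch is dropped.
total_steps = TRAIN_PAIRS // BATCH_SIZE

def get_lr(step: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay to MIN_LR (decay shape assumed)."""
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    if step >= total_steps:
        return MIN_LR
    progress = (step - WARMUP_STEPS) / (total_steps - WARMUP_STEPS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + coeff * (MAX_LR - MIN_LR)

print(grad_accum_steps, total_steps)
for step in (0, 19, 20, 220, total_steps):
    print(step, get_lr(step))
```

With these numbers, warmup covers about the first 5% of the single epoch, and `eval_interval = 50` would give around eight evaluations.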
|
| ## Evaluation |
|
|
| | Metric | Value | Notes | |
| |---|---|---| |
| | Preference win-rate | 1.000 | Held-out DPO pairs (n=1,496) | |
| | DPO val loss | ~0 | Training converged fully | |
| | SFT loss regression | +1.2% | Within 5% threshold (regression_ok=True) | |
| | Chosen log-p Δ | +6.1% | vs SFT checkpoint on same pairs | |
| | Rejected log-p Δ | −16.8% | vs SFT checkpoint on same pairs | |
| | Preference margin Δ | +29.1% | chosen − rejected margin widened | |
| |
| > 100% win-rate reflects format-preference alignment (coherent refusals vs word-salad), |
| > not full safety alignment. See project report for full generation comparison. |
| |
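As a rough guide to reading the win-rate and margin rows, the sketch below computes both from paired scores. It assumes a "win" means the chosen response scores strictly higher than the rejected one. The scores could be policy log-probabilities or DPO implicit rewards, and the card does not say which was used. The function name and the numbers are made up for illustration.

```python
def preference_stats(chosen_scores, rejected_scores):
    """Return (win_rate, mean_margin) over paired per-response scores."""
    if len(chosen_scores) != len(rejected_scores) or not chosen_scores:
        raise ValueError("need two non-empty lists of equal length")
    margins = [c - r for c, r in zip(chosen_scores, rejected_scores)]
    # A pair counts as a win when the chosen response scores strictly higher.
    win_rate = sum(m > 0 for m in margins) / len(margins)
    return win_rate, sum(margins) / len(margins)

# Hypothetical sequence log-probabilities for three held-out pairs.
chosen = [-42.0, -55.5, -61.0]
rejected = [-90.0, -71.25, -60.0]
win_rate, mean_margin = preference_stats(chosen, rejected)
print(win_rate, mean_margin)
```

A win-rate of 1.000 only says that every margin is positive. The mean margin shows how wide the gap is, which is why the table reports both.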
| ## Training data |
| |
| | Language | Count | Chosen source | Rejected source | |
| |---|---|---|---| |
| | English (`eng_Latn`) | 8,250 | Llama2-70B-Chat safety refusals | Phase C pretrained mgpt2 | |
| | Hindi Devanagari (`hin_Deva`) | 2,700 | IndicTrans2-translated refusals | Phase C pretrained mgpt2 | |
| | Kannada script (`kan_Knda`) | 1,950 | IndicTrans2-translated refusals | Phase C pretrained mgpt2 | |
| | Hindi Latin (`hin_Latn`) | 1,050 | IndicTrans2 romanisation | Phase C pretrained mgpt2 | |
| | Kannada Latin (`kan_Latn`) | 1,050 | IndicTrans2 romanisation | Phase C pretrained mgpt2 | |
|
|
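| 13,500 train / 1,499 val pairs. The per-language counts above sum to 15,000, so they appear to cover the combined pool rather than the training split alone. Source: [ai4bharat/indic-align](https://huggingface.co/datasets/ai4bharat/indic-align) HHRLHF-T config.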
|
|
| ## Tokenizer |
|
|
| Custom multilingual regex + BPE tokenizer (`mgpt2`), trained on the same corpus mixture.
| It has the same vocabulary size as tiktoken-gpt2 (50,257 tokens) but Indic-aware merge priorities.
| The table compares tokens per kB of text; lower means fewer tokens for the same text:
|
|
| | Bucket | tiktoken-gpt2 | **mgpt2** | Δ | |
| |---|---:|---:|---:| |
| | Overall | 480 tok/kB | **223 tok/kB** | −54% | |
| | Devanagari | 592 tok/kB | **215 tok/kB** | −64% | |
| | Kannada | 981 tok/kB | **213 tok/kB** | −78% | |
| | Latin | 257 tok/kB | **230 tok/kB** | −10% | |
|
|
| Tokenizer published separately: [ace-1/mgpt2-tokenizer](https://huggingface.co/ace-1/mgpt2-tokenizer) |
|
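The tok/kB metric in the table can be reproduced along the following lines. This is a sketch under assumptions: a kB is taken as 1,000 UTF-8 bytes, and the byte-level encoder is a stand-in so the example runs without the real tokenizer. The helpers `tokens_per_kb` and `percent_change` are hypothetical names.

```python
def tokens_per_kb(encode, text: str) -> float:
    """Tokens emitted per 1,000 UTF-8 bytes of input (kB taken as 1,000 bytes)."""
    n_bytes = len(text.encode("utf-8"))
    if n_bytes == 0:
        raise ValueError("text must be non-empty")
    return 1000.0 * len(encode(text)) / n_bytes

def percent_change(baseline: float, new: float) -> float:
    """Relative change of `new` against `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline

def byte_level_encode(text: str) -> list:
    """Stand-in encoder for illustration only: one token per UTF-8 byte."""
    return list(text.encode("utf-8"))

sample = "प्रकाश संश्लेषण क्या है?"
worst_case = tokens_per_kb(byte_level_encode, sample)
print(worst_case)                          # 1000.0 by construction
print(round(percent_change(480, 223)))     # -54, matching the "Overall" row
```

With the tokenizer loaded as in the quick start, `tokens_per_kb(enc.encode, text)` gives the corresponding figure for `mgpt2`. Most Devanagari and Kannada characters take three bytes in UTF-8, so a tokenizer without merges for those scripts emits many tokens per character.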
|
| ## Known limitations |
|
|
| - **Format-preference alignment, not full safety alignment.** At 124M parameters, the pretrained model generates incoherent text for toxic prompts, so the DPO preference signal trains format preference (coherent refusals vs noise) rather than genuine safety reasoning. |
| - **Transliterated Latin script drift** (inherited from SFT checkpoint) — `hin_Latn`/`kan_Latn` may switch scripts mid-generation. |
| - **124M parameters.** Factual accuracy and multi-step reasoning are limited. |
| - **Research checkpoint** — not evaluated for production use. |
|
|
| ## Citation |
|
|
| ```bibtex |
| @misc{mgpt2, |
| title = {mgpt2: Multilingual GPT-2 with custom Indic tokenizer}, |
| year = {2026}, |
| note = {Pretrain → SFT → DPO pipeline for English/Hindi/Kannada}, |
| url = {https://huggingface.co/ace-1/mgpt2-dpo} |
| } |
| ``` |
|
|