Fine-tuned MarianMT model for English → Hindi translation. This model is trained on AI4Bharat's Samanantar dataset, which contains over 10 million high-quality parallel sentences.
- Base model: Helsinki-NLP/opus-mt-en-hi
- Dataset: ai4bharat/samanantar (English → Hindi subset)

| Domain | Base BLEU | Fine-tuned BLEU | Base chrF | Fine-tuned chrF |
|---|---|---|---|---|
| Healthcare | 15.54 | 27.95 | 38.06 | 54.09 |
| Gen News | 14.11 | 26.31 | 39.07 | 52.98 |
| Culture/Tourism | 12.76 | 18.49 | 35.07 | 41.32 |
| Education | 20.28 | 28.82 | 43.84 | 49.68 |

- BLEU improves by +5.7 to +12.4 points across domains (smallest gain in Culture/Tourism, largest in Healthcare)
- chrF improves by up to +16 points, reflecting better fluency and coverage
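
The per-domain gains can be recomputed directly from the table. The following is a minimal sketch: the scores are copied from the table, and the names `SCORES` and `gains` are chosen here for illustration only.

```python
# Scores copied from the table above, per domain:
# (base BLEU, fine-tuned BLEU, base chrF, fine-tuned chrF)
SCORES = {
    "Healthcare": (15.54, 27.95, 38.06, 54.09),
    "Gen News": (14.11, 26.31, 39.07, 52.98),
    "Culture/Tourism": (12.76, 18.49, 35.07, 41.32),
    "Education": (20.28, 28.82, 43.84, 49.68),
}


def gains(scores):
    """Return {domain: (BLEU gain, chrF gain)}, rounded to two decimals."""
    return {
        domain: (round(ft_bleu - base_bleu, 2), round(ft_chrf - base_chrf, 2))
        for domain, (base_bleu, ft_bleu, base_chrf, ft_chrf) in scores.items()
    }


GAINS = gains(SCORES)

# Print one line per domain with both gains.
for domain, (bleu_gain, chrf_gain) in GAINS.items():
    print(f"{domain:<16} BLEU +{bleu_gain:.2f}  chrF +{chrf_gain:.2f}")
```

Computing the differences this way makes it easy to check any summary figure against the table if the scores are updated.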

- `pytorch_model.bin`: fine-tuned model weights
- `config.json`: model architecture
- `tokenizer_config.json`, `vocab.json`, `source.spm`, `target.spm`: tokenizer
- `generation_config.json`: default decoding setup

License: Apache 2.0 (same as the original model and the Samanantar dataset)
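
A local copy of the files listed above could be checked and loaded roughly as follows. This is a sketch under assumptions, not the author's documented usage: it assumes the `transformers`, `sentencepiece`, and `torch` packages are installed, and the helper names `missing_files` and `translate` as well as the `model_dir` argument are hypothetical.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "pytorch_model.bin",
    "config.json",
    "tokenizer_config.json",
    "vocab.json",
    "source.spm",
    "target.spm",
    "generation_config.json",
]


def missing_files(model_dir):
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def translate(texts, model_dir):
    """Translate a list of English sentences to Hindi using the checkpoint in model_dir."""
    # Imported lazily so the file check above works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    absent = missing_files(model_dir)
    if absent:
        raise FileNotFoundError(f"missing from {model_dir}: {absent}")

    tokenizer = MarianTokenizer.from_pretrained(model_dir)
    model = MarianMTModel.from_pretrained(model_dir)

    # Pad to the longest sentence and truncate overly long inputs.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # Decoding defaults are read from generation_config.json when the model loads.
    output_ids = model.generate(**batch)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

With a local clone at a hypothetical path such as `./en-hi-model`, calling `translate(["How are you?"], "./en-hi-model")` would return a list containing one Hindi string.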