Bengali CRNN OCR: Custom EasyOCR Recognition Model

DocReader BD: CSC4233 NLP Final Project, AIUB

Results

Model                  CER ↓    WER ↓    Char Accuracy
Tesseract (baseline)   ~0.45    ~0.60    ~55%
EasyOCR default        ~0.25    ~0.40    ~75%
BengaliCRNN (ours)     0.0348   0.1020   96.5%
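CER and WER are conventionally the edit distance between the reference and the prediction, divided by the reference length in characters or words. The reported character accuracy matches 1 - CER (1 - 0.0348 = 0.9652). The model card does not include its evaluation script, so the following is only a minimal sketch of the standard definitions. It assumes characters are counted as Unicode code points, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits per reference character."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    """Word error rate: word edits per reference word (whitespace split)."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


if __name__ == "__main__":
    reference = "আমি বাংলায় গান গাই"
    prediction = "আমি বাংলা গান গাই"
    print(f"CER={cer(reference, prediction):.4f}  WER={wer(reference, prediction):.4f}")
```

For Bengali, counting code points and counting grapheme clusters can give different CER values, so comparisons against the table are only meaningful if the same convention is used.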

Architecture

ResNet34 (grayscale) + 2× BiLSTM (hidden=256) + CTC loss
Vocab: 152 Bengali + English chars | Input: 64×200 px
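A CTC-trained recognizer emits one class index per horizontal frame, and those frames have to be collapsed into text. The sketch below shows greedy CTC decoding, which merges consecutive repeats and then drops the blank symbol. It assumes the blank sits at index 0 and that index i > 0 maps to the (i - 1)-th vocabulary character. The four-character charset is a toy stand-in for the real 152-character vocabulary in vocab.json.

```python
BLANK = 0  # assumption: index 0 is the CTC blank


def ctc_greedy_decode(frame_ids, charset):
    """Collapse consecutive repeats, then drop blanks.

    frame_ids: per-frame argmax class indices from the recognizer.
    charset:   sequence of characters; class index i maps to charset[i - 1].
    """
    decoded = []
    prev = None
    for idx in frame_ids:
        # Emit a character only when the index changes and is not blank.
        if idx != prev and idx != BLANK:
            decoded.append(charset[idx - 1])
        prev = idx
    return "".join(decoded)


if __name__ == "__main__":
    toy_charset = ["ব", "া", "ং", "ল"]        # hypothetical, not the real vocab
    frames = [1, 1, 0, 2, 3, 3, 0, 4, 2]      # made-up per-frame predictions
    print(ctc_greedy_decode(frames, toy_charset))
```

The blank is what lets CTC produce genuine double letters: two identical indices separated by a blank decode to two characters, while two adjacent identical indices decode to one.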

Files

  • bengali_crnn.pth: EasyOCR-ready weights (keys carry the module. prefix)
  • phase1_best.pth: clean weights (no prefix) for further training
  • bengali_crnn.py: EasyOCR network definition
  • bengali_crnn.yaml: EasyOCR config
  • vocab.json: character vocabulary
  • config.json: model config
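The module. prefix is what torch.nn.DataParallel prepends to parameter names, so the two weight files presumably differ only in their key names. Converting between them can be sketched with the two helpers below. They operate on any plain dict of parameter names, and the key names in the demo are hypothetical. The helpers also assume each .pth file stores a bare state dict rather than a full training checkpoint.

```python
PREFIX = "module."


def strip_module_prefix(state_dict):
    """Return a copy whose keys have a leading 'module.' removed."""
    return {
        (key[len(PREFIX):] if key.startswith(PREFIX) else key): value
        for key, value in state_dict.items()
    }


def add_module_prefix(state_dict):
    """Return a copy whose keys all start with 'module.' (idempotent)."""
    return {
        (key if key.startswith(PREFIX) else PREFIX + key): value
        for key, value in state_dict.items()
    }


if __name__ == "__main__":
    # Hypothetical key names; real ones depend on bengali_crnn.py.
    wrapped = {"module.cnn.conv1.weight": "w0", "module.rnn.weight_ih_l0": "w1"}
    clean = strip_module_prefix(wrapped)
    print(sorted(clean))
    # With PyTorch installed, the same helpers apply to loaded weights, e.g.:
    #   state = torch.load("phase1_best.pth", map_location="cpu")
    #   torch.save(add_module_prefix(state), "bengali_crnn.pth")
```

Both helpers return new dicts and leave the input untouched, so one loaded checkpoint can be reused for both fine-tuning and EasyOCR export.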

Usage

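import easyocr

# model_storage_directory holds the .pth weights;
# user_network_directory holds the .py and .yaml files.
reader = easyocr.Reader(
    lang_list=["bn"],
    recog_network="bengali_crnn",
    model_storage_directory="./bengali_ocr_model",
    user_network_directory="./bengali_ocr_model",
    gpu=True,
)
# Each result is a (bounding box, text, confidence) entry.
results = reader.readtext("bengali_doc.jpg")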
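With default settings, readtext returns a list of (bounding box, text, confidence) entries, where the box is four [x, y] corner points. The sketch below is one way to post-process that list. It drops low-confidence detections and orders the remaining text top to bottom, then left to right. The helper name, the 0.5 threshold, and the sample data are illustrative assumptions, not part of the project.

```python
def lines_from_results(results, min_conf=0.5):
    """Keep confident detections and return their text in rough reading order."""
    kept = [(bbox, text) for bbox, text, conf in results if conf >= min_conf]
    # Sort by the top edge of each box, then by its left edge.
    kept.sort(key=lambda item: (min(p[1] for p in item[0]),
                                min(p[0] for p in item[0])))
    return [text for _, text in kept]


# Hypothetical output in the shape readtext produces (not real model output).
sample_results = [
    ([[10, 60], [200, 60], [200, 90], [10, 90]], "second line", 0.91),
    ([[10, 10], [200, 10], [200, 40], [10, 40]], "first line", 0.97),
    ([[220, 10], [260, 10], [260, 40], [220, 40]], "??", 0.12),
]

if __name__ == "__main__":
    for line in lines_from_results(sample_results):
        print(line)
```

Sorting on exact pixel coordinates is a crude reading order. Skewed scans or multi-column pages would need boxes grouped into lines first.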