TrOCR Base Fine-Tuned for Czech Historical Vital Records

This model is a fine-tuned version of TrOCR-Base (microsoft/trocr-base-handwritten), specialized for handwritten text recognition (HTR) of 19th-century Czech vital records (birth, marriage, and death registers).

The model was trained as part of the master's thesis "Automated Transcription and Search in Historical Records Using Handwritten Text Recognition".

It was trained on an original, manually annotated dataset of historical Czech handwriting and is designed to be used within the full historical document processing pipeline (layout analysis → text detection → recognition → post-processing).
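
The recognition step of the pipeline above can be sketched with the Hugging Face transformers TrOCR classes. This is a minimal illustration, not the thesis implementation: MODEL_ID is a placeholder for this model's repository id or a local path, the helper names load_recognizer and transcribe_page are hypothetical, and the detect_lines argument stands in for whatever layout analysis and text detection stages produce cropped line images. The sketch also assumes the model repository ships the processor configuration; if it does not, the processor could be loaded from microsoft/trocr-base-handwritten instead.

```python
from typing import Callable, List

# Placeholder (assumption): replace with this model's repo id or a local path.
MODEL_ID = "path/or/repo-id-of-this-model"


def load_recognizer(model_id: str = MODEL_ID) -> Callable:
    """Return a function that maps one cropped line image (PIL) to text."""
    # Imported lazily so the pipeline helper below works without these packages.
    import torch
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    def recognize(line_image) -> str:
        # TrOCR expects a single text line as an RGB image.
        inputs = processor(images=line_image.convert("RGB"), return_tensors="pt")
        with torch.no_grad():
            generated_ids = model.generate(inputs.pixel_values)
        return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

    return recognize


def transcribe_page(
    page,
    detect_lines: Callable,
    recognize: Callable,
    postprocess: Callable[[str], str] = str.strip,
) -> List[str]:
    """Chain the stages: detection -> recognition -> post-processing.

    detect_lines is a hypothetical stand-in for layout analysis and text
    detection; it must return the cropped line images of the page in order.
    """
    return [postprocess(recognize(crop)) for crop in detect_lines(page)]
```

Given a PIL image of a single cropped text line, calling load_recognizer() once and then applying the returned function to the image would yield its transcription. Passing the stages as arguments keeps the recognizer independent of any particular detector or post-processing step.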

For detailed performance metrics, evaluation, and the full pipeline description, please refer to the thesis text.

Citation

@misc{palkovic2025htr,
      AUTHOR = {Palkovič, Radoslav},
      TITLE = {Automated Transcription and Search in Historical Records Using Handwritten Text Recognition},
      YEAR = {2025},
      TYPE = {Master Thesis},
      INSTITUTION = {Masaryk University, Faculty of Informatics},
      LOCATION = {Brno},
      SUPERVISOR = {Michal Batko}
}