Model description

Model Name: multicentury-htr-model-small

Model Version: 202509_small

Model Type: Transformer-based OCR (TrOCR)

Base Model: microsoft/trocr-small-handwritten

Purpose: Handwritten text recognition

Languages: Swedish, Finnish

License: Apache 2.0

This model is a fine-tuned version of the microsoft/trocr-small-handwritten model, specialized for recognizing handwritten text. It has been trained on various datasets spanning the 16th to the 20th centuries and can be used for applications such as document digitization, form recognition, or any other task that involves extracting handwritten text.

Model Architecture

The model is based on a Transformer architecture (TrOCR) with an encoder-decoder setup:

The encoder processes images of handwritten text.
The decoder generates corresponding text output.

Intended Use

This model is designed for handwritten text recognition and is intended for use in:

Document digitization (e.g., archival work, historical manuscripts)
Handwritten notes transcription

Training data

The training dataset includes more than 913 000 text rows of handwritten and typewritten text, covering a wide variety of handwriting styles.

Evaluation

The model was evaluated on a test dataset. The key metrics are:

Character Error Rate (CER): 4.08

Test dataset size: approximately 111 800 text rows
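
For readers who want to compute a comparable figure on their own data, the following is a minimal sketch of the conventional CER definition: total character-level edit distance divided by total reference length. The card does not state which tool or normalization produced the reported value, so the function names and the sample strings here are illustrative assumptions.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER: summed edit distance over summed reference length."""
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# Hypothetical ground truth and model output, for illustration only
refs = ["Stockholm den 3 maj", "Kära broder"]
hyps = ["Stockholm den 8 maj", "Kära brodr"]
print(f"CER: {100 * cer(refs, hyps):.2f} %")
```

Summing edits over the whole corpus before dividing weights long rows more heavily than averaging per-row rates would; which of the two the reported figure uses is not stated in this card.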

Training Hyperparameters

Evaluation strategy: epoch

Train batch size per device: 16

Learning rate: 2.2e-5

Scheduler: polynomial

Optimizer: AdamW

Number of epochs: 14

FP16 mixed precision training: True

Half precision backend: cuda_amp

Input image size: 192 x 1024
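
The 192 x 1024 input size determines how many tokens the encoder sees. As a rough sketch, assuming the base DeiT encoder uses 16 x 16 patches (an assumption that can be checked against the encoder config of the loaded model) and the two special tokens (CLS and distillation) that appear in the usage code in this card, the sequence length works out as follows. The function name is hypothetical.

```python
def encoder_sequence_length(height, width, patch_size=16, special_tokens=2):
    """Return (patch rows, patch columns, total encoder tokens).

    patch_size=16 is an assumption about the base DeiT encoder;
    special_tokens=2 counts the CLS and distillation tokens.
    """
    rows = height // patch_size
    cols = width // patch_size
    return rows, cols, rows * cols + special_tokens


# Input size used by this model
print(encoder_sequence_length(192, 1024))  # (12, 64, 770)

# Square 384 x 384 input, assumed here to be the base model's original size
print(encoder_sequence_length(384, 384))   # (24, 24, 578)
```

Under these assumptions the patch grid changes from 24 x 24 to 12 x 64, which is why the position embeddings have to be interpolated before they can be added to the patch embeddings.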

How to Use the Model

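You can use the model with Hugging Face’s pipeline function or by manually loading the processor and model. The example below takes the manual route. Because the model expects 192 x 1024 input, the example first patches the DeiT embedding layers so that the position embeddings are interpolated to the matching patch grid.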

from transformers.models.deit.modeling_deit import DeiTPatchEmbeddings, DeiTEmbeddings
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
import torch.nn.functional as F
from PIL import Image
import torch

def apply_deit_custom_size_patches():
    """Apply patches to DeiT model to support custom image sizes"""
    
    def deit_patch_forward(self, pixel_values, interpolate_pos_encoding=None):
        embeddings = self.projection(pixel_values).flatten(2).transpose(1, 2)
        return embeddings
    
    def deit_embeddings_forward(self, pixel_values, bool_masked_pos=None, interpolate_pos_encoding=None):
        batch_size, num_channels, height, width = pixel_values.shape
        embeddings = self.patch_embeddings(pixel_values, interpolate_pos_encoding)
        
        cls_tokens = self.cls_token.expand(batch_size, -1, -1)
        distillation_tokens = self.distillation_token.expand(batch_size, -1, -1)
        embeddings = torch.cat((cls_tokens, distillation_tokens, embeddings), dim=1)
        
        patch_size = self.patch_embeddings.patch_size[0]
        num_patches_h = height // patch_size
        num_patches_w = width // patch_size
        num_patches = num_patches_h * num_patches_w
        
        pos_embed = self.position_embeddings
        
        if num_patches + 2 != pos_embed.shape[1]:
            special_pos_embed = pos_embed[:, :2, :]
            patch_pos_embed = pos_embed[:, 2:, :]
            
            orig_size = int(patch_pos_embed.shape[1] ** 0.5)
            embed_dim = patch_pos_embed.shape[2]
            
            patch_pos_embed = patch_pos_embed.reshape(1, orig_size, orig_size, embed_dim)
            patch_pos_embed = patch_pos_embed.permute(0, 3, 1, 2)
            patch_pos_embed = F.interpolate(patch_pos_embed,
                                          size=(num_patches_h, num_patches_w),
                                          mode='bicubic',
                                          align_corners=False)
            patch_pos_embed = patch_pos_embed.permute(0, 2, 3, 1).reshape(1, num_patches, embed_dim)
            
            pos_embed = torch.cat([special_pos_embed, patch_pos_embed], dim=1)
        
        embeddings = embeddings + pos_embed
        embeddings = self.dropout(embeddings)
        
        return embeddings
    
    DeiTPatchEmbeddings.forward = deit_patch_forward
    DeiTEmbeddings.forward = deit_embeddings_forward

# Use it at the start of your inference script
apply_deit_custom_size_patches()

# Load model and processor
processor = TrOCRProcessor.from_pretrained("Kansallisarkisto/multicentury-htr-model-small",
                                            use_fast=True,
                                            do_resize=True, 
                                            size={'height': 192,'width': 1024})
     
model = VisionEncoderDecoderModel.from_pretrained("Kansallisarkisto/multicentury-htr-model-small")

# Open an image of handwritten text; convert to RGB so that
# grayscale or RGBA files are handled consistently
image = Image.open("path_to_image.jpg").convert("RGB")

# Preprocess and predict
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)
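
When transcribing many text rows, it is usually more efficient to run them through the model in batches. The following is a minimal sketch that reuses the `processor` and `model` objects loaded above; the helper names `batched` and `transcribe_lines` and the default batch size are hypothetical and not part of the model's published code.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def transcribe_lines(image_paths, processor, model, batch_size=8):
    """Transcribe text-row images in batches and return one string per image."""
    from PIL import Image  # imported here so `batched` works without Pillow

    texts = []
    for paths in batched(list(image_paths), batch_size):
        images = []
        for path in paths:
            with Image.open(path) as img:
                images.append(img.convert("RGB"))
        # The processor resizes every image to the configured size,
        # so the batch can be stacked into a single tensor
        pixel_values = processor(images, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values.to(model.device))
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts
```

A call such as `transcribe_lines(["row_001.jpg", "row_002.jpg"], processor, model)` (file names are placeholders) returns the transcriptions in the same order as the input paths. The patch function still has to be applied before the model is used.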

Limitations and Biases

The model was trained primarily on handwritten text that uses basic Latin characters (A-Z, a-z) and the Nordic special characters (å, ä, ö). It has not been trained on non-Latin writing systems such as Chinese characters, Cyrillic, Arabic, or Hebrew. The model may not generalize well to languages other than Finnish, Swedish, or English.
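
One practical consequence is that ground-truth transcriptions containing other characters are unlikely to be reproduced correctly. The following is a small sketch for flagging such characters before evaluation or further fine-tuning. The letter set follows the description above; the inclusion of digits, ASCII punctuation, and spaces is an assumption, since the card does not list the model's full character inventory, and the function name is hypothetical.

```python
import string

# Letters named in the card: basic Latin plus the Nordic special characters
DESCRIBED_LETTERS = set(string.ascii_letters) | set("åäöÅÄÖ")

# Assumption: digits, ASCII punctuation, and the space are also covered
ASSUMED_EXTRA = set(string.digits + string.punctuation + " ")


def unsupported_characters(text):
    """Return the sorted distinct characters that fall outside the assumed inventory."""
    allowed = DESCRIBED_LETTERS | ASSUMED_EXTRA
    return sorted({ch for ch in text if ch not in allowed})


print(unsupported_characters("Hyvää päivää"))  # []
print(unsupported_characters("Straße 12"))     # ['ß']
```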

Future Work

Potential improvements for this model include:

Expanding training data: Incorporating more diverse handwriting styles and languages.
Optimizing for specific domains: Fine-tuning the model on domain-specific handwriting.

Citation

If you use this model in your work, please cite it as:

@misc{multicentury_htr_model_202509_small,
  author = {Kansallisarkisto},
  title = {Multicentury HTR Model: Handwritten Text Recognition},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kansallisarkisto/multicentury-htr-model-small/}},
}

Model Card Authors

Author: Kansallisarkisto

Contact Information: [email protected], [email protected]
