Fine-tuning using LoRA
Hello,
I'm new to ML, and this is probably a basic problem. I'm trying to fine-tune the Donut base model on my own documents, but I'm getting errors.
https://anaconda.com/app/share/notebooks/98670ba2-545f-4554-bc6a-30e277b1d710/overview
The error is:
TypeError: DonutSwinModel.forward() got an unexpected keyword argument 'input_ids'
I'm generating the dataset from document images and an annotations.jsonl file with entries like the following:
{"label": "{\"load_id\": \"1234\", \"carrier_name\": \"Bison\"}", "image": "TOUR_LOGISTICS_0.png"}
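For reference, a record in this shape can be read with the standard library alone. The sketch below assumes that each line of annotations.jsonl is one JSON object and that its `label` field is itself a JSON-encoded string; `parse_annotation_line` and `load_annotations` are hypothetical helper names, not part of any library.

```python
import json
from pathlib import Path


def parse_annotation_line(line):
    """Parse one annotations.jsonl line into (image_name, label_dict)."""
    record = json.loads(line)
    # Assumption: "label" holds a JSON-encoded string, so it is decoded a second time.
    label = json.loads(record["label"])
    return record["image"], label


def load_annotations(path):
    """Read every non-empty line of a JSONL file (hypothetical helper)."""
    text = Path(path).read_text(encoding="utf-8")
    return [parse_annotation_line(line) for line in text.splitlines() if line.strip()]


# One record in the shape described above, with the inner quotes escaped.
sample = '{"label": "{\\"load_id\\": \\"1234\\", \\"carrier_name\\": \\"Bison\\"}", "image": "TOUR_LOGISTICS_0.png"}'
image_name, fields = parse_annotation_line(sample)
print(image_name, fields)
```

Decoding the label into a dictionary first makes it easier to check the annotations before they are converted into the target token sequence for the decoder.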
Each item in my dataset has:
{
"pixel_values": batch["pixel_values"],
"decoder_input_ids": batch["decoder_input_ids"],
"labels": batch["labels"]
}
Doesn't the Trainer know which fields to pass to the encoder and which to the decoder?
I tried downgrading to transformers==4.45.2, and it didn't help.
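On the Trainer question: `Trainer` does not choose which field goes to the encoder or the decoder. It passes the keys of each batch to the model's `forward` as keyword arguments, and the model routes them. One possible cause of the error, which is an assumption because the LoRA setup in the notebook is not shown here, is a wrapper around the model that adds `input_ids` to the call: `VisionEncoderDecoderModel` hands extra keyword arguments without a `decoder_` prefix to its encoder. The toy classes below are stand-ins, not the real library classes, and only illustrate that routing.

```python
class ToyEncoder:
    """Stand-in for an image encoder that only accepts pixel_values."""

    def forward(self, pixel_values):
        return f"encoded({pixel_values})"


class ToyEncoderDecoder:
    """Stand-in that mimics splitting extra kwargs by the 'decoder_' prefix."""

    def __init__(self):
        self.encoder = ToyEncoder()

    def forward(self, pixel_values=None, decoder_input_ids=None, labels=None, **kwargs):
        # Extra kwargs without the "decoder_" prefix are handed to the encoder.
        encoder_kwargs = {k: v for k, v in kwargs.items() if not k.startswith("decoder_")}
        return self.encoder.forward(pixel_values=pixel_values, **encoder_kwargs)


model = ToyEncoderDecoder()
batch = {"pixel_values": "img", "decoder_input_ids": [0, 1], "labels": [1, 2]}

# A batch with only the expected keys goes through.
ok = model.forward(**batch)

# A wrapper that injects input_ids (even as None) triggers the same kind of TypeError.
try:
    model.forward(input_ids=None, **batch)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```

If this is what happens in the notebook, a commonly suggested thing to try is creating the `LoraConfig` without a `task_type`, so that PEFT uses its generic wrapper instead of a text-oriented one. Treat that as something to verify against the actual setup rather than a confirmed fix.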
Hi,
I was curious whether you made any progress with fine-tuning the Donut model. I recently worked with Donut and have some code on my GitHub that might be useful if you're still stuck.