Instructions to use lightonai/LightOnOCR-2-1B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use lightonai/LightOnOCR-2-1B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="lightonai/LightOnOCR-2-1B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
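from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("lightonai/LightOnOCR-2-1B")
# Vision-language checkpoint: load it with the image-text-to-text auto class,
# matching the pipeline task used above (not a text-only seq2seq class).
model = AutoModelForImageTextToText.from_pretrained("lightonai/LightOnOCR-2-1B")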
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use lightonai/LightOnOCR-2-1B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "lightonai/LightOnOCR-2-1B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lightonai/LightOnOCR-2-1B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/lightonai/LightOnOCR-2-1B

SGLang

How to use lightonai/LightOnOCR-2-1B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "lightonai/LightOnOCR-2-1B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lightonai/LightOnOCR-2-1B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "lightonai/LightOnOCR-2-1B" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use lightonai/LightOnOCR-2-1B with Docker Model Runner:
```
docker model run hf.co/lightonai/LightOnOCR-2-1B
```

Add overall OlmOCRBench results

#31

by nielsr HF Staff - opened Feb 19

base: refs/heads/main

←

from: refs/pr/31

Add overall OlmOCRBench results (commit aaebe21c)

nielsr

Feb 19

OlmOCRBench was recently updated to display "Overall" results by default; this PR ensures your model shows its score on the leaderboard.

It will show up here: https://huggingface.co/datasets/allenai/olmOCR-bench.

staghado

LightOn AI org Feb 20

Hi Niels,
Thanks for adding this!
There is one important distinction for the "Overall" results: the "Headers & Footers" category rewards ignoring (not outputting) visible text, so we choose to exclude it from the Overall average. In fact, in the RLVR setup we try to minimize the H&F score so that the model does full-page transcription, including page headers, footers, and page numbers.
I think we should have an Overall without this metric, since it's a bit misleading at first sight.

Add notes (commit b5aae083)

nielsr

Feb 20

Ok, thanks for clarifying. Note that the evaluation feature includes a "notes" field, where you can specify additional information. I have updated this PR to reflect that.

For now I'd use the "notes" field; I'll discuss with AllenAI whether to create a separate leaderboard/task for it.

Btw, would you be up for helping us add GLM-OCR to the leaderboard as well? Happy to set up a Slack channel with you.

staghado

LightOn AI org Feb 20

Great!
Happy to help with benchmarking GLM-OCR. I have been wanting to do so, I just didn't have time before.

staghado changed pull request status to merged Feb 20

staghado

LightOn AI org Feb 20

@nielsr
I tried benchmarking GLM-OCR on olmOCR-bench today. It proved quite challenging: GLM-OCR is a two-stage pipeline (layout analysis + region recognition) rather than an end-to-end model. There are no official standalone inference scripts; the intended workflow relies on their SDK, which integrates PP-DocLayoutV3 for layout detection and routes each region to the appropriate task prompt (text, formula, or table).
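
To make the two-stage structure concrete, here is a rough sketch of the routing step only. It is not the SDK's interface: only the "Text Recognition:" prompt is confirmed in this thread, while the formula and table prompt strings, the region labels, and the `Region` type are hypothetical placeholders.

```python
from dataclasses import dataclass

# Only "Text Recognition:" appears in this thread; the other two prompt
# strings and all region labels are assumptions for illustration.
TASK_PROMPTS = {
    "text": "Text Recognition:",
    "formula": "Formula Recognition:",
    "table": "Table Recognition:",
}


@dataclass
class Region:
    """A layout region as a detector might report it (hypothetical shape)."""
    label: str                        # e.g. "text", "formula", "table"
    bbox: tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels


def prompt_for(region: Region) -> str:
    """Pick the task prompt for a region, falling back to plain text."""
    return TASK_PROMPTS.get(region.label, TASK_PROMPTS["text"])


def plan_page(regions: list[Region]) -> list[tuple[Region, str]]:
    """Pair each region with its prompt, ordered top-to-bottom, then left-to-right."""
    ordered = sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0]))
    return [(r, prompt_for(r)) for r in ordered]
```

Each `(region, prompt)` pair would then be cropped and sent to the recognition model separately, which is the part a single-prompt run skips.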

As a first pass, I ran the model directly with just the "Text Recognition:" prompt on all images, using this script as a reference for vLLM inference. Here are the results:

| Category | Score |
|---|---|
| headers_footers | 92.3% |
| long_tiny_text | 87.6% |
| arxiv_math | 80.4% |
| multi_column | 79.9% |
| old_scans_math | 74.9% |
| table_tests | 42.5% |
| old_scans | 39.9% |
| Overall | 71.1% ± 1.1 |
| Overall (w/o h/f) | 67.5% |
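
As a quick sanity check, both Overall rows can be reproduced from the category scores, assuming Overall is the unweighted mean over the categories listed (an assumption that happens to match the reported numbers; the ± 1.1 interval is not reproduced here). The `overall` helper is just for illustration.

```python
from statistics import mean

# Per-category scores copied from the table above (percent).
scores = {
    "headers_footers": 92.3,
    "long_tiny_text": 87.6,
    "arxiv_math": 80.4,
    "multi_column": 79.9,
    "old_scans_math": 74.9,
    "table_tests": 42.5,
    "old_scans": 39.9,
}


def overall(scores, exclude=()):
    """Unweighted mean of category scores, skipping any excluded categories."""
    kept = [value for name, value in scores.items() if name not in exclude]
    return round(mean(kept), 1)


print(overall(scores))                               # 71.1
print(overall(scores, exclude={"headers_footers"}))  # 67.5
```

Dropping the single highest category lowers the average by 3.6 points, which is why the two Overall figures are worth reporting separately.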

For better results, we will need to include the layout detector, but since they don't provide it as a standalone model, it's kind of a hassle to use.
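
For reference, the first-pass setup (one "Text Recognition:" prompt per page image, sent to a vLLM server through its OpenAI-compatible API) could look roughly like the sketch below. This is not the script referenced above: the model id, the endpoint, the PNG MIME type, and the helper names are all assumptions.

```python
import base64
import json
import urllib.request

MODEL = "zai-org/GLM-OCR"  # assumed id; use the name the server was launched with
ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed local vLLM server


def build_payload(image_bytes: bytes, prompt: str = "Text Recognition:") -> dict:
    """Build a chat-completions request with one inline image and one prompt."""
    # Assumes PNG input; change the MIME type for other image formats.
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text", "text": prompt},
            ],
        }],
        "temperature": 0.0,  # deterministic decoding for benchmarking
    }


def transcribe(image_bytes: bytes) -> str:
    """POST one page image to the server and return the generated text."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(image_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Because every page gets the same prompt, tables and formulas are treated as plain text here, which is consistent with the weak `table_tests` score above.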

nielsr

Feb 20

Ok, would it be easier to just use their API? For example:

from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key="your-api-key")

image_url = "https://cdn.bigmodel.cn/static/logo/introduction.png"

# Call layout parsing API
response = client.layout_parsing.create(
    model="glm-ocr",
    file=image_url
)

# Output result
print(response)

staghado

LightOn AI org Feb 21

Yeah, it might be best to just use the API directly. I do not have a GLM API key, though.

nielsr

Feb 21

We'd be happy to give you one. Is there an email address I can reach you on to invite you to Slack? I couldn't find one on your X/GitHub profiles.

staghado

LightOn AI org Feb 21

Shared via DM on Twitter/X.
