ChartQA Multimodal Fine-Tuning using Qwen2-VL
This repository contains LoRA adapters fine-tuned for visual question answering on chart images using the ChartQA dataset.
The adapters were trained on top of the base model Qwen/Qwen2-VL-2B-Instruct using parameter-efficient fine-tuning.
Model Details
Model Name: chartqa-qwen2vl-lora
Developed by: Vraj Detroja
Model Type: Vision-Language Model (Multimodal Transformer)
Base Model: Qwen/Qwen2-VL-2B-Instruct
Fine-Tuning Method: LoRA (Low-Rank Adaptation)
Training Platform: Kaggle
GPU: Tesla T4 (16 GB VRAM)
Frameworks Used
- PyTorch
- Hugging Face Transformers
- PEFT
- BitsAndBytes
- Accelerate
Model Description
This is a LoRA adapter for the Qwen2-VL vision-language model, fine-tuned on the ChartQA dataset.
The model learns to answer questions about chart images.
Example task:
Input:
Image + Question
Is the value of Favorable 38 in 2015?
Output:
Yes
The model processes both visual and textual information to generate answers.
Dataset
Training dataset:
ChartQA
Dataset link:
https://huggingface.co/datasets/HuggingFaceM4/ChartQA
ChartQA is a visual question answering dataset for chart understanding.
Dataset structure:
| Field | Description |
|---|---|
| image | Chart image |
| query | Question about chart |
| label | Ground truth answer |
| human_or_machine | Annotation type |
Dataset splits:
| Split | Samples |
|---|---|
| Train | 28,299 |
| Validation | 1,920 |
| Test | 2,500 |
For this project, 1,000 samples from the training set (about 3.5% of the split) were used for fine-tuning.
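As a rough sketch of how such a subset could be prepared, the snippet below maps one ChartQA record to a chat-style message list and selects training rows with the `datasets` library. The helper names `to_chat_example` and `load_subset` are hypothetical, taking the first rows is an assumption (this card does not say which samples were used), and `label` is handled as either a string or a list of strings because its stored type is not stated above.

```python
def to_chat_example(record):
    """Turn one ChartQA record into user/assistant chat messages."""
    label = record["label"]
    # Assumption: the label may be stored as a list of strings or a plain string
    answer = label[0] if isinstance(label, (list, tuple)) else label
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": record["query"]},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": str(answer)}],
        },
    ]


def load_subset(n=1000):
    """Load the first n training rows (assumed selection, not confirmed)."""
    from datasets import load_dataset  # imported lazily; requires `datasets`

    train = load_dataset("HuggingFaceM4/ChartQA", split="train")
    return train.select(range(n))
```

Calling `load_subset()` downloads the dataset, so it is kept separate from the pure formatting helper.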
Training Details
Fine-Tuning Method
The model was fine-tuned using LoRA (Low-Rank Adaptation).
Instead of updating all of the model's weights, LoRA freezes them and trains small low-rank adapter matrices added to selected layers of the transformer.
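To make the idea concrete, here is a minimal pure-Python sketch of a LoRA-adapted linear layer. It is an illustration, not the PEFT implementation. The frozen weight `W` is left untouched while two small matrices `A` (r × d_in) and `B` (d_out × r) are trained, and the output is `W x + (alpha / r) * B (A x)`. All dimensions, values, and the names `matvec`, `lora_forward`, and `lora_param_count` are assumptions for illustration; the rank and scaling used for this adapter are not stated in this card.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Frozen base projection plus the scaled low-rank update."""
    base = matvec(W, x)               # frozen pretrained weights
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by one adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Toy example with hypothetical sizes: d_in=3, d_out=2, r=1
W = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
A = [[0.5, -0.5, 1.0]]
B = [[0.0], [0.0]]  # B starts at zero, so the adapter is initially a no-op
x = [1.0, 1.0, 1.0]

y = lora_forward(W, A, B, x, alpha=2, r=1)
print(y == matvec(W, x))  # the untrained adapter leaves the output unchanged
```

Because `B` is initialized to zero, training starts from exactly the base model's behavior and only the small `A` and `B` matrices receive updates.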
Training statistics:
| Metric | Value |
|---|---|
| Total parameters | ~2.21B |
| Trainable parameters | ~2.17M |
| Trainable percentage | ~0.1% |
Because only the adapter weights need gradients and optimizer state, this significantly reduces GPU memory usage during training.
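The percentage in the table follows directly from the two parameter counts. The short check below uses the rounded figures from the table, so the result is approximate.

```python
total_params = 2.21e9      # ~2.21B, rounded value from the table
trainable_params = 2.17e6  # ~2.17M, rounded value from the table

trainable_pct = 100 * trainable_params / total_params
print(f"{trainable_pct:.3f}%")  # 0.098%
```

With a PEFT-wrapped model, `model.print_trainable_parameters()` reports the exact counts.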
Quantization
The model was loaded using 4-bit quantization via BitsAndBytes.
Configuration:
load_in_4bit = True
bnb_4bit_quant_type = "nf4"
bnb_4bit_compute_dtype = torch.float16
Benefits:
- Reduced VRAM usage
- Faster training
- Enables training on a T4 GPU
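A minimal sketch of how these settings map onto `BitsAndBytesConfig` from Transformers when loading the base model. It assumes `torch`, `transformers`, and `bitsandbytes` are installed and a CUDA GPU is available; the variable names are illustrative, and the exact training script may have differed.

```python
import torch
from transformers import BitsAndBytesConfig, Qwen2VLForConditionalGeneration

# 4-bit NF4 quantization with float16 compute, matching the settings above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the base model with quantized weights (downloads the checkpoint)
base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```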
Training Configuration
Hyperparameters used:
Batch size: 1
Gradient accumulation: 4
Learning rate: 2e-4
Epochs: 1
Training samples: 1000
Precision: FP16
Gradient checkpointing was enabled to reduce memory consumption.
Training results:
| Metric | Value |
|---|---|
| Training steps | 250 |
| Final training loss | ~7.59 |
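The step count is consistent with the hyperparameters: with a batch size of 1 and 4 gradient accumulation steps, each optimizer step consumes 4 samples. A quick check, assuming a single GPU and that every sample is seen once per epoch:

```python
import math

batch_size = 1
grad_accum_steps = 4
num_samples = 1000
epochs = 1

# Samples consumed per optimizer update
effective_batch = batch_size * grad_accum_steps
# Optimizer updates over the whole run
total_steps = math.ceil(num_samples / effective_batch) * epochs

print(effective_batch, total_steps)  # 4 250
```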
Hardware Used
| Component | Value |
|---|---|
| GPU | Tesla T4 |
| VRAM | 16 GB |
| Platform | Kaggle |
| Framework | PyTorch |
Training time:
~9 minutes for 250 training steps.
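For a rough sense of throughput, the reported figures work out to a little over two seconds per optimizer step. The calculation below uses the approximate 9-minute figure, so treat the results as estimates.

```python
train_seconds = 9 * 60  # ~9 minutes, as reported above
steps = 250
samples = 1000

sec_per_step = train_seconds / steps       # time per optimizer step
samples_per_sec = samples / train_seconds  # samples processed per second

print(f"{sec_per_step:.2f} s/step, {samples_per_sec:.2f} samples/s")
# 2.16 s/step, 1.85 samples/s
```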
How to Use the Model
This repository contains LoRA adapters, not the full model.
You must load the base model first.
Example:
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
# Load the base vision-language model first
base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    device_map="auto"
)
# Attach the LoRA adapter weights from this repository
model = PeftModel.from_pretrained(
    base_model,
    "vrajdetrojapes/chartqa-qwen2vl-lora"
)
# The processor (tokenizer and image preprocessing) comes from the base model
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct"
)
Example Inference
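from PIL import Image
# Load the chart image to ask about
image = Image.open("sample_chart.png")
question = "Is the value of Favorable 38 in 2015?"
# One user turn: an image placeholder followed by the question text
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question}
        ],
    }
]
# Render the chat template into the prompt string the model expects
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = processor(
    text=[text],
    images=[image],
    return_tensors="pt"
).to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=50
)
# Decode only the newly generated tokens, skipping the prompt and special tokens
answer_ids = output[0][inputs["input_ids"].shape[1]:]
print(processor.decode(answer_ids, skip_special_tokens=True))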
Intended Use
The model is intended for:
- Chart question answering
- Multimodal reasoning research
- Vision-language experimentation
- Educational purposes
Limitations
The model has several limitations:
- Trained on a small subset of the dataset
- May struggle with complex chart reasoning
- Limited generalization beyond chart datasets
- Not suitable for production systems
Ethical Considerations
Users should be aware that:
- The model may generate incorrect answers.
- Chart interpretation errors are possible.
- Outputs should be validated for critical applications.
Citation
@misc{chartqa_qwen2vl_lora,
author = {Detroja, Vraj},
title = {ChartQA Multimodal Fine-Tuning using Qwen2-VL},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/vrajdetrojapes/chartqa-qwen2vl-lora}
}
Author
Vraj Detroja
Natural Language Processing with Deep Learning
Multimodal Fine-Tuning Project