EAGLE3-Qwen2.5-VL-7B-Instruct (Benchmark Release)

This model repo is part of a multimodal speculative decoding benchmark suite.

Why this repo exists

We maintain a unified benchmark codebase covering multiple methods (Baseline, EAGLE, EAGLE2, EAGLE3, Lookahead, MSD, ViSpec) so that users can train and evaluate them under a single setup.

The methods are aggregated here for user convenience (shared dataset format, scripts, and metrics).
The original ideas and implementations belong to their respective authors.
This specific Hugging Face repo hosts the EAGLE3-Qwen2.5-VL-7B-Instruct checkpoint used in our benchmark runs.

Upstream / Base Model

Base model: Qwen/Qwen2.5-VL-7B-Instruct

What is in this repo

config.json
pytorch_model.bin

This checkpoint contains the EAGLE3 draft (speculative) model and is intended to be loaded together with the base model above.
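
Before running the benchmark, it can help to confirm that a local copy of this repo is complete. The following is a minimal sketch, not part of the benchmark codebase: the helper names (missing_checkpoint_files, read_draft_config, fetch_checkpoint) are hypothetical, and the file names are taken from the list above. The download helper assumes the huggingface_hub package is installed and network access is available.

```python
import json
from pathlib import Path

# File names taken from the "What is in this repo" list above.
EXPECTED_FILES = ("config.json", "pytorch_model.bin")


def missing_checkpoint_files(checkpoint_dir):
    """Return the expected file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def read_draft_config(checkpoint_dir):
    """Parse config.json from the draft checkpoint directory into a dict."""
    with open(Path(checkpoint_dir) / "config.json", encoding="utf-8") as f:
        return json.load(f)


def fetch_checkpoint(repo_id="Cloudriver/EAGLE3-Qwen2.5-VL-7B-Instruct"):
    """Download a snapshot of the repo and return its local path (needs network)."""
    # Imported lazily so the offline helpers above work without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)
```

For a complete download, missing_checkpoint_files(fetch_checkpoint()) should return an empty list. This sketch only checks the files and reads the config; loading the weights as a draft model is handled by the benchmark scripts.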

Example usage (benchmark codebase)

bash scripts/Qwen/eval_eagle3_mmspec.sh testmini Cloudriver/EAGLE3-Qwen2.5-VL-7B-Instruct

Method references

EAGLE: https://arxiv.org/abs/2401.15077
EAGLE-2: https://arxiv.org/abs/2406.16858
EAGLE-3: https://arxiv.org/abs/2503.01840
ViSpec: https://arxiv.org/abs/2509.15235
Lookahead Decoding: https://lmsys.org/blog/2023-11-21-lookahead-decoding/
Medusa: https://github.com/FasterDecoding/Medusa

Citation

If you use this checkpoint or the benchmark, please cite EAGLE3 and any baseline methods you compare against.

EAGLE / EAGLE2 / EAGLE3

@inproceedings{li2024eagle,
  author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
  title = {{EAGLE}: Speculative Sampling Requires Rethinking Feature Uncertainty},
  booktitle = {International Conference on Machine Learning},
  year = {2024}
}

@inproceedings{li2024eagle2,
  author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
  title = {{EAGLE-2}: Faster Inference of Language Models with Dynamic Draft Trees},
  booktitle = {Empirical Methods in Natural Language Processing},
  year = {2024}
}

@inproceedings{li2025eagle3,
  author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
  title = {{EAGLE-3}: Scaling up Inference Acceleration of Large Language Models via Training-Time Test},
  booktitle = {Annual Conference on Neural Information Processing Systems},
  year = {2025}
}