MSD-Qwen2.5-VL-7B-Instruct (Benchmark Release)

This model repo is part of a multimodal speculative decoding benchmark suite.

Why this repo exists

We maintain a unified benchmark codebase that covers multiple methods (Baseline, EAGLE, EAGLE2, Lookahead, MSD, ViSpec), so users can train and evaluate all of them under one setup.

The methods are aggregated here for user convenience (shared dataset format, scripts, and metrics).
The original ideas and implementations belong to their respective authors.
This specific Hugging Face repo hosts the MSD-Qwen2.5-VL-7B-Instruct checkpoint used in our benchmark runs.

Upstream / Base Model

Base model: Qwen/Qwen2.5-VL-7B-Instruct
Original MSD Qwen checkpoint: lucylyn/MSD-Qwen2VL-7B-Instruct

What is in this repo

config.json
pytorch_model.bin

This checkpoint is intended to be loaded as the MSD speculative model together with the base model above. It is not a standalone replacement for the base model and its processor/tokenizer assets.
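Before pointing the benchmark at a local copy of this repo, it can help to confirm that both files listed above are present. The following is a minimal sketch: the function name `check_msd_checkpoint` and the example directory are hypothetical and not part of the benchmark codebase.

```shell
# Hypothetical helper (not part of the benchmark codebase): check that a local
# copy of this repo contains the two files listed above.
check_msd_checkpoint() {
  dir="$1"
  status=0
  for f in config.json pytorch_model.bin; do
    if [ ! -f "$dir/$f" ]; then
      # Report each missing file on stderr and remember the failure.
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Example (the directory name is an assumption):
#   check_msd_checkpoint ./MSD-Qwen2.5-VL-7B-Instruct && echo "checkpoint looks complete"
```

The check only covers file presence. It does not validate the weights or their compatibility with the base model.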

Example usage (benchmark codebase)

python -m evaluation.eval_msd_mmspec \
  --base-model-path Qwen/Qwen2.5-VL-7B-Instruct \
  --msd-model-path Cloudriver/MSD-Qwen2.5-VL-7B-Instruct \
  --data-folder dataset/MMSpec/testmini \
  --answer-file results/mmspec_testmini/msd-temperature-0.jsonl \
  --model-id msd-qwen2.5-vl-7b \
  --temperature 0 \
  --use-msd \
  --total-token -1 \
  --depth 5 \
  --top-k 10
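To repeat the run at several temperatures, the command above can be wrapped in a small function that changes only `--temperature` and the matching answer-file name. This is a sketch under assumptions: the function name `msd_eval_cmd` is hypothetical, and the second temperature value (1) is illustrative and was not used in the example above. The function prints the command rather than running it, so it can be inspected first.

```shell
# Hypothetical wrapper: print (do not run) the evaluation command from the
# example above for a given temperature. All flags are copied from that
# example; only --temperature and the answer-file name vary.
msd_eval_cmd() {
  temp="$1"
  printf '%s ' python -m evaluation.eval_msd_mmspec \
    --base-model-path Qwen/Qwen2.5-VL-7B-Instruct \
    --msd-model-path Cloudriver/MSD-Qwen2.5-VL-7B-Instruct \
    --data-folder dataset/MMSpec/testmini \
    --answer-file "results/mmspec_testmini/msd-temperature-${temp}.jsonl" \
    --model-id msd-qwen2.5-vl-7b \
    --temperature "$temp" \
    --use-msd \
    --total-token -1 \
    --depth 5 \
    --top-k 10
  printf '\n'
}

# Print one command per temperature (the value 1 is an assumed example).
for t in 0 1; do
  msd_eval_cmd "$t"
done
```

Once the printed commands look right, a single run can be started from the benchmark codebase root with `msd_eval_cmd 0 | sh`.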

Method references

MSD-LLaVA checkpoint: https://huggingface.co/lucylyn/MSD-LLaVA1.5-7B
MSD-Qwen checkpoint: https://huggingface.co/lucylyn/MSD-Qwen2VL-7B-Instruct
ViSpec: https://arxiv.org/abs/2509.15235
Lookahead Decoding: https://lmsys.org/blog/2023-11-21-lookahead-decoding/
Medusa: https://github.com/FasterDecoding/Medusa

Citation

If you use this checkpoint or the benchmark, please cite the original MSD method/checkpoint and any baseline methods you compare against.

EAGLE / EAGLE2 / EAGLE3

@inproceedings{li2024eagle,
  author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
  title = {{EAGLE}: Speculative Sampling Requires Rethinking Feature Uncertainty},
  booktitle = {International Conference on Machine Learning},
  year = {2024}
}

@inproceedings{li2024eagle2,
  author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
  title = {{EAGLE-2}: Faster Inference of Language Models with Dynamic Draft Trees},
  booktitle = {Empirical Methods in Natural Language Processing},
  year = {2024}
}

@inproceedings{li2025eagle3,
  author = {Yuhui Li and Fangyun Wei and Chao Zhang and Hongyang Zhang},
  title = {{EAGLE-3}: Scaling up Inference Acceleration of Large Language Models via Training-Time Test},
  booktitle = {Annual Conference on Neural Information Processing Systems},
  year = {2025}
}

Notes

This model card focuses on benchmark usage and attribution.
For full benchmark code and scripts, please refer to the benchmark repository used in your experiment setup.