TWIN
Collection
Datasets and models from the paper "Same or Not? Enhancing Visual Perception in Vision-Language Models"
•
4 items
•
Updated
•
1
This is the Qwen2.5-VL-3B-Instruct model post-trained on the TWIN dataset from the paper: Same or Not? Enhancing Visual Perception in Vision-Language Models
For further information please refer to the project webpage, paper, and repository.
If you use TWIN in your research, please consider citing our work:
BibTeX:
@misc{marsili2025notenhancingvisualperception,
title={Same or Not? Enhancing Visual Perception in Vision-Language Models},
author={Damiano Marsili and Aditya Mehta and Ryan Y. Lin and Georgia Gkioxari},
year={2025},
eprint={2512.23592},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.23592},
}
The dataset is derived from the UCSD Amazon Reviews’23 dataset. Use is permitted for research and educational purposes only. By using this dataset, you agree to respect the rights of original content owners and comply with applicable terms of service.