From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Paranioar
AI & ML interests
Vision-and-Language, Parameter-efficient Transfer Learning, Multi-modal Large Language Model
Recent Activity
upvoted a paper about 10 hours ago
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
upvoted a paper 8 days ago
LongCat-Next: Lexicalizing Modalities as Discrete Tokens