Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 62
Article ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models Oct 18, 2024 • 20
DocLayout-YOLO Collection Dataset and model for DocLayout-YOLO • 10 items • Updated Jan 14, 2025 • 20
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published Oct 3, 2024 • 36
Emu3 Collection Emu3: Next-Token Prediction is All You Need • 7 items • Updated Feb 13, 2025 • 79
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 15 days ago • 309
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated 7 days ago • 673
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18, 2024 • 78
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 83
Awesome Document AI Collection A collection of open-source document AI • 27 items • Updated Mar 11, 2024 • 80
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated 7 days ago • 227
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models Paper • 2408.02442 • Published Aug 5, 2024 • 21
Papers I want to read Collection Papers in my to-read list • 259 items • Updated Jan 10, 2025 • 32
Article A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes Aug 17, 2022 • 122