---
title: README
emoji: 📈
colorFrom: pink
colorTo: pink
sdk: static
pinned: false
license: apache-2.0
---

Welcome to the LCO-Embedding project: Scaling Language-centric Omnimodal Representation Learning.

### Highlights:

- We introduce **LCO-Embedding**, a language-centric omnimodal representation learning method, and the LCO-Embedding model families, setting a new state of the art on MIEB (Massive Image Embedding Benchmark) while also supporting audio and video.
- We introduce the **Generation-Representation Scaling Law**, connecting models' generative capabilities to their representation upper bound.
- We introduce **SeaDoc**, a challenging visual document retrieval task in Southeast Asian languages, and show that continual generative pretraining before contrastive learning raises the representation upper bound.