jina-embeddings-v5-text Collection Our 5th-gen embeddings: two lightweight multilingual models with SOTA performance in retrieval, matching, clustering, and classification. • 29 items • Updated 13 days ago • 35
pplx-embed Collection Diffusion-Pretrained Dense and Contextual Embeddings • 7 items • Updated 14 days ago • 87
KoViDoRe Benchmark (BEIR) v2 Collection Korean Vision Document Retrieval Benchmark • 4 items • Updated 10 days ago • 5
Article Nano-BEIR: A Multilingual Information Retrieval Benchmark with Quality-Enhanced Queries Dec 22, 2025 • 9
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens Paper • 2508.05305 • Published Aug 7, 2025 • 47
HyperCLOVA X SEED Collection HyperCLOVA X SEED is NAVER's lightweight open-source lineup with a strong focus on Korean language performance • 6 items • Updated Dec 24, 2025 • 41
EXAONE-Deep Collection EXAONE reasoning model series of 2.4B, 7.8B, and 32B, optimized for reasoning tasks including math and coding • 10 items • Updated Jul 7, 2025 • 96
Magpie-Llama3.1 Datasets Collection Dataset built with Meta Llama 3.1 70B. • 6 items • Updated Jan 13, 2025 • 4