SLA2: Sparse-Linear Attention with Learnable Routing and QAT Paper • 2602.12675 • Published 6 days ago • 35
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published 6 days ago • 43
Revisiting the Platonic Representation Hypothesis: An Aristotelian View Paper • 2602.14486 • Published 3 days ago • 9
ResearchGym: Evaluating Language Model Agents on Real-World AI Research Paper • 2602.15112 • Published 3 days ago • 16
Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook Paper • 2602.14299 • Published 4 days ago • 24
VidVec: Unlocking Video MLLM Embeddings for Video-Text Retrieval Paper • 2602.08099 • Published 11 days ago • 120
BitDance: Scaling Autoregressive Generative Models with Binary Tokens Paper • 2602.14041 • Published 4 days ago • 37
UniWeTok: An Unified Binary Tokenizer with Codebook Size 2^{128} for Unified Multimodal Large Language Model Paper • 2602.14178 • Published 4 days ago • 11
CoPE-VideoLM: Codec Primitives For Efficient Video Language Models Paper • 2602.13191 • Published 6 days ago • 29
Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution Paper • 2602.12684 • Published 6 days ago • 5
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs Paper • 2602.10388 • Published 9 days ago • 215
MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs Paper • 2602.12705 • Published 6 days ago • 57
What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis Paper • 2602.12395 • Published 7 days ago • 14
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence Paper • 2602.08683 • Published 10 days ago • 45