LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs? Paper • 2605.08985 • Published 16 days ago • 21
WebWorld: A Large-Scale World Model for Web Agent Training Paper • 2602.14721 • Published Feb 16 • 19
Self-Improving Pretraining: using post-trained models to pretrain better models Paper • 2601.21343 • Published Jan 29 • 19
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents Paper • 2604.26752 • Published 26 days ago • 108
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data Paper • 2604.19859 • Published Apr 21 • 53
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments Paper • 2604.14144 • Published Apr 15 • 63
view article Article You could have designed state of the art positional encoding FL33TW00D-HF • Nov 25, 2024 • 481
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published Apr 6 • 114
UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving Paper • 2604.02190 • Published Apr 2 • 29
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published Mar 17 • 110
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers Paper • 2602.15322 • Published Feb 17 • 11
view article Article Introducing Storage Buckets on the Hugging Face Hub +10 Wauplin, coyotte508, XciD, victor, julien-c, lhoestq, pierric, Sylvestre, hlarcher, rajatarya, seanses, assafvayner • Mar 10 • 194
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published Mar 6 • 120
CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning Paper • 2603.00889 • Published Mar 1 • 56
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published Feb 11 • 199
DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning Paper • 2602.16742 • Published Feb 18 • 12
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published Jan 30 • 113