Green-VLA: Staged Vision-Language-Action Model for Generalist Robots Paper • 2602.00919 • Published Jan 31 • 324
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published 28 days ago • 109
Grounding and Enhancing Informativeness and Utility in Dataset Distillation Paper • 2601.21296 • Published Jan 29 • 19
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery Paper • 2601.19325 • Published Jan 27 • 81
Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance Paper • 2601.14171 • Published Jan 20 • 53
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models Paper • 2601.15165 • Published Jan 21 • 73
Running 3.78k The Ultra-Scale Playbook 🌌 3.78k The ultimate guide to training LLM on large GPU Clusters
Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks Paper • 2510.25760 • Published Oct 29, 2025 • 17
AI for Service: Proactive Assistance with AI Glasses Paper • 2510.14359 • Published Oct 16, 2025 • 78
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation Paper • 2510.00515 • Published Oct 1, 2025 • 42
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance Paper • 2509.26231 • Published Sep 30, 2025 • 18