MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding • Paper • 2503.13964 • Published Mar 18, 2025 • 20
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training • Paper • 2510.06710 • Published Oct 8, 2025 • 39
VIDEOP2R: Video Understanding from Perception to Reasoning • Paper • 2511.11113 • Published Nov 14, 2025 • 112
Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models • Paper • 2512.04981 • Published about 1 month ago • 7
Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning • Paper • 2512.06835 • Published 28 days ago • 3