MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification Paper • 2603.15726 • Published 14 days ago
In-Context Reinforcement Learning for Tool Use in Large Language Models Paper • 2603.08068 • Published 21 days ago
Training Data Efficiency in Multimodal Process Reward Models Paper • 2602.04145 • Published Feb 4
Thinking with Comics: Enhancing Multimodal Reasoning through Structured Visual Storytelling Paper • 2602.02453 • Published Feb 2
Language of Thought Shapes Output Diversity in Large Language Models Paper • 2601.11227 • Published Jan 16
PEAR: Phase Entropy Aware Reward for Efficient Reasoning Paper • 2510.08026 • Published Oct 9, 2025
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19, 2025
On the Multi-turn Instruction Following for Conversational Web Agents Paper • 2402.15057 • Published Feb 23, 2024
SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia Paper • 2502.06298 • Published Feb 10, 2025