Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published 8 days ago • 55
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning Paper • 2511.01833 • Published Nov 3, 2025 • 15
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search Paper • 2509.07969 • Published Sep 9, 2025 • 58
Symbolic Graphics Programming with Large Language Models Paper • 2509.05208 • Published Sep 5, 2025 • 46
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2, 2025 • 83
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21, 2025 • 259
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers Paper • 2508.14704 • Published Aug 20, 2025 • 43
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11, 2025 • 50
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning Paper • 2507.16746 • Published Jul 22, 2025 • 35
Sekai: A Video Dataset towards World Exploration Paper • 2506.15675 • Published Jun 18, 2025 • 65
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning Paper • 2506.10521 • Published Jun 12, 2025 • 73
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16, 2025 • 273
No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL Article • Jun 3, 2025 • 96
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 178
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research Paper • 2505.19253 • Published May 25, 2025 • 32
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs Paper • 2504.11536 • Published Apr 15, 2025 • 63