Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published 1 day ago • 111
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 5 days ago • 130
ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas Paper • 2601.21558 • Published 6 days ago • 53
daVinci-Dev: Agent-native Mid-training for Software Engineering Paper • 2601.18418 • Published 9 days ago • 123
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model Paper • 2601.15892 • Published 13 days ago • 53
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published 20 days ago • 126
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning Paper • 2601.06002 • Published 25 days ago • 51
EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis Paper • 2601.05808 • Published 26 days ago • 36
Agentic Rubrics as Contextual Verifiers for SWE Agents Paper • 2601.04171 • Published 27 days ago • 11
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published Dec 29, 2025 • 97
Scaling Laws for Code: Every Programming Language Matters Paper • 2512.13472 • Published Dec 15, 2025 • 13
GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators Paper • 2512.19682 • Published Dec 22, 2025 • 17
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published Dec 3, 2025 • 154
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing Paper • 2512.02589 • Published Dec 2, 2025 • 71