OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions Paper • 2602.05843 • Published 17 days ago • 57
MUR: Momentum Uncertainty guided Reasoning for Large Language Models Paper • 2507.14958 • Published Jul 20, 2025 • 47
A Semantic Mention Graph Augmented Model for Document-Level Event Argument Extraction Paper • 2403.09721 • Published Mar 12, 2024 • 1
Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning Paper • 2507.21892 • Published Jul 29, 2025 • 3
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding Paper • 2505.19076 • Published May 25, 2025
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28, 2025 • 72
$A^3$-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation Paper • 2601.09274 • Published Jan 14 • 84
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent Paper • 2601.07779 • Published Jan 12 • 28
SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization Paper • 2601.22491 • Published 24 days ago • 12
TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents Paper • 2602.02196 • Published 20 days ago • 33
Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text Paper • 2301.02983 • Published Jan 8, 2023
OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions Paper • 2602.05843 • Published 17 days ago • 57
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published May 26, 2025 • 104
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Paper • 2504.08672 • Published Apr 11, 2025 • 55
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving Paper • 2503.16905 • Published Mar 21, 2025 • 54
Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models Paper • 2501.18119 • Published Jan 30, 2025 • 25
$φ$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation Paper • 2503.13288 • Published Mar 17, 2025 • 51
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era Paper • 2503.12329 • Published Mar 16, 2025 • 27
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant Paper • 2410.18603 • Published Oct 24, 2024 • 32
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published Dec 27, 2024 • 87