PhyCritic: Multimodal Critic Models for Physical AI Paper • 2602.11124 • Published 13 days ago • 51
OPE: Overcoming Information Saturation in Parallel Thinking via Outline-Guided Path Exploration Paper • 2602.08344 • Published 16 days ago • 5
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning Paper • 2602.08234 • Published 16 days ago • 67
Training Data Efficiency in Multimodal Process Reward Models Paper • 2602.04145 • Published 21 days ago • 76
CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs Paper • 2602.03048 • Published 22 days ago • 33
Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation Paper • 2602.03619 • Published 21 days ago • 26
Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing Paper • 2602.03845 • Published 21 days ago • 26
TTCS: Test-Time Curriculum Synthesis for Self-Evolving Paper • 2601.22628 • Published 26 days ago • 35