张康宁

zhuiguang-ning

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper about 7 hours ago

SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

Paper • 2604.08377 • Published 13 days ago • 282

upvoted 2 papers 7 days ago

FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

Paper • 2603.19835 • Published Mar 20 • 338

Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization

Paper • 2604.09574 • Published Feb 24 • 30

upvoted a paper 21 days ago

MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences

Paper • 2603.27813 • Published 23 days ago • 23

upvoted 2 papers about 2 months ago

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Paper • 2602.08354 • Published Feb 9 • 264

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Paper • 2602.08222 • Published Feb 9 • 290

upvoted 2 papers 3 months ago

Your Group-Relative Advantage Is Biased

Paper • 2601.08521 • Published Jan 13 • 158

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 230

upvoted a paper 4 months ago

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Paper • 2509.21268 • Published Sep 25, 2025 • 104

upvoted a collection 4 months ago

Qwen3-Coder

Collection

5 items • Updated Dec 31, 2025 • 175

upvoted 2 papers 4 months ago

CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models

Paper • 2309.01940 • Published Sep 5, 2023 • 2

LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls

Paper • 2511.09148 • Published Nov 12, 2025 • 18

upvoted a collection 5 months ago

Qwen3-VL

Collection

37 items • Updated Dec 31, 2025 • 699

upvoted a paper 5 months ago

3D Diffusion Policy

Paper • 2403.03954 • Published Mar 6, 2024 • 13

upvoted 4 papers 6 months ago

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published Sep 1, 2025 • 81

Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning

Paper • 2510.23473 • Published Oct 27, 2025 • 86

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published Oct 13, 2025 • 182

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11, 2025 • 50
