Process Reward Models that Think -- https://arxiv.org/abs/2504.16828
AI & ML interests
Factuality, reasoning, alignment, LLM applications
Spaces (7)
LudoBench: Multimodal Game Reasoning Benchmark [ICLR 2026]
Answer Convergence Early Stopping: Demo for the EMNLP paper "Answer Convergence as a Signal..."
FactRBench: View and analyze the long-form factuality leaderboard
ExpertLongBench: Leaderboard for ExpertLongBench
ManyICLBench: Leaderboard for ManyICLBench
MLRC-BENCH: Display model performance rankings
Datasets (13)
launch/LudoBench • 638 • 8
launch/ExpertLongBench • 515 • 10
launch/thinkprm-1K-verification-cots • 1k • 30 • 6
launch/ManyICLBench • 66 • 470 • 1
launch/CMV • 133 • 42
launch/FactRBench • 1.06k • 68 • 1
launch/FactBench • 1k • 107 • 3
launch/CLASH • 345 • 36 • 4
launch/gov_report • 58.4k • 277 • 11
launch/gov_report_qs • 7.87k • 58 • 4