SWE-Universe: Scale Real-World Verifiable Environments to Millions • Paper • 2602.02361 • Published 10 days ago • 59
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models • Paper • 2503.21380 • Published Mar 27, 2025 • 38