OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection Paper • 2306.09301 • Published Jun 15, 2023 • 1
Scattered Forest Search: Smarter Code Space Exploration with LLMs Paper • 2411.05010 • Published Oct 22, 2024 • 1
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT? Paper • 2504.11741 • Published Apr 16, 2025 • 1
OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization Paper • 2506.18880 • Published Jun 23, 2025 • 4
Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps in Chain-of-Thought Paper • 2510.24941 • Published Oct 28, 2025 • 4
Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents Paper • 2602.13379 • Published Feb 13 • 3
Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment Paper • 2303.13662 • Published Mar 23, 2023
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation Paper • 2502.10341 • Published Feb 14, 2025 • 3
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models Paper • 2502.18443 • Published Feb 25, 2025 • 12
DataDecide: How to Predict Best Pretraining Data with Small Experiments Paper • 2504.11393 • Published Apr 15, 2025 • 20
Teaching Models to Understand (but not Generate) High-risk Data Paper • 2505.03052 • Published May 5, 2025 • 6
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published Jun 5, 2025 • 61
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published Nov 24, 2025 • 63