NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents Paper • 2512.12730 • Published Dec 14, 2025 • 45
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows Paper • 2512.13168 • Published Dec 15, 2025 • 52
WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment Paper • 2512.12692 • Published Dec 14, 2025 • 14
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 4 days ago • 96
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face +3 Jul 29, 2025 • 212