CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning Paper • 2511.18659 • Published Nov 24, 2025 • 25
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning Paper • 2203.10244 • Published Mar 19, 2022 • 1
Accelerated Natural Language Processing Collection Materials for the Accelerated Natural Language Processing (ANLP) course at the University of Edinburgh. • 7 items • Updated Mar 2 • 2
MolmoWeb-Data Collection This is the collection of all datasets in MolmoWebMix. • 6 items • Updated 9 days ago • 20
NanoBEIR 🍺 with BM25 Rankings Collection NanoBEIR by Zeta Alpha, extended with BM25 scores. Used in the Sentence Transformers CrossEncoderNanoBEIREvaluator prior to ST version 5.2. • 13 items • Updated Dec 10, 2025 • 3
NanoBEIR datasets Collection These datasets are compatible with the (Sparse)NanoBEIREvaluator in Sentence Transformers v5.2+. They also work with the CrossEncoderNanoBEIREvaluator if a bm25 column is present. • 16 items • Updated Mar 2 • 15
MS MARCO Mined Triplets Collection These datasets contain MS MARCO Triplets gathered by mining hard negatives using various models. Each dataset has various subsets. • 16 items • Updated Jan 29 • 14
Parallel Sentences Datasets Collection Each of these datasets has "english" and "non_english" columns. They can be used to make embedding models multilingual. • 14 items • Updated Dec 10, 2025 • 22
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Dec 10, 2025 • 165
The Big Benchmarks Collection Collection Gathering benchmark spaces on the hub (beyond the Open LLM Leaderboard) • 13 items • Updated Nov 18, 2024 • 264
Open LLM Leaderboard best models ❤️🔥 Collection A list, uploaded daily, of the models with the best evaluations on the LLM leaderboard. • 50 items • Updated 21 days ago • 677
Details Collection A gated collection of datasets containing evaluation details • 4500 items • Updated Mar 2 • 6
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 37
DistilBERT release Collection Original DistilBERT models: checkpoints obtained by teacher-student learning from the original BERT checkpoints. • 6 items • Updated Apr 17, 2024 • 40