Datasets used in the paper "A Critical Look at Targeted Instruction Selection: Disentangling What Matters (and What Doesn’t)"
AI & ML interests
Data-Centric ML
Papers
A Critical Look at Targeted Instruction Selection: Disentangling What Matters (and What Doesn't)
Boomerang Distillation Enables Zero-Shot Model Size Interpolation
Models 5
Harvard-DCML/boomerang-qwen3-2.3B
Text Generation • 3B • Updated • 27k • 3
Harvard-DCML/boomerang-qwen3-4.9B
5B • Updated • 17.9k
Harvard-DCML/boomerang-pythia-3.8B
4B • Updated • 8.75k
Harvard-DCML/boomerang-pythia-1.6B
2B • Updated • 13.9k
Harvard-DCML/boomerang-llama-3.2-1.9B
2B • Updated • 15.4k
Datasets 16
Harvard-DCML/tis-subset-datasets-Llama-2-7b-hf
Viewer • Updated • 300k • 48
Harvard-DCML/tis-quantile-datasets-gtr-t5-base
Viewer • Updated • 25k • 17
Harvard-DCML/tis-random-unbalanced
Viewer • Updated • 30k • 29
Harvard-DCML/tis-quantile-datasets-Olmo-3-1025-7B
Viewer • Updated • 50k • 12
Harvard-DCML/tis-quantile-datasets-SmolLM3-3B-Base
Viewer • Updated • 50k • 16
Harvard-DCML/tis-quantile-datasets-Qwen3-4B-Base
Viewer • Updated • 50k • 7
Harvard-DCML/tis-quantile-datasets-Llama-3.2-3B
Viewer • Updated • 50k • 9
Harvard-DCML/tis-quantile-datasets-Llama-2-7b-hf
Viewer • Updated • 50k • 14
Harvard-DCML/tulu-v2-10K-warmup-processed
Viewer • Updated • 10k • 24
Harvard-DCML/tis-subset-datasets-Olmo-3-1025-7B
Viewer • Updated • 300k • 19