bluelightai-dev/clt-eval-modernbert-tokenized
• 328k • 35
bluelightai-dev/clt-train-modernbert-tokenized
• 1.94M • 38
bluelightai-dev/clt-pretrain-data-v3-eval-tokenized-Qwen3-256
• 212k • 39
bluelightai-dev/clt-pretrain-data-v3-tokenized-Qwen3-max-1024
• 4.04M • 13
bluelightai-dev/clt-pretrain-data-v3-tokenized-qwen3
• 1.81M • 264
bluelightai-dev/clt-pretrain-data-v3
• 2.99M • 9
bluelightai-dev/dolma3_dolmino_mix-100B-1125-sample
• 6.32M • 7
bluelightai-dev/dolma3_mix-150B-1025-sample
• 4.97M • 37
bluelightai-dev/clt-mixed-eval-data-tokenized-Qwen3
• 115k • 31
bluelightai-dev/clt-mixed-eval-data
• 60k • 24
bluelightai-dev/clt-mixed-data-tokenized-Qwen3
• 2.6M • 43
bluelightai-dev/clt-pretrain-eval-data-tokenized-Qwen3-256
• 194k • 36
bluelightai-dev/clt-pretrain-data-dedup-tokenized-Qwen3-1024
• 2.52M • 52
bluelightai-dev/clt-pretrain-data-v2-dedup
• 32
bluelightai-dev/clt-pretrain-data-tokenized-Qwen3-1024
• 2.44M • 66
bluelightai-dev/clt-pretrain-data-v2
• 67
bluelightai-dev/MathPile_Commercial-formatted
• 389k • 49
bluelightai-dev/clt_posttrain_data_tokenized
• 1.34M • 55
bluelightai-dev/common-corpus-sample-open-web
• 4.8M • 36
bluelightai-dev/common-corpus-sample-open-source
• 2.02M • 26
bluelightai-dev/common-corpus-sample-open-science
• 284k • 28
bluelightai-dev/common-corpus-sample-open-government
• 373k • 28 • 1
bluelightai-dev/common-corpus-sample-open-culture
• 462k • 45
bluelightai-dev/clt_posttrain_data_tokenized_test_1000
• 1.22k • 9
bluelightai-dev/dclm-full-deduped-sample
• 4.92M • 55
bluelightai-dev/the-stack-dedup-sample
• 474k • 31
bluelightai-dev/pythia_clt_pretrain_data_tokenized
• 3.5M • 74
bluelightai-dev/clt_eval_data_qwen3_tokenized_256
• 245k • 85
bluelightai-dev/clt_pretrain_data_qwen_tokenized
• 16.7M • 155
bluelightai-dev/clt_posttrain_data_qwen_tokenized
• 1.34M • 91