Category
Base pretraining
Benchmark-oriented, high-quality Common Crawl derivative used in DCLM (DataComp for Language Models).
Base pretraining
Hugging Face
dataset
unknown
Local AI / LLM Engineering / RAG
Local AI / LLM Engineering / RAG
Open data
Dataset
manual_review
manual-review
no
no
nan
B
direct_links_master
direct_links_master, mega_open_hub
433e4c7c5f8ee420