AI Power Progress iA

DCLM-baseline

High-quality, filtered Common Crawl derivative that serves as the baseline pretraining dataset of the DCLM (DataComp for Language Models) benchmark.

base-pretraining cpt-benchmarking dataset hugging-face llm-engineering local-ai rag text training-data

Resource Metadata

Category: Base pretraining
Provider: Hugging Face
Type: dataset
Level: unknown
Topic: Local AI / LLM Engineering / RAG
Track: Local AI / LLM Engineering / RAG
Section: Open data
Format: Dataset
Status: manual_review
Commercial: manual-review
Featured: no
Fast start: no
Sequence: nan
Priority: B
Primary source: direct_links_master
Sources: direct_links_master, mega_open_hub
ID: 433e4c7c5f8ee420
