Category
base_weights
Open dataset for LLM training, built from Common Crawl snapshots, with quality signals and deduplication metadata.
base_weights
huggingface.co
dataset
unknown
general
n/a
n/a
n/a
publishable
unknown
no
no
n/a
n/a
training_data_stack
training_data_stack
e0b790262942e461