AI Power Progress iA
Resource detail

RedPajama-Data-V2

An open dataset for training large language models, built from Common Crawl snapshots and annotated with quality signals and deduplication metadata.

dataset

Resource Metadata

Category: base_weights
Provider: huggingface.co
Type: dataset
Level: unknown
Topic: general
Track: n/a
Section: n/a
Format: n/a
Status: publishable
Commercial: unknown
Featured: no
Fast start: no
Sequence: n/a
Priority: n/a
Primary source: training_data_stack
Sources: training_data_stack
ID: e0b790262942e461
