AI Power Progress iA

RedPajama

Mixed-source corpus built as an open replication of the LLaMA training data mix, including CommonCrawl, C4, GitHub, arXiv, Wikipedia, and StackExchange.

Tags: dataset, hugging-face, redpajama, repo, warning, warnings

Resource Metadata

Category: Caution
Provider: Hugging Face
Type: dataset
Level: unknown
Topic: Local AI / LLM Engineering / RAG
Track: Local AI / LLM Engineering / RAG
Section: Warning
Format: Reference
Status: publishable
Commercial: avoid-or-link-only
Featured: no
Fast start: no
Sequence: nan
Priority: Warning
Primary source: direct_links_master
Sources: direct_links_master, mega_open_hub, training_data_stack
ID: 6bf6e3b4f4ec0bf7
