AI Power Progress
Resource detail

Dolma

Large open corpus spanning web, academic works, code, books, and encyclopedic material.

Tags: course dataset

Resource Metadata

Category: base_weights
Provider: allenai.org
Type: dataset
Level: unknown
Topic: general
Track: n/a
Section: n/a
Format: n/a
Status: publishable
Commercial: unknown
Featured: no
Fast start: no
Sequence: n/a
Priority: n/a
Primary source: training_data_stack
Sources: training_data_stack
ID: 26f116e3ff39b6b3

Related Resources

Similar items by topic, tags, and provider (metadata-only).

dataset

Cosmopedia

Hugging Face

Synthetic corpus of textbook-, blog-, and WikiHow-style text that helps models produce tutor-like explanations.

dataset

Common Pile v0.1

Hugging Face

A legally cleaner starting corpus: 8 TB of public-domain and openly licensed text spanning books, papers, code, encyclopedias, educational materials, and transcripts.

dataset, foundation

OpenStax

openstax.org

Open textbooks across core STEM and humanities subjects.

dataset

NPTEL

nptel.ac.in

Access: open. Free engineering courses including EE