AI Power Progress iA
Dolma

Access: open. Corpus mixing technical and scientific text.

Tags: ai dataset open-data

Resource Metadata

Category: AI
Provider: huggingface.co
Type: dataset
Level: unknown
Topic: AI
Track: n/a
Section: Open Data Directory
Format: n/a
Status: publishable
Commercial: unknown
Featured: no
Fast start: no
Sequence: n/a
Priority: n/a
Primary source: website_existing
Sources: website_existing
ID: 6466f52275878811

Related Resources

Similar items by topic, tags, and provider (metadata-only).

FineWeb
Type: dataset
Provider: Hugging Face

Huge cleaned English web corpus; best raw breadth for LLM pretraining.

TriviaQA
Type: dataset
Provider: huggingface.co

Access: open (check license). Trivia QA dataset for retrieval + reading comprehension

SQuAD
Type: dataset
Provider: huggingface.co

Access: open (check license). Reading comprehension QA dataset

Natural Questions
Type: dataset
Provider: huggingface.co

Access: open (check license). Open-domain QA dataset (long + short answers)

MATH
Type: dataset
Provider: huggingface.co

Access: open. Advanced math problems

HowTo100M
Type: dataset
Provider: huggingface.co

Access: research-only. Video-text dataset

HotpotQA
Type: dataset
Provider: huggingface.co

Access: open (check license). Multi-hop QA for RAG evaluation