AI Power Progress
LAION-400M

multimodal_pretraining

dataset

Resource Metadata

Category: multimodal
Provider: huggingface.co
Type: dataset
Level: unknown
Topic: general
Track: n/a
Section: n/a
Format: n/a
Status: publishable
Commercial: conditional
Featured: no
Fast start: no
Sequence: n/a
Priority: n/a
Primary source: training_data_stack
Sources: training_data_stack
ID: 772d0260d9bf1bc8

Tags: dataset

Related Resources

Similar items by topic, tags, and provider (metadata-only).

TriviaQA

dataset, huggingface.co

Access: open (check license). Trivia QA dataset for retrieval + reading comprehension

The Stack

dataset, huggingface.co

Review dataset terms/acknowledgments and filter by license for your use case.

SQuAD

dataset, huggingface.co

Access: open (check license). Reading comprehension QA dataset

RedPajama-Data-V2

dataset, huggingface.co

Open LLM dataset built from Common Crawl snapshots with quality signals and dedup metadata.

Natural Questions

dataset, huggingface.co

Access: open (check license). Open-domain QA dataset (long + short answers)

MATH

dataset, huggingface.co

Access: open. Advanced math

HowTo100M

dataset, huggingface.co

Access: research-only. Video-text dataset

HotpotQA

dataset, huggingface.co

Access: open (check license). Multi-hop QA for RAG evaluation