LAION-5B / Re-LAION-5B

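Web-scale image-text corpus of about 5.85 billion CLIP-filtered pairs, distributed as image URLs with captions. Re-LAION-5B is the 2024 re-release with links to suspected illegal content removed.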

Tags: audio computer-vision dataset image-text-urls laion-ai multimodal training-data vision-language vlm-pretraining-at-scale

Resource Metadata

Provider: laion.ai
Type: dataset
Level: unknown
Topic: Computer Vision / Multimodal / Audio
Track: Computer Vision / Multimodal / Audio
Section: Open data
Format: Dataset
Status: manual_review
Commercial: manual-review
Featured: not specified
Fast start: not specified
Sequence: not specified
Priority: not specified
Primary source: direct_links_master
Sources: direct_links_master, mega_open_hub
ID: d3ce474ca79cb7e0

Fallback Access

https://web.archive.org/web/*/https://laion.ai/

Related Resources

Similar items by topic, tags, and provider (metadata-only).

dataset

LAION-5B

LAION

Web-scale image-text corpus for multimodal research; use with strong filtering and license review.

Open Source

dataset

Open Images V7

storage.googleapis.com

Large supervised vision dataset with labels, boxes, masks, relations, narratives.

Open Source

dataset

Mozilla Common Voice

commonvoice.mozilla.org

Best open multilingual voice dataset.

Open Source

dataset

LibriSpeech

openslr.org

Classic open English speech corpus.

Open Source

dataset · Build

COCO + Open Images + WIT dataset bundle

Microsoft / Google

High-value public corpora for detection, captions, grounding, and multimodal experimentation.

Open Source

dataset

LibriTTS

openslr.org

Open TTS-oriented speech corpus.

Open Source

dataset

ImageNet

image-net.org

Still useful for vision classification baselines.

Open Source

dataset · Build

Mozilla Common Voice + LibriSpeech bundle

Mozilla / OpenSLR

Best public speech base for ASR experimentation and evaluation.

Open Source

dataset

Common Voice

Mozilla

Large multilingual speech dataset project for ASR, speech research, and voice tooling.

Open Source

repo

WIT

GitHub

Excellent multilingual image-text corpus from Wikipedia/Wikimedia.

Open Source

docs · Foundation

OpenCV Tutorials

OpenCV

Still the fastest practical path to vision preprocessing, classical CV, and real-world image pipelines.

Open Source

video

OpenCV

Video supplement for image processing and computer vision workflows.

Open Source
