Similar items by topic, tags, and provider (metadata-only).
dataset: storage.googleapis.com
Large supervised vision dataset with labels, boxes, masks, relations, narratives.
dataset (Build): Mozilla / OpenSLR
Best public speech base for ASR experimentation and evaluation.
repo: GitHub
Excellent multilingual image-text corpus from Wikipedia/Wikimedia.
dataset: commonvoice.mozilla.org
Best open multilingual voice dataset.
dataset: openslr.org
Classic open English speech corpus.
dataset: image-net.org
Still useful for vision classification baselines.
dataset: openslr.org
Open TTS-oriented speech corpus.
dataset: laion.ai
Massive web-scale image-text corpus.
dataset: Mozilla
Large multilingual speech dataset project for ASR, speech research, and voice tooling.
docs (Foundation): OpenCV
Still the fastest practical path to vision preprocessing, classical CV, and real-world image pipelines.
dataset: LAION
Web-scale image-text corpus for multimodal research; use with strong filtering and license review.
repo (Build): OpenAI
Great practical base for local transcription pipelines and speech experiments.