Similar items by topic, tags, and provider (metadata-only).
dataset
storage.googleapis.com
Large supervised vision dataset with labels, boxes, masks, relations, narratives.
dataset
Build
Microsoft / Google
High-value public corpora for detection, captions, grounding, and multimodal experimentation.
dataset
openslr.org
Classic open English speech corpus.
dataset
commonvoice.mozilla.org
Best open multilingual voice dataset.
dataset
Mozilla
Large multilingual speech dataset project for ASR, speech research, and voice tooling.
dataset
openslr.org
Open TTS-oriented speech corpus.
dataset
laion.ai
Massive web-scale image-text corpus.
dataset
Build
Mozilla / OpenSLR
Best public speech base for ASR experimentation and evaluation.
dataset
LAION
Web-scale image-text corpus for multimodal research; use with strong filtering and license review.
docs
Foundation
OpenCV
Still the fastest practical path to vision preprocessing, classical CV, and real-world image pipelines.
repo
GitHub
Excellent multilingual image-text corpus from Wikipedia/Wikimedia.
dataset
image-net.org
Access: research-only. Large image classification