Similar items by topic, tags, and provider (metadata-only).
dataset · openslr.org
Classic open English speech corpus.
dataset · commonvoice.mozilla.org
Best open multilingual voice dataset.
dataset · storage.googleapis.com
Large supervised vision dataset with labels, boxes, masks, relations, narratives.
dataset · Build · Microsoft / Google
High-value public corpora for detection, captions, grounding, and multimodal experimentation.
dataset · Mozilla
Large multilingual speech dataset project for ASR, speech research, and voice tooling.
dataset · laion.ai
Massive web-scale image-text corpus.
dataset · image-net.org
Still useful for vision classification baselines.
dataset · Build · Mozilla / OpenSLR
Best public speech base for ASR experimentation and evaluation.
repo · GitHub
Excellent multilingual image-text corpus from Wikipedia/Wikimedia.
dataset · LAION
Web-scale image-text corpus for multimodal research; use with strong filtering and license review.
docs · Foundation · OpenCV
Still the fastest practical path to vision preprocessing, classical CV, and real-world image pipelines.
video · OpenCV
Video supplement for image processing and computer vision workflows.