Similar items by topic, tags, and provider (metadata-only).
dataset: commonvoice.mozilla.org
Best open multilingual voice dataset.
dataset, Build: Mozilla / OpenSLR
Best public speech base for ASR experimentation and evaluation.
dataset: openslr.org
Classic open English speech corpus.
dataset: storage.googleapis.com
Large supervised vision dataset with labels, boxes, masks, relations, narratives.
dataset: openslr.org
Open TTS-oriented speech corpus.
dataset: LAION
Web-scale image-text corpus for multimodal research; use with strong filtering and license review.
dataset: image-net.org
Still useful for vision classification baselines.
dataset, Build: Microsoft / Google
High-value public corpora for detection, captions, grounding, and multimodal experimentation.
dataset: laion.ai
Massive web-scale image-text corpus.
repo: GitHub
Excellent multilingual image-text corpus from Wikipedia/Wikimedia.
video: OpenCV
Video supplement for practical computer vision learning and tool usage.
docs, Foundation: OpenCV
Still the fastest practical path to vision preprocessing, classical CV, and real-world image pipelines.