Similar items by topic, tags, and provider (metadata-only).
dataset · commonvoice.mozilla.org
Best open multilingual voice dataset.
dataset · Mozilla
Large multilingual speech dataset project for ASR, speech research, and voice tooling.
dataset · openslr.org
Classic open English speech corpus.
dataset · Build · Microsoft / Google
High-value public corpora for detection, captions, grounding, and multimodal experimentation.
dataset · storage.googleapis.com
Large supervised vision dataset with labels, boxes, masks, relations, narratives.
dataset · openslr.org
Open TTS-oriented speech corpus.
dataset · laion.ai
Massive web-scale image-text corpus.
dataset · image-net.org
Still useful for vision classification baselines.
docs · Foundation · OpenCV
Still the fastest practical path to vision preprocessing, classical CV, and real-world image pipelines.
repo · GitHub
Excellent multilingual image-text corpus from Wikipedia/Wikimedia.
repo · Build · OpenAI
Great practical base for local transcription pipelines and speech experiments.
resource · Foundation · PyImageSearch
Project-driven vision learning for detection, OCR, face pipelines, and deployment patterns.