Similar items by topic, tags, and provider (metadata-only).
dataset | Build | Microsoft / Google
High-value public corpora for detection, captions, grounding, and multimodal experimentation.
repo | Build | OpenAI
Great practical base for local transcription pipelines and speech experiments.
dataset | storage.googleapis.com
Large supervised vision dataset with labels, boxes, masks, relations, narratives.
dataset | commonvoice.mozilla.org
Best open multilingual voice dataset.
dataset | openslr.org
Classic open English speech corpus.
dataset | laion.ai
Massive web-scale image-text corpus.
dataset | openslr.org
Open TTS-oriented speech corpus.
dataset | image-net.org
Still useful for vision classification baselines.
dataset | Build | Mozilla / OpenSLR
Best public speech base for ASR experimentation and evaluation.
docs | Foundation | OpenCV
Still the fastest practical path to vision preprocessing, classical CV, and real-world image pipelines.
video | OpenCV
Video supplement for image processing and computer vision workflows.
dataset | Mozilla
Large multilingual speech dataset project for ASR, speech research, and voice tooling.