Similar items by topic, tags, and provider (metadata-only).
dataset · laion.ai
Still an index of URLs/alt-text; reconstructed images have separate rights considerations.
dataset · mmmu-benchmark.github.io
Access: open. Multimodal reasoning benchmark
dataset · docvqa.org
Access: open (login). Document VQA datasets
dataset, foundation · MIT
Excellent lecture notes, exams, and videos across advanced technical topics.
dataset, Build · physionet.org
Canonical source for ECG, ICU, waveform, and related biomedical datasets.
dataset · Mozilla
Large multilingual speech dataset project for ASR, speech research, and voice tooling.
dataset · re3data.org
Access: open. Registry of research data repositories
dataset · paperswithcode.com
Access: open. Dataset index across ML tasks
dataset · microsoft.github.io
Access: research-only. Large IR/QA dataset
dataset · mawi.wide.ad.jp
Access: research-only. Backbone traffic traces
dataset · image-net.org
Access: research-only. Large image classification
dataset · figshare.com
Access: open. General research data repository