Similar items by topic, tags, and provider (metadata-only).
dataset: archive.org
Access: open. Q&A dumps
dataset: lib.ncsu.edu
Access: open. arXiv metadata and full-text access options.
repo: github.com
Open web text
dataset: Mozilla
Large multilingual speech dataset project for ASR, speech research, and voice tooling.
dataset: dumps.wikimedia.org
Strong encyclopedic backbone for general knowledge and factual style.
dataset (Build): physionet.org
Canonical source for ECG, ICU, waveform, and related biomedical datasets.
dataset (Build): openneuro.org
Best open hub for MRI, EEG, MEG, and iEEG data.
dataset (foundation): MIT
Excellent lecture notes, exams, and videos across advanced technical topics.
dataset: UCI
Classic and modern ML datasets that are ideal for education, benchmarking, and tabular experiments.
dataset: Hugging Face
Top open code corpus; huge language coverage.
dataset: storage.googleapis.com
Large supervised vision dataset with labels, bounding boxes, segmentation masks, visual relationships, and narratives.
dataset: NASA
NASA metadata portal for open datasets spanning science, aeronautics, Earth, and exploration.