Similar items by topic, tags, and provider (metadata-only).
dataset
huggingface.co
Access: open (check license). TriviaQA dataset for retrieval and reading comprehension.
dataset
huggingface.co
Access: open (check license). Reading comprehension QA dataset.
dataset
huggingface.co
Access: open (check license). Open-domain QA dataset with both long and short answers.
dataset
huggingface.co
Access: open (check license). Multi-hop QA dataset for RAG evaluation.
dataset
microsoft.github.io
Access: research-only. Large-scale IR/QA dataset.
repo
github.com
Access: open. Scholarly corpus.
repo
github.com
Access: open. IR benchmark datasets.
dataset · Build
physionet.org
Canonical source for ECG, ICU, waveform, and related biomedical datasets.
dataset · Build
openneuro.org
Best open hub for MRI, EEG, MEG, and iEEG data.
dataset · foundation
MIT
Excellent lecture notes, exams, and videos across advanced technical topics.
dataset
dumps.wikimedia.org
Strong encyclopedic backbone for general knowledge and a factual writing style.
dataset
Hugging Face
Large, cleaned English web corpus offering the broadest raw coverage for LLM pretraining.