Similar items by topic, tags, and provider (metadata-only).
dataset | info.arxiv.org
Open bulk access for research papers across math, physics, CS, etc.
dataset | arxiv.org
retrieval_index
dataset | arxiv.org
Access: open. Computer science papers
dataset | arxiv.org
Access: open. Quantum physics papers
dataset | Foundation | CERN
Orientation guide for using CERN Open Data for research and education.
resource | Advanced | OpenAlex / arXiv
Use OpenAlex for metadata and citation-graph work, and arXiv for current papers and technical preprints.
dataset | Build | physionet.org
Canonical source for ECG, ICU, waveform, and related biomedical datasets.
dataset | Foundation | MIT
Excellent lecture notes, exams, and videos across advanced technical topics.
dataset | pmc.ncbi.nlm.nih.gov
Millions of full-text biomedical articles under reuse-friendly licenses.
dataset | docs.openalex.org
CC0 research graph with snapshot updates; ideal for research retrieval and paper routing.
dataset | Mozilla
Large multilingual speech dataset project for ASR, speech research, and voice tooling.
dataset | Hugging Face
Best starting corpus with cleaner licensing: 8 TB of public-domain and openly licensed text spanning books, papers, code, encyclopedias, educational materials, and transcripts.