Similar items by topic, tags, and provider (metadata-only).
dataset · foundation · MIT
Excellent lecture notes, exams, and videos across advanced technical topics.
video · MIT
Full university lecture series across CS, math, engineering, neuroscience, physics, robotics, and more.
video · MIT
Formal lecture series for classical mechanics, electricity and magnetism (E&M), optics, and more.
course · ocw.mit.edu
Mass marketing systems + guerrilla ("gorilla") marketing tactics, with AI-assisted planning, templates, analytics, and safety-first execution.
dataset · aimi.stanford.edu
Excellent medical data, but the shared datasets are generally limited to non-commercial use.
docs · Stack Exchange
Current official data-dump access requires you to affirm you do not intend to use the file for LLM training.
dataset · api.semanticscholar.org
Public dataset license is limited to internal, non-commercial research/education use.
dataset · Hugging Face
Mixed-source replication corpus including Common Crawl, C4, GitHub, arXiv, Wikipedia, and Stack Exchange.
docs · Common Crawl
Massive but extremely noisy; requires expensive filtering, deduplication, PII screening, and quality scoring.
dataset · Hugging Face
Curators note they do not own the underlying text and maintain a takedown process.
dataset · UCI
Classic and modern ML datasets that are ideal for education, benchmarking, and tabular experiments.
dataset · Hugging Face
Synthetic textbook-, blog-, and WikiHow-style corpus, useful for training tutor-like explanations.