Similar items by topic, tags, and provider (metadata-only).
dataset · UCI
Classic and modern ML datasets that are ideal for education, benchmarking, and tabular experiments.
dataset · Hugging Face
Massive public dataset hub spanning NLP, code, vision, audio, robotics, and benchmarks.
dataset · UCI
Direct browser for UCI datasets when you want a clean, filterable dataset list for website linking.
docs · Foundation · Ollama
Official documentation for running and integrating local models with a simple developer workflow.
docs · Build · Open WebUI
Offline-first self-hosted AI interface that works well as a local front-end for models and knowledge tools.
dataset · Foundation · MIT
Excellent lecture notes, exams, and videos across advanced technical topics.
dataset · Hugging Face
Best multilingual extension of the FineWeb pipeline; very broad language coverage.
dataset · Hugging Face
Huge cleaned English web corpus; best raw breadth for LLM pretraining.
dataset · Hugging Face
Synthetic corpus in textbook, blog, and WikiHow styles that helps models produce tutor-like explanations.
dataset · Hugging Face
Best starting corpus for cleaner licensing: 8 TB of public-domain and openly licensed text spanning books, papers, code, encyclopedias, educational materials, and transcripts.
dataset · Foundation · Hugging Face
Essential for loading, cleaning, streaming, and publishing training and evaluation data.
dataset · Hugging Face
Crosslingual prompt pool across many languages and tasks.