Similar items by topic, tags, and provider (metadata-only).
docs · Advanced · Hugging Face
Useful for supervised fine-tuning, preference tuning, and training experiments.
docs · Build · Qdrant
Simple, strong choice for local and hybrid vector search systems.
docs · Build · Weaviate
Useful for hybrid search, generative integrations, and production vector workflows.
dataset · Foundation · Hugging Face
Essential for loading, cleaning, streaming, and publishing training and evaluation data.
docs · Zero · Ollama
Fastest path to running modern local models on a workstation.
docs · Zero · Open WebUI
Gives you an offline-friendly interface for local models, documents, and workflows.
docs · Build · deepset
Solid framework for retrieval pipelines, agents, evaluation, and production patterns.
video · Hugging Face
Modern LLM engineering, datasets, transformers, and community tutorials.
dataset · Hugging Face
Best multilingual extension of the FineWeb pipeline; very broad language coverage.
dataset · Hugging Face
Huge cleaned English web corpus; best raw breadth for LLM pretraining.
dataset · Hugging Face
Synthetic textbook/blog/WikiHow-style corpus that helps tutor-like explanations.
dataset · Hugging Face
Best starting corpus for cleaner licensing: 8 TB of public-domain and openly licensed text spanning books, papers, code, encyclopedias, educational materials, and transcripts.