Similar items by topic, tags, and provider (metadata-only).
dataset, Hugging Face
Massive public dataset hub spanning NLP, code, vision, audio, robotics, and benchmarks.
video, Hugging Face
Modern LLM engineering, datasets, transformers, and community tutorials.
docs, Advanced, Hugging Face
Core for LoRA and efficient adaptation on local or limited hardware.
docs, Advanced, Hugging Face
Useful for supervised fine-tuning, preference tuning, and training experiments.
docs, Build, Qdrant
Simple, strong choice for local and hybrid vector search systems.
dataset, Hugging Face
Best multilingual extension of the FineWeb pipeline; very broad language coverage.
dataset, Hugging Face
Huge cleaned English web corpus; best raw breadth for LLM pretraining.
dataset, Hugging Face
Synthetic textbook/blog/WikiHow-style corpus that helps tutor-like explanations.
dataset, Hugging Face
Best starting corpus when cleaner licensing matters: 8 TB of public-domain and openly licensed text spanning books, papers, code, encyclopedias, educational materials, and transcripts.
docs, Build, Weaviate
Useful for hybrid search, generative integrations, and production vector workflows.
dataset, Hugging Face
Crosslingual prompt pool across many languages and tasks.
dataset, Hugging Face
Good open visual instruction tuning layer after base vision-language pretraining.