Foundation / pretraining
Large corpora and knowledge bases used to train foundation models. Always verify license and dataset terms for your use case.
Datasets for training, evaluation, and retrieval-augmented generation, with link health and fallback mirrors surfaced where available.
High-signal instruction datasets and practical fine-tuning references (LoRA / QLoRA / PEFT).
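The core idea behind LoRA-style fine-tuning can be sketched without any framework: freeze the pretrained weight matrix and train only a low-rank update. A minimal NumPy illustration of the math (not the `peft` API; all names and sizes here are illustrative):

```python
import numpy as np

# Toy LoRA sketch: instead of updating the full weight W (d_out x d_in),
# train two small factors A (r x d_in) and B (d_out x r). The effective
# weight is W + (alpha / r) * B @ A, with far fewer trainable parameters.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # zero-init: no change at step 0

def lora_forward(x):
    # Base path plus the scaled low-rank update.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
full_params = W.size
lora_params = A.size + B.size
print(f"trainable params: {lora_params} vs full: {full_params}")
```

Because `B` starts at zero, the adapted model is exactly the base model before training, which is why LoRA can be bolted onto a pretrained checkpoint safely; QLoRA applies the same trick on top of a quantized base model.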
Benchmarks and datasets used to measure retrieval quality (and to train/evaluate embedding models).
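A common retrieval-quality metric these benchmarks report is Recall@k: the fraction of queries whose relevant document appears in the top-k results. A self-contained sketch with toy embeddings (synthetic data, not any particular benchmark):

```python
import numpy as np

# Recall@k sketch: rank documents by cosine similarity of (toy)
# embeddings and check whether each query's relevant doc is in the top k.
rng = np.random.default_rng(1)
n_docs, dim, k = 100, 32, 5

docs = rng.normal(size=(n_docs, dim))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

# Queries are noisy copies of their relevant doc, so retrieval succeeds.
relevant = np.arange(10)
queries = docs[relevant] + 0.1 * rng.normal(size=(len(relevant), dim))
queries /= np.linalg.norm(queries, axis=1, keepdims=True)

sims = queries @ docs.T                  # cosine similarity matrix
topk = np.argsort(-sims, axis=1)[:, :k]  # top-k doc indices per query
hits = [rel in row for rel, row in zip(relevant, topk)]
recall_at_k = sum(hits) / len(hits)
print(f"Recall@{k} = {recall_at_k:.2f}")
```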
Open corpora to ingest for RAG and references for building + evaluating grounded QA workflows.
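The grounded-QA loop these corpora feed can be sketched in a few lines: score ingested passages against a question, pick the best, and build a prompt that constrains the answer to that context. A minimal illustration using bag-of-words overlap as a stand-in for real embeddings (the corpus snippets and scoring function are purely illustrative):

```python
# Minimal RAG retrieval step: rank passages by lexical overlap with the
# question, then assemble a grounded prompt from the top hit.
corpus = [
    "The Pile is an 800GB English text corpus for language model pretraining.",
    "Wikipedia dumps are a common open corpus for RAG ingestion.",
    "BEIR is a heterogeneous benchmark for zero-shot retrieval evaluation.",
]

def score(question, passage):
    # Fraction of question words that also appear in the passage.
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def retrieve(question, k=1):
    ranked = sorted(corpus, key=lambda p: score(question, p), reverse=True)
    return ranked[:k]

question = "Which corpus is common for RAG ingestion"
context = retrieve(question)[0]
prompt = f"Answer using only this context:\n{context}\n\nQ: {question}\nA:"
print(prompt)
```

A real pipeline swaps the overlap score for an embedding model and a vector index, but the ingest-retrieve-assemble shape is the same, which is why retrieval-corpus quality directly bounds answer quality.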