Resource detail

FLAN Collection

One of the strongest open instruction-tuning mixtures; it combines FLAN, P3, Super-Natural Instructions, and other task collections.

Tags: dataset, github, instruction-tuning, llm-engineering, local-ai, rag, repo, sft, text-instructions, training-data

Resource Metadata

Provider: GitHub

Type: repo

Level: unknown

Topic: Local AI / LLM Engineering / RAG

Track: Local AI / LLM Engineering / RAG

Section: Open data

Format: Dataset

Status: manual_review

Commercial: manual-review

Featured: yes

Fast start: not specified

Sequence: not specified

Priority: not specified

Primary source: direct_links_master

Sources: direct_links_master, mega_open_hub, training_data_stack

ID: 4054ee2dc60d7bd9

Fallback Access

https://web.archive.org/web/*/https://github.com/google-research/FLAN

Continue Learning

Keep momentum with nearby resources and structured tracks.

Learning placement: track: Local AI / LLM Engineering / RAG · stage: not specified

Related Resources

Similar items by topic, tags, and provider (metadata-only).

Vision-Flan (dataset, Hugging Face)

Good open visual instruction-tuning layer to apply after base vision-language pretraining.

Natural Instructions (repo, GitHub)

Large collection of NLP tasks with human-readable instructions.

CodeSearchNet (repo, GitHub)

Dataset and benchmark for code search and code-language retrieval tasks.

llama.cpp (repo, Foundation, ggml-org)

Core local inference stack for quantized CPU / GPU deployment and experimentation.

MIT OpenCourseWare (dataset, Foundation, MIT)

Excellent lecture notes, exams, and videos across advanced technical topics.

FineWeb2 (dataset, Hugging Face)

Best multilingual extension of the FineWeb pipeline; very broad language coverage.

FineWeb (dataset, Hugging Face)

Huge cleaned English web corpus; best raw breadth for LLM pretraining.

Cosmopedia (dataset, Hugging Face)

Synthetic textbook-, blog-, and WikiHow-style corpus that helps with tutor-like explanations.

Common Pile v0.1 (dataset, Hugging Face)

Best starting corpus when cleaner licensing matters: 8 TB of public-domain and openly licensed text spanning books, papers, code, encyclopedias, educational materials, and transcripts.

UCI Machine Learning Repository (dataset, UCI)

Classic and modern ML datasets that are ideal for education, benchmarking, and tabular experiments.

Hugging Face (video, Hugging Face)

Modern LLM engineering, datasets, transformers, and community tutorials.

Hugging Face Datasets documentation (dataset, Foundation, Hugging Face)

Essential for loading, cleaning, streaming, and publishing training / eval data.
