AI Power Progress iA

FineWeb2

Best multilingual extension of the FineWeb pipeline, with very broad language coverage.

base-pretraining continual-pretraining cpt dataset hugging-face llm-engineering local-ai multilingual rag text training-data

Resource Metadata

Category

Base pretraining

Provider

Hugging Face

Type

dataset

Level

unknown

Topic

Local AI / LLM Engineering / RAG

Track

Local AI / LLM Engineering / RAG

Section

Open data

Format

Dataset

Status

manual_review

Commercial

manual-review

Featured

yes

Fast start

no

Sequence

nan

Priority

A

Primary source

direct_links_master

Sources

direct_links_master, mega_open_hub, training_data_stack

ID

6e067ab1eda68fa5
