
Common Crawl

broad_pretraining

ai-training-data dataset open-data

Resource Metadata

Category: base_weights
Provider: commoncrawl.org
Type: dataset
Level: unknown
Topic: AI Training Data
Track: n/a
Section: Open Data Directory
Format: n/a
Status: publishable
Commercial: conditional
Featured: no
Fast start: no
Sequence: n/a
Priority: n/a
Primary source: training_data_stack
Sources: training_data_stack, website_existing
ID: 7140efd5e93222fe