AI Power Progress

CodeSearchNet

Dataset and benchmark for semantic code search, that is, retrieving relevant code from a natural-language query, covering Go, Java, JavaScript, PHP, Python, and Ruby.

Tags: code-search codesearchnet csharp dataset local-ai open-data repo training-data

Resource Metadata

Category: Code
Provider: GitHub
Type: repo
Level: unknown
Topic: Local AI / LLM Engineering / RAG
Track: Local AI / LLM Engineering / RAG
Section: Open data
Format: Dataset / benchmark
Status: publishable
Commercial: candidate
Featured: no
Fast start: no
Sequence: nan
Priority: A
Primary source: mega_open_hub
Sources: mega_open_hub, training_data_stack, website_existing
ID: 9fcd33356c3f628c

Related Resources

Similar items by topic, tags, and provider (metadata-only).

FLAN Collection

repo, GitHub

Widely used open instruction-tuning mixture that combines FLAN, P3, Super-Natural Instructions, and other sources.

FineWeb

dataset, Hugging Face

Very large cleaned English web corpus offering broad raw coverage for LLM pretraining.

OpenStax

dataset, foundation, openstax.org

Open textbooks across core STEM and humanities subjects.

OpenAI GSM8K

dataset, OpenAI

Widely used grade-school math reasoning benchmark for evaluation and small-scale instruction data work.

llama.cpp

repo, foundation, ggml-org

Core local inference stack for CPU / GPU quantized deployment and experimentation.

OpenML Docs

dataset, OpenML

Open ecosystem for datasets, tasks, models, and runs that helps standardize ML benchmarking.

FineWeb2

dataset, Hugging Face

Multilingual extension of the FineWeb pipeline with very broad language coverage.