AI Power Progress iA
Topic

Computer Vision / Multimodal / Audio

Curated resources for Computer Vision / Multimodal / Audio.

17 resources available

Results

docs / Foundation / OpenCV

OpenCV Tutorials

OpenCV

Still the fastest practical path to vision preprocessing, classical CV, and real-world image pipelines.

repo / Build / OpenAI

OpenAI Whisper

OpenAI

Great practical base for local transcription pipelines and speech experiments.

docs / Build / OpenCV

OpenCV Docs

OpenCV

Reference docs for computer vision building blocks, APIs, and modules.

dataset / Mozilla

Common Voice

Mozilla

Large multilingual speech dataset project for ASR, speech research, and voice tooling.

dataset / image-net.org

ImageNet

image-net.org

Still useful for vision classification baselines.

dataset / LAION

LAION-5B

LAION

Web-scale image-text corpus for multimodal research; use with strong filtering and license review.
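
As a rough illustration of what "strong filtering and license review" could look like in practice, here is a minimal sketch that screens image-text records before they enter a training set. Everything in it is an assumption made for illustration: the Record fields, the threshold values, and the license allow-list are hypothetical and do not describe LAION's actual metadata schema or any recommended settings.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Record:
    """One image-text pair. All field names are hypothetical."""
    url: str
    caption: str
    similarity: float        # image-text similarity score, higher is better
    unsafe_prob: float       # estimated probability of unsafe content
    license: Optional[str]   # license string if known, otherwise None


# Illustrative allow-list only; a real review needs legal input.
ALLOWED_LICENSES = {"cc0", "cc-by", "cc-by-sa"}


def keep(record: Record, min_similarity: float = 0.3,
         max_unsafe: float = 0.1) -> bool:
    """Return True only if the record passes every check."""
    if record.similarity < min_similarity:
        return False  # caption probably does not describe the image
    if record.unsafe_prob > max_unsafe:
        return False  # too likely to be unsafe content
    if record.license is None:
        return False  # unknown license: reject rather than guess
    return record.license.lower() in ALLOWED_LICENSES


def filter_records(records: Iterable[Record]) -> Iterator[Record]:
    """Lazily yield the records that pass keep() with default thresholds."""
    return (r for r in records if keep(r))


if __name__ == "__main__":
    sample = [
        Record("https://example.com/a.jpg", "a cat", 0.35, 0.01, "CC-BY"),
        Record("https://example.com/b.jpg", "a dog", 0.20, 0.01, "CC0"),
    ]
    for r in filter_records(sample):
        print(r.url)
```

Rejecting records with an unknown license is the conservative choice in this sketch; a real pipeline would likely log rejections per rule so the thresholds can be audited.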

dataset / storage.googleapis.com

Open Images V7

storage.googleapis.com

Large supervised vision dataset with image-level labels, bounding boxes, segmentation masks, visual relationships, and localized narratives.

video / OpenCV

OpenCV

OpenCV

Video supplement for image processing and computer vision workflows.

repo / GitHub

WIT

GitHub

Excellent multilingual image-text corpus (Wikipedia-based Image Text) drawn from Wikipedia and Wikimedia.