Similar items by topic, tags, and provider (metadata-only).
dataset · huggingface.co
Access: open (check license). TriviaQA dataset for retrieval and reading comprehension.
dataset · huggingface.co
Review dataset terms/acknowledgments and filter by license for your use case.
dataset · huggingface.co
Access: open. Large code corpus
dataset · huggingface.co
Access: open (check license). Reading comprehension QA dataset
dataset · huggingface.co
Open LLM dataset built from Common Crawl snapshots with quality signals and dedup metadata.
dataset · huggingface.co
Benchmarks & dataset references
dataset · huggingface.co
Access: open. Math reasoning
dataset · huggingface.co
Access: open (check license). Open-domain QA dataset (long + short answers)
dataset · huggingface.co
Access: open. Advanced math
dataset · huggingface.co
multimodal_pretraining
dataset · huggingface.co
Access: research-only. Video-text dataset
dataset · huggingface.co
Access: open (check license). Multi-hop QA for RAG evaluation