@opendatalab

OpenDataLab

OpenDataLab provides access to numerous significant open-source datasets.


🔥🔥🔥OpenDataLab provides an ecosystem of high-quality datasets for the community. It offers:

🌟Extensive open data resources for AI models

● A fast, simple way to access open datasets
● 7700+ large-scale, high-quality open datasets for large models
● 1200+ open datasets for computer vision
● 200+ open datasets from CVPR
● Categorized datasets for hot topics

✨Open-source data processing toolkits

● Data acquisition toolkits supporting large datasets
● Data acquisition toolkits supporting various kinds of tasks
● Open-source intelligent toolbox for labeling

💫Dataset description language

● Format standardization
● DSDL: Dataset Description Language
● Define a CV dataset with DSDL
● 100+ CV datasets standardized by OpenDataLab

Check out our tutorial videos (in Chinese) to get started.


📣 Dataset authors can now upload their datasets themselves. We invite you to use this feature to promote your open-source datasets and AI research results, so that more people can find, obtain, and use them.

For an introduction to the self-service dataset upload feature, see the help doc. You can create and share your dataset by following our guidelines. 💪

If you have any questions or run into obstacles, please feel free to contact us at [email protected].

Popular repositories

  1. MinerU Public

    A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.

    Python · 16.3k stars · 1.2k forks

  2. PDF-Extract-Kit Public

    A Comprehensive Toolkit for High-Quality PDF Content Extraction

    Python · 5.7k stars · 381 forks

  3. labelU Public

    A data annotation toolbox that supports image, audio, and video data.

    Python · 857 stars · 78 forks

  4. LabelLLM Public

    The Open-Source Data Annotation Platform

    TypeScript · 568 stars · 44 forks

  5. WanJuan1.0 Public

    WanJuan 1.0 multimodal corpus

    545 stars · 28 forks

  6. DocLayout-YOLO Public

    DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

    Python · 469 stars · 32 forks

Repositories

Showing 10 of 34 repositories
  • MinerU Public

    A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.

    Python · 16,274 stars · AGPL-3.0 · 1,174 forks · 131 issues · 8 pull requests · Updated Nov 15, 2024
  • VHM Public

    VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis

    Python · 46 stars · Apache-2.0 · 2 forks · 3 issues · 0 pull requests · Updated Nov 15, 2024
  • labelU Public

    A data annotation toolbox that supports image, audio, and video data.

    Python · 857 stars · Apache-2.0 · 78 forks · 14 issues · 0 pull requests · Updated Nov 15, 2024
  • magic-html Public

    Python · 255 stars · Apache-2.0 · 22 forks · 5 issues · 0 pull requests · Updated Nov 14, 2024
  • labelU-Kit Public

    Data annotation component library, provided as NPM packages

    TypeScript · 63 stars · Apache-2.0 · 16 forks · 4 issues · 1 pull request · Updated Nov 14, 2024
  • LabelLLM Public

    The Open-Source Data Annotation Platform

    TypeScript · 568 stars · Apache-2.0 · 44 forks · 6 issues · 0 pull requests · Updated Nov 6, 2024
  • DocLayout-YOLO Public

    DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

    Python · 469 stars · AGPL-3.0 · 32 forks · 2 issues · 1 pull request · Updated Oct 31, 2024
  • LOKI Public

    The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”

    Python · 108 stars · 1 fork · 1 issue · 0 pull requests · Updated Oct 28, 2024
  • skydiffusion Public

    The official implementation of the paper “Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm”

    27 stars · Apache-2.0 · 1 fork · 1 issue · 0 pull requests · Updated Oct 24, 2024
  • PDF-Extract-Kit Public

    A Comprehensive Toolkit for High-Quality PDF Content Extraction

    Python · 5,650 stars · AGPL-3.0 · 381 forks · 52 issues · 4 pull requests · Updated Oct 24, 2024