Create dataset loader for Indo_MultiModal_PMD_ID #306

SamuelCahyawijaya · 2022-10-02T16:07:08Z

Dataset	id_mm_pmd
Description	Introduced in the FLAVA paper, Public Multimodal Dataset (PMD) is a collection of publicly available image-text pair datasets. PMD contains 70M image-text pairs in total, with 68M unique images. The dataset contains pairs from Conceptual Captions, Conceptual Captions 12M, WIT, Localized Narratives, RedCaps, COCO, SBU Captions, Visual Genome, and a subset of the YFCC100M dataset. Indo_MultiModal_PMD_Indonesia is the Indonesian-language version.
License	Determined by the licenses of the individual datasets that compose PMD_Indonesia

acul3 · 2022-10-04T07:06:47Z

#self-assign

SamuelCahyawijaya added this to Nusantara Dataset Initiative Oct 2, 2022

muhsatrio added the hacktoberfest label Oct 3, 2022

github-actions bot assigned acul3 Oct 4, 2022
