STAIR Captions

STAIR Captions is a large-scale Japanese image caption dataset. The project website is http://captions.stair.center .

Annotation Format

The STAIR Captions dataset is provided as JSON files. The annotation format follows that of MS-COCO:

annotation{
  "id"                : int,
  "image_id"          : int,
  "caption"           : str,
  "tokenized_caption" : str,
}

For details of the annotation format, please see the MS-COCO download page.
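
As a rough illustration of working with records in this format, the Python sketch below groups captions by image. It assumes the JSON file keeps its records in a top-level "annotations" list, as MS-COCO caption files do; this README does not state that layout, so check it against the actual files. The function names and the sample records are hypothetical and made up for the example.

```python
import json
from collections import defaultdict


def load_annotations(path):
    """Read a caption JSON file and return its list of annotation records."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: MS-COCO style layout with a top-level "annotations" list.
    return data["annotations"]


def captions_by_image(annotations):
    """Group caption strings by the image_id they describe."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann["image_id"]].append(ann["caption"])
    return dict(grouped)


# Made-up records that follow the annotation fields listed above.
sample = [
    {"id": 1, "image_id": 10, "caption": "犬が走っている",
     "tokenized_caption": "犬 が 走っ て いる"},
    {"id": 2, "image_id": 10, "caption": "茶色の犬",
     "tokenized_caption": "茶色 の 犬"},
    {"id": 3, "image_id": 11, "caption": "赤い車",
     "tokenized_caption": "赤い 車"},
]

grouped = captions_by_image(sample)
```

With a real file, `captions_by_image(load_annotations(path))` would produce the same kind of mapping, where `path` points to one of the JSON files extracted from the archive.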

Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi, "STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset," Annual Meeting of the Association for Computational Linguistics (ACL), Short Paper, 2017. [arXiv]
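Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi, "STAIR Captions: A Large-Scale Japanese Image Caption Dataset," The 23rd Annual Meeting of the Association for Natural Language Processing (NLP2017), 2017. (In Japanese) [PDF]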
