Mask2Former Model Zoo and Baselines

Introduction

This file documents the collection of models reported in our paper. All numbers were obtained on Big Basin servers with 8 NVIDIA V100 GPUs and NVLink, except for Swin-L models, which were trained on 16 NVIDIA V100 GPUs.

How to Read the Tables

  • The "Name" column contains a link to the config file. Running train_net.py --num-gpus 8 with this config file reproduces the model. The exception is Swin-L models, which are trained on 16 NVIDIA V100 GPUs with distributed training across two nodes.
  • The "model id" column is provided for ease of reference. To let you check the integrity of a downloaded file, every model on this page includes its md5 prefix in its file name.
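
The integrity check described above can be sketched in Python. This is a minimal sketch, not part of the repository: it assumes the md5 prefix is the final underscore-separated token of the file name (as in a hypothetical `model_final_ab12cd.pkl`), and the helper names `file_md5` and `md5_prefix_matches` are made up for illustration. Adjust the pattern if your downloaded file is named differently.

```python
import hashlib
import re
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_prefix_matches(path):
    """Check that the md5 of the file starts with the prefix embedded in its name.

    Assumption (not confirmed by this document): the prefix is the trailing
    lowercase hex token after the last underscore, e.g. model_final_ab12cd.pkl.
    """
    stem = Path(path).stem
    match = re.search(r"_([0-9a-f]{4,32})$", stem)
    if match is None:
        raise ValueError(f"no md5 prefix found in file name: {path}")
    return file_md5(path).startswith(match.group(1))
```

Calling `md5_prefix_matches("path/to/downloaded_model.pkl")` returns `True` when the digest agrees with the prefix in the name, and `False` when the file is likely truncated or corrupted.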

Detectron2 ImageNet Pretrained Models

It's common to initialize from backbone models pre-trained on ImageNet classification tasks. The following backbone models are available:

Note: below are available pretrained models in Detectron2 that we do not use in our paper.

Third-party ImageNet Pretrained Models

Our paper also uses ImageNet pretrained models that are not part of Detectron2. Please refer to tools to obtain those pretrained models.

License

All models available for download through this document are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.

ADE20K Model Zoo

Panoptic Segmentation

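| Name | Backbone | iterations | PQ | AP | mIoU | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mask2Former (200 queries) | Swin-L (IN21k) | 160k | 48.1 | 34.2 | 54.5 | 48267279 | model |
| Mask2Former (200 queries) + RankSeg | Swin-L (IN21k) | 160k | 48.9 | - | 56.2 | - | model |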

Semantic Segmentation

| Name | Backbone | iterations | mIoU | mIoU (ms+flip) | model id | download |
| --- | --- | --- | --- | --- | --- | --- |
| Mask2Former | Swin-B (IN21k) | 160k | 53.9 | 55.1 | 48333157_5 | model |
| Mask2Former + RankSeg | Swin-B (IN21k) | 160k | 54.9 | 55.6 | - | model |
| Mask2Former + GT | Swin-B (IN21k) | 160k | 68.0 | - | - | model |
| Mask2Former | Swin-L (IN21k) | 160k | 56.1 | 57.3 | 48004474_0 | model |
| Mask2Former + RankSeg | Swin-L (IN21k) | 160k | 56.5 | 58.0 | - | model |
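
To make the comparison in the semantic segmentation table easier to read, the gain of the RankSeg variant over the Mask2Former baseline can be computed per backbone. The snippet below is only an illustrative sketch: the mIoU values are copied from the table, while the function name `miou_gains` and the dictionary layout are assumptions made for this example.

```python
# mIoU values copied from the ADE20K semantic segmentation table:
# each entry is (single-scale mIoU, multi-scale + flip mIoU).
baseline = {
    "Swin-B (IN21k)": (53.9, 55.1),
    "Swin-L (IN21k)": (56.1, 57.3),
}
rankseg = {
    "Swin-B (IN21k)": (54.9, 55.6),
    "Swin-L (IN21k)": (56.5, 58.0),
}


def miou_gains(base, improved):
    """Return per-backbone mIoU differences (improved minus base), rounded to 0.1."""
    gains = {}
    for backbone, (base_ss, base_ms) in base.items():
        new_ss, new_ms = improved[backbone]
        gains[backbone] = (round(new_ss - base_ss, 1), round(new_ms - base_ms, 1))
    return gains


gains = miou_gains(baseline, rankseg)
for backbone, (gain_ss, gain_ms) in gains.items():
    print(f"{backbone}: +{gain_ss} mIoU, +{gain_ms} mIoU (ms+flip)")
```

Rounding to one decimal place matches the precision of the table and avoids floating-point noise in the differences.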

Video Instance Segmentation

YouTubeVIS 2019

| Name | Backbone | iterations | AP | model id | download |
| --- | --- | --- | --- | --- | --- |
| Mask2Former | R101 | 6k | 49.2 | 50897581_1 | model |
| Mask2Former + RankSeg | R101 | 6k | 50.5 | - | model |
| Mask2Former | Swin-B (IN21k) | 6k | 59.5 | 50897733_2 | model |
| Mask2Former + RankSeg | Swin-B (IN21k) | 6k | 60.3 | - | model |
| Mask2Former (200 queries) | Swin-L (IN21k) | 6k | 60.4(60.7) | 50908813_0 | model |
| Mask2Former (200 queries) + RankSeg | Swin-L (IN21k) | 6k | 61.1(61.4) | - | model |

* Upload result.json to the online server to evaluate a YouTubeVIS 2019 model. Because results vary between runs, we report the average result of 3 models for our methods.

Video Semantic Segmentation

VSPW

| Name | Backbone | iterations | mIoU | model id | download |
| --- | --- | --- | --- | --- | --- |
| Mask2Former | R101 | 6k | 45.9 | - | model |
| Mask2Former + RankSeg | R101 | 6k | 47.0 | - | model |
| Mask2Former + GT | R101 | 6k | 62.3 | - | model |
| Mask2Former | Swin-B (IN21k) | 6k | 59.4 | - | - |
| Mask2Former + RankSeg | Swin-B (IN21k) | 6k | 60.1 | - | - |

* Because results vary between runs, we report the average result of 3 models for both the baseline and our methods.