This file documents a collection of models reported in our paper. All numbers were obtained on Big Basin servers with 8 NVIDIA V100 GPUs & NVLink (except that Swin-L models were trained with 16 NVIDIA V100 GPUs).
- The "Name" column contains a link to the config file. Running
train_net.py --num-gpus 8
with this config file will reproduce the model (except Swin-L models are trained with 16 NVIDIA V100 GPUs with distributed training on two nodes). - The model id column is provided for ease of reference. To check downloaded file integrity, any model on this page contains its md5 prefix in its file name.
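As a hedged illustration of that integrity check, here is a minimal Python sketch. It assumes Detectron2's usual `model_final_<md5 prefix>.pkl` naming convention; the example file name in the comment is hypothetical, not one taken from this page.

```python
# Minimal sketch: verify that a checkpoint's md5 digest starts with the prefix
# embedded in its file name (e.g. a hypothetical "model_final_f10217.pkl").
# The naming pattern is an assumption based on Detectron2's convention.
import hashlib
import re
import sys

def verify_md5_prefix(path: str) -> bool:
    match = re.search(r"_([0-9a-f]{6,})\.pkl$", path)
    if match is None:
        raise ValueError(f"no md5 prefix found in file name: {path}")
    expected_prefix = match.group(1)
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(expected_prefix)

if __name__ == "__main__":
    print(verify_md5_prefix(sys.argv[1]))
```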
It's common to initialize from backbone models pre-trained on ImageNet classification tasks. The following backbone models are available:
- R-50.pkl (torchvision): converted copy of torchvision's ResNet-50 model. More details can be found in the conversion script.
- R-103.pkl: a ResNet-101 with its first 7x7 convolution replaced by three 3x3 convolutions (a sketch is shown below). This modification, referred to as ResNet101c in our paper, is commonly used in semantic segmentation papers. We pre-train this backbone on ImageNet using the default recipe of pytorch examples.
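For illustration only, a minimal PyTorch sketch of this "deep stem" replacement follows. The intermediate channel widths (32, 32, 64) follow the common ResNet-C convention and are an assumption here, not values taken from this page.

```python
# Sketch of the R-103 stem: the single 7x7 stride-2 convolution of a vanilla
# ResNet is replaced by three 3x3 convolutions with the same overall stride (2)
# and output width (64). Channel widths are assumed, per the ResNet-C convention.
import torch
import torch.nn as nn

def deep_stem(in_channels: int = 3) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(32),
        nn.ReLU(inplace=True),
        nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(32),
        nn.ReLU(inplace=True),
        nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
    )

x = torch.randn(1, 3, 224, 224)
print(deep_stem()(x).shape)  # torch.Size([1, 64, 112, 112]), same as a 7x7 stem
```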
Note: the following pretrained models are available in Detectron2 but are not used in our paper.
- R-50.pkl: converted copy of MSRA's original ResNet-50 model.
- R-101.pkl: converted copy of MSRA's original ResNet-101 model.
- X-101-32x8d.pkl: ResNeXt-101-32x8d model trained with Caffe2 at FB.
Our paper also uses ImageNet pretrained models that are not part of Detectron2; please refer to tools to obtain those pretrained models.
All models available for download through this document are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.
Name | Backbone | iterations | PQ | AP | mIoU | model id | download |
---|---|---|---|---|---|---|---|
Mask2Former (200 queries) | Swin-L (IN21k) | 160k | 48.1 | 34.2 | 54.5 | 48267279 | model |
Mask2Former (200 queries) + RankSeg | Swin-L (IN21k) | 160k | 48.9 | - | 56.2 | - | model |
Name | Backbone | iterations | mIoU | mIoU (ms+flip) | model id | download |
---|---|---|---|---|---|---|
Mask2Former | Swin-B (IN21k) | 160k | 53.9 | 55.1 | 48333157_5 | model |
Mask2Former + RankSeg | Swin-B (IN21k) | 160k | 54.9 | 55.6 | - | model |
Mask2Former + GT | Swin-B (IN21k) | 160k | 68.0 | - | - | model |
Mask2Former | Swin-L (IN21k) | 160k | 56.1 | 57.3 | 48004474_0 | model |
Mask2Former + RankSeg | Swin-L (IN21k) | 160k | 56.5 | 58.0 | - | model |
Name | Backbone | iterations | AP | model id | download |
---|---|---|---|---|---|
Mask2Former | R101 | 6k | 49.2 | 50897581_1 | model |
Mask2Former + RankSeg | R101 | 6k | 50.5 | - | model |
Mask2Former | Swin-B (IN21k) | 6k | 59.5 | 50897733_2 | model |
Mask2Former + RankSeg | Swin-B (IN21k) | 6k | 60.3 | - | model |
Mask2Former (200 queries) | Swin-L (IN21k) | 6k | 60.4 (60.7) | 50908813_0 | model |
Mask2Former (200 queries) + RankSeg | Swin-L (IN21k) | 6k | 61.1 (61.4) | - | model |
* To evaluate YouTube-VIS 2019 models, upload result.json to the online server (see the sketch below). Considering the variance in results, we report the average result of 3 models for our methods.
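A minimal sketch of packaging the prediction file for upload, assuming the server expects a zip archive with result.json at its root; this layout is an assumption, so follow the evaluation server's own submission instructions.

```python
# Package result.json into a zip for the YouTube-VIS 2019 evaluation server.
# NOTE: the archive layout here is an assumption, not specified on this page.
import zipfile

with zipfile.ZipFile("submission.zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write("result.json")  # stored at the archive root
```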
Name | Backbone | iterations | mIoU | model id | download |
---|---|---|---|---|---|
Mask2Former | R101 | 6k | 45.9 | - | model |
Mask2Former + RankSeg | R101 | 6k | 47.0 | - | model |
Mask2Former + GT | R101 | 6k | 62.3 | - | model |
Mask2Former | Swin-B (IN21k) | 6k | 59.4 | - | - |
Mask2Former + RankSeg | Swin-B (IN21k) | 6k | 60.1 | - | - |
* Considering the variance in results, we report the average result of 3 models for both the baseline and our methods.