Image classification with UniFormer

We currently release the code and models for:

  • ImageNet-1K pretraining

  • ImageNet-1K pretraining + Token Labeling

  • Large resolution fine-tuning

  • Lightweight Model

Update

05/21/2022

Lightweight models are released, which surpass MobileViT, PVTv2 and EfficientNet.

03/06/2022

Some models with head_dim=64 are released, which reduce memory cost for downstream tasks.

01/19/2022

  1. Pretrained models on ImageNet-1K with Token Labeling.
  2. Large resolution fine-tuning.

01/13/2022

Pretrained models on ImageNet-1K are released.

Model Zoo

Lightweight models on ImageNet-1K

The following models and logs can be downloaded from Google Drive: total_models, total_logs.

We also release the models on Baidu Cloud: total_models (bdkq), total_logs (ttub).

Model Top-1 Resolution #Param. FLOPs Checkpoint Log Shell
UniFormer-XXS 76.8 128x128 10.2M 0.43G google google run.sh
UniFormer-XXS 79.1 160x160 10.2M 0.67G google google run.sh
UniFormer-XXS 79.9 192x192 10.2M 0.96G google google run.sh
UniFormer-XXS 80.6 224x224 10.2M 1.3G google google run.sh
UniFormer-XS 81.5 192x192 16.5M 1.4G google google run.sh
UniFormer-XS 82.0 224x224 16.5M 2.0G google google run.sh

For these lightweight models, we train with a longer schedule (600 epochs) and weaker data augmentation. Besides, to avoid NaN loss, we do not use mixed-precision training.

ImageNet-1K pretrained (224x224)

The following models and logs can be downloaded from Google Drive: total_models, total_logs.

We also release the models on Baidu Cloud: total_models (bdkq), total_logs (ttub).

Model Top-1 #Param. FLOPs Checkpoint Log Shell
UniFormer-S 82.9 22M 3.6G google google run.sh
UniFormer-S† 83.4 24M 4.2G google google run.sh
UniFormer-B 83.8 50M 8.3G google - run.sh
UniFormer-B+Layer Scale 83.9 50M 8.3G google google run.sh

Though Layer Scale is helpful for training deep models, we encountered some problems when fine-tuning on video datasets. Hence, we only use the models trained without it for video tasks.

Because UniFormer-S† uses head_dim=32, which incurs a high memory cost for downstream tasks, we re-train this model with head_dim=64. All models are trained at 224x224 resolution.

Model Top-1 #Param. FLOPs Checkpoint Log Shell
UniFormer-S† 83.4 24M 4.2G google google run.sh

ImageNet-1K pretrained with Token Labeling (224x224)

The following models and logs can be downloaded from Google Drive: total_models, total_logs.

We also release the models on Baidu Cloud: total_models (p05h), total_logs (wsvi).

We follow LV-ViT to train our models with Token Labeling. Please see token_labeling for more details.

Model Top-1 #Param. FLOPs Checkpoint Log Shell
UniFormer-S 83.4 (+0.5) 22M 3.6G google google run.sh
UniFormer-S† 83.9 (+0.5) 24M 4.2G google google run.sh
UniFormer-B 85.1 (+1.3) 50M 8.3G google google run.sh
UniFormer-L+Layer Scale 85.6 100M 12.6G google google run.sh

Because the models UniFormer-S/S†/B use head_dim=32, which incurs a high memory cost for downstream tasks, we re-train these models with head_dim=64. All models are trained at 224x224 resolution.

Model Top-1 #Param. FLOPs Checkpoint Log Shell
UniFormer-S 83.4 (+0.5) 22M 3.6G google google run.sh
UniFormer-S† 83.6 (+0.2) 24M 4.2G google google run.sh
UniFormer-B 84.8 (+1.0) 50M 8.3G google google run.sh

Large resolution fine-tuning (384x384)

The following models and logs can be downloaded from Google Drive: total_models, total_logs.

We also release the models on Baidu Cloud: total_models (p05h), total_logs (wsvi).

We fine-tune the above models with Token Labeling on resolution of 384x384. Please see token_labeling for more details.

Model Top-1 #Param. FLOPs Checkpoint Log Shell
UniFormer-S 84.6 22M 11.9G google google run.sh
UniFormer-S† 84.9 24M 13.7G google google run.sh
UniFormer-B 86.0 50M 27.2G google google run.sh
UniFormer-L+Layer Scale 86.3 100M 39.2G google google run.sh

Usage

Our repository is built based on the DeiT repository, with some useful added features:

  1. Calculating accurate FLOPs and parameters with fvcore (see check_model.py).
  2. Auto-resuming.
  3. Saving best models and backup models.
  4. Generating training curve (see generate_tensorboard.py).
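The auto-resuming behavior can be sketched as follows. This is a minimal, self-contained illustration of the idea, not the repository's actual code; `resolve_resume_path` and the checkpoint filenames are hypothetical:

```python
import os

def resolve_resume_path(work_path):
    """Pick a checkpoint to resume from, preferring the regular checkpoint.

    Hypothetical helper illustrating auto-resume: if a regular checkpoint
    exists, resume from it; otherwise fall back to the backup checkpoint;
    otherwise return None and start training from scratch.
    """
    candidates = [
        os.path.join(work_path, "ckpt", "checkpoint.pth"),
        os.path.join(work_path, "ckpt", "backup.pth"),
    ]
    for path in candidates:
        if os.path.exists(path):
            return path
    return None  # no checkpoint found: start from scratch
```

Rerunning the same training script therefore picks up from the latest saved state without any extra flags, which is why an interrupted run can simply be restarted.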

Installation

  • Clone this repo:

    git clone https://github.com/Sense-X/UniFormer.git
    cd UniFormer
  • Install PyTorch 1.7.0+ and torchvision 0.8.1+

    conda install -c pytorch pytorch torchvision
  • Install other packages

    pip install timm
    pip install fvcore

Training

Simply run the training scripts in exp as follows:

bash ./exp/uniformer_small/run.sh

If training is interrupted abnormally, you can simply rerun the script to auto-resume. If the checkpoint was not saved properly, you should set the resumed model via --resume ${work_path}/ckpt/backup.pth.

Evaluation

Simply run the evaluation scripts in exp as follows:

bash ./exp/uniformer_small/test.sh

It will evaluate the last model by default. You can set other models via --resume.

Generate curves

You can generate the training curves as follows:

python3 generate_tensorboard.py

Note that you should install tensorboardX first (pip install tensorboardX).

Calculating FLOPs and Parameters

You can calculate the FLOPs and parameters via:

python3 check_model.py
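As a rough, self-contained illustration of the kind of accounting a FLOP counter like fvcore performs (this is not the fvcore implementation, and `conv2d_flops` is a hypothetical helper), the multiply-accumulate count of a single convolution layer can be computed analytically:

```python
def conv2d_flops(c_in, c_out, k, h_out, w_out, groups=1):
    """Approximate FLOPs (multiply-accumulates) of a 2D convolution.

    Each output element requires (c_in / groups) * k * k
    multiply-accumulates, and there are c_out * h_out * w_out
    output elements in total.
    """
    return c_out * h_out * w_out * (c_in // groups) * k * k

# Example: a hypothetical 3x3 stem convolution mapping 3 -> 64 channels
# onto a 112x112 output feature map.
stem_flops = conv2d_flops(c_in=3, c_out=64, k=3, h_out=112, w_out=112)
```

Tools like fvcore apply this kind of per-operator accounting over a model trace, which is why the reported FLOPs are more accurate than hand-derived estimates.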

Acknowledgement

This repository is built using the timm library and the DeiT repository.