This repository provides scripts to train and evaluate a self-supervised Vision Transformer with DINO, and is tested and maintained by Intel® Gaudi®. Before you get started, make sure to review the Supported Configurations. For more information on training and inference of deep learning models using the Intel Gaudi AI accelerator, refer to developer.habana.ai. To obtain model performance data, refer to the Intel Gaudi Model Performance Data page.
- Model-References
- Model Overview
- Setup
- Training Examples
- Evaluation
- Supported Configurations
- Changelog
This is a PyTorch implementation of DINO. The model is based on code from the facebookresearch/dino repository.
For further details, refer to the original paper: Emerging Properties in Self-Supervised Vision Transformers.
Please follow the instructions provided in the Gaudi Installation Guide to set up the environment, including the `$PYTHON` environment variable. To achieve the best performance, please follow the methods outlined in the Optimizing Training Platform Guide. The guides will walk you through the process of setting up your system to run the model on Gaudi.
In the Docker container, clone this repository and switch to the branch that matches your Intel Gaudi software version. You can run the `hl-smi` utility to determine the Intel Gaudi software version.
git clone -b [Intel Gaudi software version] https://github.com/HabanaAI/Model-References
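For example, if `hl-smi` reports the 1.16.2 software release validated in the Supported Configurations table below, the command would be `git clone -b 1.16.2 https://github.com/HabanaAI/Model-References`.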
- In the docker container, go to the model directory:
cd Model-References/PyTorch/computer_vision/classification/dino
- Install the required packages using pip:
$PYTHON -m pip install -r requirements.txt
Download and extract the ImageNet 2012 dataset.
NOTE: It is assumed that the ImageNet dataset is downloaded and available at `/data/pytorch/imagenet/ILSVRC2012/`.
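The training script loads the dataset with torchvision's `ImageFolder`, which expects one subdirectory per class. A quick sanity check of the layout (an illustrative snippet assuming the path above; it is not part of the repository):

```python
from torchvision.datasets import ImageFolder

# ImageFolder expects train/<class_name>/<image file> on disk.
dataset = ImageFolder("/data/pytorch/imagenet/ILSVRC2012/train")
print(f"{len(dataset)} images across {len(dataset.classes)} classes")
```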
Different evaluation modes require different datasets as described in the following table:
Mode | Dataset | How to get | Example Location |
---|---|---|---|
Video Segmentation | DAVIS 2017 | `git clone https://github.com/davisvideochallenge/davis-2017`<br>`cd davis-2017`<br>`./data/get_davis.sh` | `/data/pytorch/davis-2017/` |
Image Retrieval | Oxford & Paris revisited | `git clone https://github.com/filipradenovic/revisitop` | `/data/pytorch/revisitop/roxford5k/`<br>`/data/pytorch/revisitop/rparis6k/` |
Copy Detection | Copydays | `wget https://dl.fbaipublicfiles.com/vissl/datasets/copydays_original.tar.gz && tar xvf copydays_original.tar.gz`<br>`wget https://dl.fbaipublicfiles.com/vissl/datasets/copydays_strong.tar.gz && tar xvf copydays_strong.tar.gz` | `/data/pytorch/copydays/` |
NOTE:
- In the following commands, `data_path` should point to the `train` subdirectory of ImageNet.
- Running the model with BF16 precision improves training time and memory requirements, but may affect accuracy results.
- Run self-supervised DINO training with vit_small backbone, FP32 precision and batch size 32 on a single card:
$PYTHON main_dino.py --arch vit_small --data_path /data/pytorch/imagenet/ILSVRC2012/train --output_dir ./dino_vit_small/
- Run self-supervised DINO training with vit_small backbone, BF16 precision and batch size 64 on a single card:
$PYTHON main_dino.py --arch vit_small --data_path /data/pytorch/imagenet/ILSVRC2012/train --output_dir ./dino_vit_small/ --autocast --batch_size_per_device 64
- Run self-supervised DINO training with vit_small backbone, FP32 precision and batch size 32 on 8 cards:
$PYTHON -m torch.distributed.launch --nproc_per_node=8 main_dino.py --arch vit_small --data_path /data/pytorch/imagenet/ILSVRC2012/train --output_dir ./dino_vit_small/
- Run self-supervised DINO training with vit_small backbone, BF16 precision and batch size 64 on 8 cards:
$PYTHON -m torch.distributed.launch --nproc_per_node=8 main_dino.py --arch vit_small --data_path /data/pytorch/imagenet/ILSVRC2012/train --output_dir ./dino_vit_small/ --autocast --batch_size_per_device 64
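The `--autocast` flag enables BF16 mixed precision. Conceptually, it corresponds to running the forward pass under PyTorch's native autocast context for the `hpu` device, as in this minimal sketch (an illustration, not the script's exact implementation):

```python
import torch
import habana_frameworks.torch.core as htcore  # Intel Gaudi PyTorch module

model = torch.nn.Linear(16, 4).to("hpu")
x = torch.randn(8, 16).to("hpu")

# BF16 autocast on Gaudi uses the standard PyTorch API with device_type="hpu".
with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    y = model(x)
htcore.mark_step()  # in lazy mode, flushes the accumulated graph for execution
print(y.dtype)      # expected: torch.bfloat16
```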
Once self-supervised training is completed, you can run one of the available evaluation methods.
NOTE:
- It is assumed that the weights from self-supervised training are located in `./dino_vit_small/checkpoint.pth`.
- In the following commands, `data_path` should point to the `train` subdirectory of ImageNet.
Single-card KNN Examples
To run KNN evaluation on a single card, execute the following command:
$PYTHON eval_knn.py --pretrained_weights ./dino_vit_small/checkpoint.pth --data_path /data/pytorch/imagenet/ILSVRC2012/
Multi-card KNN Examples
To run KNN evaluation on 8 cards, execute the following command:
$PYTHON -m torch.distributed.launch --nproc_per_node=8 eval_knn.py --pretrained_weights ./dino_vit_small/checkpoint.pth --data_path /data/pytorch/imagenet/ILSVRC2012
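The evaluation extracts features with the frozen backbone and classifies each image by a temperature-weighted k-NN vote against the training-set features. A simplified sketch of that vote (names and defaults here are illustrative, not `eval_knn.py`'s exact API):

```python
import torch
import torch.nn.functional as F

def knn_predict(test_feats, train_feats, train_labels, k=20, T=0.07, num_classes=1000):
    # Features are L2-normalized, so a matmul yields cosine similarities.
    sims = test_feats @ train_feats.T                          # (B, N)
    topk_sims, idx = sims.topk(k, dim=1)                       # (B, k)
    weights = (topk_sims / T).exp()                            # temperature-scaled vote weights
    votes = F.one_hot(train_labels[idx], num_classes).float()  # (B, k, num_classes)
    scores = (votes * weights.unsqueeze(-1)).sum(dim=1)        # (B, num_classes)
    return scores.argmax(dim=1)
```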
Single-card Linear Examples
To run linear evaluation on a single card, execute the following command:
$PYTHON eval_linear.py --pretrained_weights ./dino_vit_small/checkpoint.pth --data_path /data/pytorch/imagenet/ILSVRC2012 --output_dir ./dino_vit_small_eval_linear/
Multi-card Linear Examples
To run linear evaluation on 8 cards, execute the following command:
$PYTHON -m torch.distributed.launch --nproc_per_node=8 eval_linear.py --pretrained_weights ./dino_vit_small/checkpoint.pth --data_path /data/pytorch/imagenet/ILSVRC2012 --output_dir ./dino_vit_small_eval_linear/
Single-card Copy Detection Examples
To run copy detection on a single card, execute the following command:
$PYTHON eval_copy_detection.py --pretrained_weights ./dino_vit_small/checkpoint.pth --data_path /data/pytorch/copydays
Multi-card Copy Detection Examples
To run copy detection on 8 cards, execute the following command:
$PYTHON -m torch.distributed.launch --nproc_per_node=8 eval_copy_detection.py --pretrained_weights ./dino_vit_small/checkpoint.pth --data_path /data/pytorch/copydays
Single-card Image Retrieval Examples
To run image retrieval on a single card, execute the following command:
$PYTHON eval_image_retrieval.py --pretrained_weights ./dino_vit_small/checkpoint.pth --data_path /data/pytorch/revisitop --dataset roxford5k
$PYTHON eval_image_retrieval.py --pretrained_weights ./dino_vit_small/checkpoint.pth --data_path /data/pytorch/revisitop --dataset rparis6k
Multi-card Image Retrieval Examples
To run image retrieval on 8 cards, execute the following command:
$PYTHON -m torch.distributed.launch --nproc_per_node=8 eval_image_retrieval.py --pretrained_weights ./dino_vit_small/checkpoint.pth --data_path /data/pytorch/revisitop --dataset roxford5k
$PYTHON -m torch.distributed.launch --nproc_per_node=8 eval_image_retrieval.py --pretrained_weights ./dino_vit_small/checkpoint.pth --data_path /data/pytorch/revisitop --dataset rparis6k
Video Segmentation Examples
To run video segmentation, execute the following command:
$PYTHON eval_video_segmentation.py --pretrained_weights ./dino_vit_small/checkpoint.pth --data_path /data/pytorch/davis-2017
Visualizing Attention
To visualize attention, execute the following command:
$PYTHON visualize_attention.py --pretrained_weights ./dino_vit_small/checkpoint.pth --image_path PATH_TO_SOURCE_IMAGE
Video Generation
To generate video with visualized attention, execute the following command:
$PYTHON video_generation.py --pretrained_weights ./dino_vit_small/checkpoint.pth --input_path PATH_TO_SOURCE_VIDEO
Each training/evaluation command can be run with the `--help` flag to list all available parameters and their descriptions. For example:
$PYTHON main_dino.py --help
$PYTHON eval_knn.py --help
$PYTHON eval_linear.py --help
$PYTHON eval_image_retrieval.py --help
$PYTHON eval_video_segmentation.py --help
$PYTHON eval_copy_detection.py --help
$PYTHON visualize_attention.py --help
$PYTHON video_generation.py --help
Validated on | Intel Gaudi Software Version | PyTorch Version | Mode |
---|---|---|---|
Gaudi | 1.16.2 | 2.2.2 | Training |
- Initial release.
- Enabled additional tasks (eval_copy_detection, eval_video_segmentation, eval_image_retrieval, visualize_attention, video_generation).
- Removed workaround for index_copy_.
- Removed workaround for bicubic interpolation mode.
- Fixed OOM for batch_size=64 on FP32.
- Added support for autocast on Gaudi.
- Dynamic Shapes will be enabled by default in future releases. It is currently enabled in the training script as a temporary solution.
- Removed support for HMP.
Major changes done to the original model from the facebookresearch/dino repository:
- Modified some scripts to run the model on Gaudi:
  - Loaded the Intel Gaudi PyTorch module.
  - Changed tensor device assignment from `cuda` to `hpu`.
- Applied temporary workarounds in scripts to enable the model on HPU:
  - Changed the default `batch_size_per_device` to `32` for the self-supervised part.
  - Avoided execution of the `torch.cat` operator with empty tensors.
  - Moved `dino_loss` to the `cpu` device at checkpoint-saving time due to a bug in the PyTorch framework: pytorch/pytorch#77533.
  - Increased the number of chunks in `knn_classifier` from `100` to `200`.
  - Moved `argsort` to `cpu`.
- Improved model performance by limiting synchronization between the CPU and the device within the gradient clipping implementation (see the sketch below).
- Added functionalities such as TensorBoard support, throughput logging, and limiting the dataset size.
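To illustrate the gradient clipping point: `clip_grad_norm_` returns the total norm as a device tensor, and a host-device synchronization only happens when that value is read on the CPU (e.g. via `.item()`), so deferring or avoiding that read removes the stall. A hedged sketch of the pattern (an assumed illustration, not a verbatim excerpt from the scripts):

```python
import torch
import habana_frameworks.torch.core as htcore  # loading this module registers the hpu device

model = torch.nn.Linear(16, 4).to("hpu")       # previously tensors were assigned to "cuda"
loss = model(torch.randn(8, 16).to("hpu")).sum()
loss.backward()

# The returned total_norm stays on the device; calling total_norm.item() here
# would force the CPU to wait for the device, which is what the change avoids.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=3.0)
htcore.mark_step()  # in lazy mode, triggers execution of the accumulated graph
```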