Reimplementation of Self-Supervised Vision Transformers for DINO v2 with Hugging Face 🤗


  • PyTorch implementation and pretrained models for DINO v2 in remote sensing.
  • See the official papers and GitHub repository for details. [arXiv #1] [arXiv #2] [GitHub]

Training

This project uses the DeepSpeed launcher for multi-GPU training:

deepspeed --include localhost:0,1,2,3... vit_train.py
deepspeed --include localhost:0,1,2,3... convvit_train.py

Training Dataset for Remote Sensing

| Dataset Name | # of Samples | Dataset Paper |
|---|---|---|
| Million-AID | 990,666 | Link |
| SkyScript | 5,181,068 | Link |
| Total | 6,171,734 | |

Pretrained Model on Huggingface

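| Model | Epochs | Total Params | Student Backbone Params | Student DINO Head Params | Student iBOT Head Params | Weights & Config | Logs |
|---|---|---|---|---|---|---|---|
| ViT-S/16-e25 | 25 | 132M | 21M | 22M | 22M | Link | logs |
| ViT-S/16-e100 | 25 | 132M | 21M | 22M | 22M | | |
| ViT-B/16-e25 | 25 | | | | | | |
| ConvViT-S-e25 | 25 | | | | | | |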

Evaluation

The pretrained models are evaluated with k-NN classification and linear probing. For each dataset, 90% of the data is randomly selected as the training set and the remaining 10% is used as the test set. The k-NN evaluation uses k=20. The evaluation datasets are listed in the table below. The split files are stored in linprob_data_lists.

| Dataset Name | Dataset Paper |
|---|---|
| RESISC | Remote Sensing Image Scene Classification: Benchmark and State of the Art |
| Optimal 31 | Scene Classification With Recurrent Attention of VHR Remote Sensing Images |
| MLRSNet | MLRSNet: A Multi-label High Spatial Resolution Remote Sensing Dataset for Semantic Scene Understanding |
| WHU-RS19 | |
| EuroSAT | EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification |
| UC Merced | Bag-of-visual-words and spatial extensions for land-use classification |
| CV-BrCT | AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification |
| AiRound | AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification |
| RSI-CB128 | RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data |
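The 90/10 random split described above can be sketched as follows. This is a minimal illustration, not the script used to produce the files in linprob_data_lists: the function name `split_90_10`, the fixed seed, and the `(path, label)` sample layout are all assumptions made for the example.

```python
import random


def split_90_10(samples, train_fraction=0.9, seed=0):
    """Shuffle samples with a fixed seed and split them into train/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical samples: (relative image path, class index) pairs.
samples = [(f"class_{i % 5}/img_{i:03d}.jpg", i % 5) for i in range(100)]
train, test = split_90_10(samples)
print(len(train), len(test))  # 90 10
```

Fixing the seed keeps the split reproducible, so the same train and test lists can be reused for both k-NN and linear probing.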

Linear Probing Evaluation

# train_textfile = linprob_data_lists/RESISC/train.txt
# test_textfile = linprob_data_lists/RESISC/test.txt

python3 evaluation/linprob.py --model-path {model_registry} \
                              --data-root {data_root} \
                              --train-text {train_textfile} \
                              --test-text {test_textfile}

| Model | RESISC | Optimal 31 | MLRSNet | WHU-RS19 | EuroSAT | UC Merced | CV-BrCT | AiRound | RSI-CB128 |
|---|---|---|---|---|---|---|---|---|---|
| ViT-S/16-e25 | 94.381 | 96.237 | 96.642 | 99.811 | 98.037 | 99.048 | 77.613 | 78.644 | 99.593 |
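For readers unfamiliar with linear probing, the idea is to keep the pretrained backbone frozen and train only a linear classifier on its output features. The sketch below shows that idea with a plain softmax-regression classifier on toy 2-D vectors that stand in for backbone features. It is not the code in evaluation/linprob.py; every name and hyperparameter here (`train_linear_probe`, `predict`, the learning rate, the epoch count, the synthetic data) is hypothetical.

```python
import math
import random


def train_linear_probe(features, labels, num_classes, epochs=200, lr=0.1):
    """Fit a softmax-regression classifier on fixed (frozen) feature vectors."""
    dim = len(features[0])
    n = len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            logits = [sum(w * v for w, v in zip(weights[c], x)) + biases[c]
                      for c in range(num_classes)]
            top = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(value - top) for value in logits]
            total = sum(exps)
            for c in range(num_classes):
                # Cross-entropy gradient w.r.t. logit c: softmax minus one-hot.
                err = exps[c] / total - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for d in range(dim):
                    grad_w[c][d] += err * x[d]
        # Full-batch gradient descent step on the averaged gradient.
        for c in range(num_classes):
            biases[c] -= lr * grad_b[c] / n
            for d in range(dim):
                weights[c][d] -= lr * grad_w[c][d] / n
    return weights, biases


def predict(weights, biases, x):
    """Return the class index with the highest linear score."""
    scores = [sum(w * v for w, v in zip(wc, x)) + b
              for wc, b in zip(weights, biases)]
    return scores.index(max(scores))


rng = random.Random(0)


def make_split(n_per_class):
    """Toy stand-in for frozen features: two well-separated 2-D clusters."""
    feats, labels = [], []
    for label, center in enumerate([(-2.0, -2.0), (2.0, 2.0)]):
        for _ in range(n_per_class):
            feats.append([c + rng.gauss(0.0, 0.3) for c in center])
            labels.append(label)
    return feats, labels


train_x, train_y = make_split(45)
test_x, test_y = make_split(5)
weights, biases = train_linear_probe(train_x, train_y, num_classes=2)
correct = sum(predict(weights, biases, x) == y for x, y in zip(test_x, test_y))
test_acc = correct / len(test_y)
print(f"test accuracy: {test_acc:.3f}")
```

In practice the feature vectors would come from the frozen student backbone, and the classifier would usually be trained with an optimizer from a deep learning framework rather than hand-written gradient descent.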

KNN Evaluation

# train_textfile = linprob_data_lists/RESISC/train.txt
# test_textfile = linprob_data_lists/RESISC/test.txt

python3 evaluation/knn.py --model-path {model_registry} \
                          --data-root {data_root} \
                          --train-text {train_textfile} \
                          --test-text {test_textfile}

| Model | RESISC | Optimal 31 | MLRSNet | WHU-RS19 | EuroSAT | UC Merced | CV-BrCT | AiRound | RSI-CB128 |
|---|---|---|---|---|---|---|---|---|---|
| ViT-S/16-e25 | 93.365 | 89.785 | 96.981 | 97.196 | 95.741 | 87.143 | 76.208 | 77.881 | 98.943 |
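As a rough illustration of k-NN evaluation with k=20, the sketch below classifies a query feature by majority vote among its 20 most cosine-similar training features. It is an assumption-laden sketch, not the code in evaluation/knn.py: the README does not say which distance or voting scheme that script uses, so cosine similarity and unweighted voting are choices made here, and the names `l2_normalize` and `knn_predict` as well as the toy features are hypothetical.

```python
import math
import random
from collections import Counter


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def knn_predict(train_feats, train_labels, query, k=20):
    """Majority vote among the k training features most cosine-similar to query."""
    q = l2_normalize(query)
    sims = [
        (sum(a * b for a, b in zip(l2_normalize(feat), q)), label)
        for feat, label in zip(train_feats, train_labels)
    ]
    top_k = sorted(sims, key=lambda pair: pair[0], reverse=True)[:k]
    return Counter(label for _, label in top_k).most_common(1)[0][0]


# Toy stand-in for backbone features: class 0 points along the x-axis,
# class 1 along the y-axis, with small Gaussian noise.
rng = random.Random(0)
train_feats, train_labels = [], []
for label, center in enumerate([(1.0, 0.0), (0.0, 1.0)]):
    for _ in range(30):
        train_feats.append([c + rng.gauss(0.0, 0.1) for c in center])
        train_labels.append(label)

pred_a = knn_predict(train_feats, train_labels, [1.0, 0.1])
pred_b = knn_predict(train_feats, train_labels, [0.1, 1.0])
print(pred_a, pred_b)  # 0 1
```

Because k-NN has no trainable parameters, its accuracy depends only on how well the frozen features cluster by class, which makes it a useful complement to linear probing.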

Property Analysis