Implementation of DINOv2 for remote sensing with Hugging Face Transformers

chagmgang/dinov2-remote-sensing

Reimplementation of Self-Supervised Vision Transformers for DINO v2 with Hugging Face 🤗


  • PyTorch implementation and pretrained models for DINO v2 in remote sensing.
  • See the official papers and GitHub repository for details. [arXiv #1] [arXiv #2] [Github]

Training

This project uses the DeepSpeed launcher for multi-GPU training:

deepspeed --include localhost:0,1,2,3... vit_train.py
deepspeed --include localhost:0,1,2,3... convvit_train.py
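
The `--include localhost:...` argument lists the GPU indices to use on the local machine. As a minimal sketch, the helper below (a hypothetical `deepspeed_command`, not part of this repository) builds the same command line for an explicit list of GPU ids, which can be handy when launching from a Python script:

```python
def deepspeed_command(script, gpu_ids):
    """Build the argument list for a single-node DeepSpeed launch.

    `script` is a training script such as vit_train.py and `gpu_ids`
    is an iterable of local GPU indices. This helper is illustrative
    only; it just formats the command shown above.
    """
    gpu_ids = list(gpu_ids)
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    include = "localhost:" + ",".join(str(i) for i in gpu_ids)
    return ["deepspeed", "--include", include, script]


if __name__ == "__main__":
    # Print the command for GPUs 0-3; pass the list to subprocess.run to launch.
    print(" ".join(deepspeed_command("vit_train.py", range(4))))
```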

Training Dataset for Remote Sensing

Dataset name    Number of samples    Dataset Paper
Million-AID     990,666              Link
SkyScript       5,181,068            Link
Total           6,171,734

Pretrained Model on Huggingface

Model          Epoch  Total Params  Student Backbone Params  Student DINO Head Params  Student iBOT Head Params  Weight & Config  Logs
ViT-S/16-e25   25     132M          21M                      22M                       22M                       Link             logs
ViT-S/16-e100  25     132M          21M                      22M                       22M
ViT-B/16-e25   25
ConvViT-S-e25  25

Evaluation

DINOv2 is evaluated with k-NN classification and linear probing. For each dataset, 90% of the data is randomly selected as the training set and the remaining 10% is used as the test set. k-NN evaluation uses k=20. The evaluation datasets are listed in the table below. The split file lists are stored in linprob_data_lists.
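
As an illustration of the 90/10 random split described above, here is a minimal sketch. The sample names, the fixed seed, and the one-entry-per-line file format are assumptions made for the example; the actual lists in linprob_data_lists may be produced and formatted differently.

```python
import random


def split_90_10(samples, seed=0):
    """Shuffle with a fixed seed and return (train, test) in a 90/10 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 9 // 10  # integer arithmetic avoids float rounding
    return shuffled[:n_train], shuffled[n_train:]


def write_list(path, samples):
    """Write one sample entry per line (hypothetical list-file format)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in samples:
            f.write(entry + "\n")


if __name__ == "__main__":
    # Placeholder image names; real entries would come from the dataset folder.
    images = [f"image_{i:04d}.jpg" for i in range(1000)]
    train, test = split_90_10(images)
    print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so k-NN and linear probing are scored on the same held-out images.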

Dataset Name  Dataset Paper
RESISC        Remote Sensing Image Scene Classification: Benchmark and State of the Art
Optimal 31    Scene Classification With Recurrent Attention of VHR Remote Sensing Images
MLRSNet       MLRSNet: A Multi-label High Spatial Resolution Remote Sensing Dataset for Semantic Scene Understanding
WHU-RS19
EuroSAT       EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification
UC Merced     Bag-of-visual-words and spatial extensions for land-use classification
Cv-BrCT       AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification
AiRound       AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification
RSI-CB128     RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data

Linear Probing Evaluation

# train_textfile = linprob_data_lists/RESISC/train.txt
# test_textfile = linprob_data_lists/RESISC/test.txt

python3 evaluation/linprob.py --model-path {model_registry} \
                              --data-root {data_root} \
                              --train-text {train_textfile} \
                              --test-text {test_textfile}

Model         RESISC  Optimal 31  MLRSNet  WHU-RS19  EuroSAT  UC Merced  Cv-BrCT  AiRound  RSI-CB128
ViT-S/16-e25  94.381  96.237      96.642   99.811    98.037   99.048     77.613   78.644   99.593
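
Linear probing trains only a linear classifier on top of frozen backbone features. The sketch below shows the idea in plain Python with softmax regression on toy 2-D vectors that stand in for backbone features. It is not the code in evaluation/linprob.py; every function name, the toy data, and the hyperparameters are assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, biases, x):
    """Linear layer: one score per class for feature vector x."""
    return [sum(w_d * x_d for w_d, x_d in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def train_linear_probe(features, labels, num_classes, epochs=200, lr=0.5):
    """Fit softmax regression on fixed features with full-batch gradient descent."""
    dim, n = len(features[0]), len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax(class_scores(weights, biases, x))
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient
                grad_b[c] += err
                for d in range(dim):
                    grad_w[c][d] += err * x[d]
        for c in range(num_classes):
            biases[c] -= lr * grad_b[c] / n
            for d in range(dim):
                weights[c][d] -= lr * grad_w[c][d] / n
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring class."""
    scores = class_scores(weights, biases, x)
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(weights, biases, features, labels):
    """Fraction of samples whose predicted class matches the label."""
    correct = sum(predict(weights, biases, x) == y
                  for x, y in zip(features, labels))
    return correct / len(labels)


def toy_features(n_per_class, rng):
    """Stand-in for frozen backbone features: two noisy 2-D clusters."""
    feats, labels = [], []
    for label, center in enumerate([(1.0, 0.0), (0.0, 1.0)]):
        for _ in range(n_per_class):
            feats.append([c + rng.gauss(0.0, 0.1) for c in center])
            labels.append(label)
    return feats, labels


if __name__ == "__main__":
    rng = random.Random(0)
    train_x, train_y = toy_features(90, rng)
    test_x, test_y = toy_features(10, rng)
    w, b = train_linear_probe(train_x, train_y, num_classes=2)
    print(f"toy test accuracy: {accuracy(w, b, test_x, test_y):.3f}")
```

In a real run the feature vectors would be extracted once from the pretrained backbone with gradients disabled, and only the linear layer would be optimized.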

KNN Evaluation

# train_textfile = linprob_data_lists/RESISC/train.txt
# test_textfile = linprob_data_lists/RESISC/test.txt

python3 evaluation/knn.py --model-path {model_registry} \
                          --data-root {data_root} \
                          --train-text {train_textfile} \
                          --test-text {test_textfile}

Model         RESISC  Optimal 31  MLRSNet  WHU-RS19  EuroSAT  UC Merced  Cv-BrCT  AiRound  RSI-CB128
ViT-S/16-e25  93.365  89.785      96.981   97.196    95.741   87.143     76.208   77.881   98.943
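
For reference, k-NN evaluation with k=20 can be sketched as follows: each test feature is assigned the majority label among its 20 most similar training features. The cosine similarity metric and the unweighted vote are assumptions made for this example; evaluation/knn.py may use a different metric or a weighted vote.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train_features, train_labels, query, k=20):
    """Predict the label of `query` by majority vote over its k nearest neighbors."""
    scored = sorted(
        ((cosine_similarity(f, query), label)
         for f, label in zip(train_features, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    # On a tie, the label seen first among the nearest neighbors wins.
    return votes.most_common(1)[0][0]


def knn_accuracy(train_features, train_labels, test_features, test_labels, k=20):
    """Fraction of test samples whose k-NN prediction matches the label."""
    correct = sum(
        knn_predict(train_features, train_labels, x, k) == y
        for x, y in zip(test_features, test_labels)
    )
    return correct / len(test_labels)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for backbone features of two classes.
    train_x = [[1.0, 0.01 * i] for i in range(25)] + [[0.01 * i, 1.0] for i in range(25)]
    train_y = ["a"] * 25 + ["b"] * 25
    print(knn_predict(train_x, train_y, [1.0, 0.05]))
```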

Property Analysis
