-
Paper: Training data-efficient image transformers & distillation through attention
-
Origin Repo: facebookresearch/deit
-
Code: deit.py
-
Evaluate Transforms:
```python
# backend: pil
# input_size: 224x224
transforms = T.Compose([
    T.Resize(248, interpolation='bicubic'),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# backend: pil
# input_size: 384x384
transforms = T.Compose([
    T.Resize(384, interpolation='bicubic'),
    T.CenterCrop(384),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```
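For reference, a minimal preprocessing sketch for the 224x224 setting is shown below. It assumes `T` is `paddle.vision.transforms` (PIL backend, as noted in the comments above); the image path is a placeholder. The resulting NCHW tensor is what a DeiT model built from deit.py expects at evaluation time.

```python
import paddle
import paddle.vision.transforms as T
from PIL import Image

# 224x224 evaluation transforms (same as above)
transforms = T.Compose([
    T.Resize(248, interpolation='bicubic'),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open('example.jpg').convert('RGB')  # placeholder image path
x = transforms(img)                             # CHW float32 tensor
x = paddle.unsqueeze(x, axis=0)                 # add batch dim -> [1, 3, 224, 224]
print(x.shape)
```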
-
Model Details:
| Model | Model Name | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | Pretrained Model |
| --- | --- | --- | --- | --- | --- | --- |
| DeiT-tiny | deit_ti | 5.7 | 1.1 | 72.18 | 91.11 | Download |
| DeiT-small | deit_s | 22.0 | 4.2 | 79.85 | 95.04 | Download |
| DeiT-base | deit_b | 86.4 | 16.8 | 81.99 | 95.74 | Download |
| DeiT-tiny distilled | deit_ti_distilled | 5.9 | 1.1 | 74.50 | 91.89 | Download |
| DeiT-small distilled | deit_s_distilled | 22.4 | 4.3 | 81.22 | 95.39 | Download |
| DeiT-base distilled | deit_b_distilled | 87.2 | 16.9 | 83.39 | 96.49 | Download |
| DeiT-base 384 | deit_b_384 | 86.4 | 49.3 | 83.10 | 96.37 | Download |
| DeiT-base distilled 384 | deit_b_distilled_384 | 87.2 | 49.4 | 85.43 | 97.33 | Download |
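The distilled variants follow the paper's distillation-through-attention setup: a distillation token is trained alongside the class token, and at inference the softmax outputs of the two classifier heads are added. A minimal sketch of that fusion step, with placeholder random logits standing in for the two heads' outputs:

```python
import paddle
import paddle.nn.functional as F

# Placeholder logits standing in for the class-token head and the
# distillation-token head of a distilled DeiT (batch of 1, 1000 classes).
cls_logits = paddle.randn([1, 1000])
dist_logits = paddle.randn([1, 1000])

# Inference-time fusion as described in the paper: add (equivalently,
# average) the softmax outputs of the two heads.
probs = (F.softmax(cls_logits, axis=-1) + F.softmax(dist_logits, axis=-1)) / 2
top1 = paddle.argmax(probs, axis=-1)
```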
-
Citation:
```
@article{touvron2020deit,
  title   = {Training data-efficient image transformers & distillation through attention},
  author  = {Hugo Touvron and Matthieu Cord and Matthijs Douze and Francisco Massa and Alexandre Sablayrolles and Hervé Jégou},
  journal = {arXiv preprint arXiv:2012.12877},
  year    = {2020}
}
```