wuji3/visiondk: A powerful baseline for image classification, face recognition and image retrieval with PyTorch

VisionDK: ToolBox Of Image Classification & Face Recognition

Tutorials

Install ☘️

# It is recommended to create a separate virtual environment
conda create -n vision python=3.10 
conda activate vision

# torch==2.0.1 (lower versions also work) -> https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio cpuonly -c pytorch # cpu-version
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia  # cuda-version

pip install -r requirements.txt

# Without Arial.ttf, inference may be slow due to network IO.
mkdir -p ~/.config/DuKe
cp misc/Arial.ttf ~/.config/DuKe
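
To confirm that the font copy above succeeded before running inference, a small check like the following may help. This is only a sketch: the directory `~/.config/DuKe` and the file name `Arial.ttf` come from the commands above, while the helper name `font_is_cached` is hypothetical and not part of VisionDK.

```python
from pathlib import Path


def font_is_cached(
    config_dir: Path = Path.home() / ".config" / "DuKe",
    font_name: str = "Arial.ttf",
) -> bool:
    """Return True if the font file exists in config_dir and is non-empty.

    Hypothetical helper for illustration; VisionDK may locate the font differently.
    """
    font_path = config_dir / font_name
    return font_path.is_file() and font_path.stat().st_size > 0


if __name__ == "__main__":
    print("Arial.ttf cached:", font_is_cached())
```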

Training 🌟️

# single machine, single GPU
python main.py --cfgs configs/task/pet.yaml

# single machine, multiple GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4 main.py --cfgs configs/classification/pet.yaml
                                                                 --sync_bn [Optional: this slows training down]
                                                                 --resume [Optional: resume training from a checkpoint]
                                                                 --load_from [Optional: load weights for fine-tuning]
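
When launching from a script, the two commands above can be assembled programmatically. The sketch below only builds the command line and does not run it. The function name `build_launch` is hypothetical, and treating `--sync_bn` as a bare switch is an assumption; check `main.py` for the exact arguments each optional flag expects.

```python
import shlex


def build_launch(
    gpus: list[int], cfg: str, extra_flags: tuple[str, ...] = ()
) -> tuple[dict[str, str], list[str]]:
    """Return (environment overrides, argv) for a single-machine run.

    One GPU -> plain `python main.py`; several GPUs -> `torchrun` with one
    process per GPU, mirroring the commands in this README.
    """
    if not gpus:
        raise ValueError("at least one GPU id is required")
    # The README sets CUDA_VISIBLE_DEVICES only in the multi-GPU command;
    # setting it for a single GPU as well is a choice made in this sketch.
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpus)}
    if len(gpus) == 1:
        argv = ["python", "main.py", "--cfgs", cfg]
    else:
        argv = ["torchrun", "--nproc_per_node", str(len(gpus)),
                "main.py", "--cfgs", cfg]
    argv += list(extra_flags)  # passed through untouched, e.g. ("--sync_bn",)
    return env, argv


if __name__ == "__main__":
    env, argv = build_launch([0, 1, 2, 3], "configs/classification/pet.yaml",
                             ("--sync_bn",))
    prefix = " ".join(f"{k}={v}" for k, v in env.items())
    print(prefix, shlex.join(argv))
```

The returned pair can be handed to `subprocess.run(argv, env={**os.environ, **env})` if the launch should actually happen.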

What's New

[Apr. 2024] The Face Recognition Task (FRT) is now supported 🚀️️! We provide ResNet, EfficientNet, and Swin Transformer as backbones; for heads, ArcFace, CircleLoss, MagFace, and MV Softmax can be used for training. Note: parts of the implementation refer to JD-FaceX.
[Jun. 2023] The Image Classification Task (ICT) has launched 🚀️️! It supports many powerful strategies, such as progressive learning, online augmentation, a clean training interface, and exponential moving average. The models are fully integrated with torchvision.
[May. 2023] The initial version of Vision.

Supported Tasks

Implemented Method & Paper

Method	Paper
SAM	Sharpness-Aware Minimization for Efficiently Improving Generalization
Progressive Learning	EfficientNetV2: Smaller Models and Faster Training
OHEM	Training Region-based Object Detectors with Online Hard Example Mining
Focal Loss	Focal Loss for Dense Object Detection
Cosine Annealing	SGDR: Stochastic Gradient Descent with Warm Restarts
Label Smoothing	Rethinking the Inception Architecture for Computer Vision
Mixup	MixUp: Beyond Empirical Risk Minimization
CutOut	Improved Regularization of Convolutional Neural Networks with Cutout
Attention Pool	Augmenting Convolutional networks with attention-based aggregation
GradCAM	Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
ArcFace	ArcFace: Additive Angular Margin Loss for Deep Face Recognition
CircleLoss	Circle Loss: A Unified Perspective of Pair Similarity Optimization
MagFace	MagFace: A Universal Representation for Face Recognition and Quality Assessment
MV Softmax	Mis-classified Vector Guided Softmax Loss for Face Recognition
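
As one concrete illustration of the methods listed above, the cosine annealing schedule from the SGDR paper can be written in a few lines. This is a minimal sketch of a single cycle without warm restarts, not VisionDK's own scheduler; the function name and parameters are chosen for illustration only.

```python
import math


def cosine_annealing_lr(
    step: int, total_steps: int, lr_max: float, lr_min: float = 0.0
) -> float:
    """Learning rate at `step` for one cosine cycle (no warm restarts).

    lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * step / total_steps))
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    t = min(max(step, 0), total_steps)  # clamp step into [0, total_steps]
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / total_steps))


if __name__ == "__main__":
    for s in (0, 25, 50, 75, 100):
        print(s, cosine_annealing_lr(s, 100, lr_max=0.1, lr_min=0.001))
```

The rate starts at `lr_max`, passes through the midpoint of the two bounds halfway through the cycle, and ends at `lr_min`.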

Model & Paper

Model	Paper	Name in configs (e.g., torchvision-mobilenet_v2)
MobileNetv2	MobileNetV2: Inverted Residuals and Linear Bottlenecks	mobilenet_v2
MobileNetv3	Searching for MobileNetV3	mobilenet_v3_small, mobilenet_v3_large
ShuffleNetv2	ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design	shufflenet_v2_x0_5, shufflenet_v2_x1_0, shufflenet_v2_x1_5, shufflenet_v2_x2_0
ResNet	Deep Residual Learning for Image Recognition	resnet18, resnet34, resnet50, resnet101, resnet152
ResNeXt	Aggregated Residual Transformations for Deep Neural Networks	resnext50_32x4d, resnext101_32x8d, resnext101_64x4d
ConvNext	A ConvNet for the 2020s	convnext_tiny, convnext_small, convnext_base, convnext_large
EfficientNet	EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks	efficientnet_b{0..7}
EfficientNetv2	EfficientNetV2: Smaller Models and Faster Training	efficientnet_v2_s, efficientnet_v2_m, efficientnet_v2_l
Swin Transformer	Swin Transformer: Hierarchical Vision Transformer using Shifted Windows	swin_t, swin_s, swin_b
Swin Transformerv2	Swin Transformer V2: Scaling Up Capacity and Resolution	swin_v2_t, swin_v2_s, swin_v2_b
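
The table uses two shorthand conventions: a range such as `efficientnet_b{0..7}` and a prefixed config name such as `torchvision-mobilenet_v2`. The sketch below shows how these might be read. It assumes the config name is a source prefix and a model name joined by the first hyphen, which is inferred only from the example in the table header; both helper names are hypothetical.

```python
import re


def expand_names(pattern: str) -> list[str]:
    """Expand one `{a..b}` integer range, e.g. 'efficientnet_b{0..7}'."""
    m = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if m is None:
        return [pattern]  # nothing to expand
    lo, hi = int(m.group(1)), int(m.group(2))
    return [pattern[:m.start()] + str(i) + pattern[m.end():]
            for i in range(lo, hi + 1)]


def split_config_name(name: str) -> tuple[str, str]:
    """Split 'torchvision-mobilenet_v2' into ('torchvision', 'mobilenet_v2').

    Assumes the first hyphen separates the source from the model name.
    """
    source, sep, model = name.partition("-")
    if not sep or not source or not model:
        raise ValueError(f"expected '<source>-<model>', got {name!r}")
    return source, model


if __name__ == "__main__":
    print(expand_names("efficientnet_b{0..7}"))
    print(split_config_name("torchvision-mobilenet_v2"))
```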

Tools

Split the dataset into a training set and a validation set

python tools/data_prepare.py --postfix <jpg or png> --root <input your data realpath> --frac <train segment ratio, eg: 0.9 0.6 0.3 0.9 0.9>
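
The idea behind a train/validation split by ratio can be sketched as follows. This is not the logic of `tools/data_prepare.py`, whose behavior is not described here; the `--frac` example lists several ratios, which may mean one per class, whereas this hypothetical helper handles a single ratio for a single list of files.

```python
import random


def split_train_val(
    paths: list[str], frac: float, seed: int = 0
) -> tuple[list[str], list[str]]:
    """Shuffle deterministically and put `frac` of the files in the training set."""
    if not 0.0 < frac < 1.0:
        raise ValueError("frac must be strictly between 0 and 1")
    shuffled = sorted(paths)               # sort first so the result is reproducible
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    n_train = round(len(shuffled) * frac)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    files = [f"img_{i:03d}.jpg" for i in range(10)]
    train, val = split_train_val(files, frac=0.9)
    print(len(train), "train /", len(val), "val")
```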

Data augmentation visualization

cd visiondk
python -m tools.test_augment

Contact Me

If you enjoy reproducing papers and algorithms, pull requests are welcome.
If anything about the repo is unclear, please open an issue.
