
Gaudi-DASH

This repository contains the implementation of Direction-Aware SHrinking (DASH), a method for warm-starting neural network training without losing plasticity under stationary conditions, running on Intel Gaudi.
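As a rough intuition, DASH selectively shrinks warm-started weights based on how their direction relates to the descent direction on incoming data. Below is a minimal, illustrative sketch of that idea, assuming a per-tensor cosine-similarity criterion with hypothetical shrink/threshold hyperparameters; see the paper and this repository's training code for the exact rule:

import torch
import torch.nn.functional as F

@torch.no_grad()
def dash_shrink(model, shrink=0.3, threshold=0.0):
    # Call after backpropagating the loss on the incoming data, so that
    # p.grad holds the gradient at the warm-started weights.
    for p in model.parameters():
        if p.grad is None:
            continue
        # Alignment between the weight direction and the descent direction (-grad).
        cos = F.cosine_similarity(p.flatten(), -p.grad.flatten(), dim=0)
        # Shrink tensors whose direction disagrees with the descent direction.
        if cos < threshold:
            p.mul_(shrink)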


We also include code in verify.ipynb and verify_nvidia.ipynb that documents issues we encountered when applying the Sharpness-Aware Minimization (SAM) algorithm in eager/lazy mode on Intel Gaudi.
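For context when reading those notebooks, here is a generic sketch of one SAM update in PyTorch (the standard two-pass algorithm, not the notebooks' exact code): SAM first perturbs the weights by rho along the normalized gradient, then steps with the gradient taken at the perturbed point.

import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    # First forward/backward pass: gradient at the current weights.
    base_optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # Climb to the local worst case: w <- w + rho * g / ||g||.
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2)
    with torch.no_grad():
        eps = [rho * p.grad / (grad_norm + 1e-12) for p in params]
        for p, e in zip(params, eps):
            p.add_(e)

    # Second pass: gradient at the perturbed weights.
    base_optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # Restore the original weights, then apply the base optimizer update.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    base_optimizer.step()
    return loss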

📄 Paper

For more details, check out our paper:

DASH: Warm-Starting Neural Network Training Without Loss of Plasticity Under Stationarity

🛠️ Setup

To set up the environment, run:

conda env create -f env.yaml
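Then activate the environment it creates (the environment name is defined in env.yaml):

conda activate <env-name>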

🚀 Usage

Standard Training

To train the model, use:

python main.py --dataset [dataset] --model [model] --train_type [train_type] --optimizer_type [optimizer_type]

Available options:

  • Datasets: cifar10, cifar100, svhn, imagenet
  • Models: resnet18, vgg16, mlp
  • Training types: cold, warm, warm_rm, reset, l2_init, sp, dash
  • Optimizer types: sgd, sam
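For example, to warm-start a ResNet-18 on CIFAR-10 with DASH and SGD:

python main.py --dataset cifar10 --model resnet18 --train_type dash --optimizer_type sgd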

State-of-the-Art (SoTA) Training

For SoTA settings, use:

python main.py --dataset [dataset] --model resnet18 --train_type [train_type] --optimizer_type [optimizer_type] \
    --sota True --weight_decay 5e-4 --learning_rate 0.1 --batch_size 128 --max_epoch 260

Available options for SoTA settings:

  • Datasets: cifar10, cifar100, imagenet
  • Model: resnet18
  • Training types and optimizer types: Same as standard training
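For example, DASH with SGD on CIFAR-100 under the SoTA settings:

python main.py --dataset cifar100 --model resnet18 --train_type dash --optimizer_type sgd \
    --sota True --weight_decay 5e-4 --learning_rate 0.1 --batch_size 128 --max_epoch 260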

Tiny-ImageNet Training

To train with --dataset imagenet (which uses Tiny-ImageNet):

  1. Download the dataset from http://cs231n.stanford.edu/tiny-imagenet-200.zip or use wget:
wget http://cs231n.stanford.edu/tiny-imagenet-200.zip
  2. Create a folder named data:
mkdir data
  3. Unzip the downloaded Tiny-ImageNet dataset into the data folder:
unzip tiny-imagenet-200.zip -d data/
  4. Launch tiny-imagenet_preprocess.py to preprocess the test data:
python tiny-imagenet_preprocess.py
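After preprocessing, Tiny-ImageNet training uses the same interface as above, e.g.:

python main.py --dataset imagenet --model resnet18 --train_type dash --optimizer_type sgd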

📈 Synthetic Experiment

For our synthetic experiment described in Section 4, please refer to the Discrete_Feature_Learning.ipynb file.

📚 Citation

@inproceedings{shin2024dash,
    title={{DASH}: Warm-Starting Neural Network Training Without Loss of Plasticity Under Stationarity},
    author={Baekrok Shin and Junsoo Oh and Hanseul Cho and Chulhee Yun},
    booktitle={2nd Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@ICML 2024)},
    year={2024},
    url={https://openreview.net/forum?id=GR5LXaglgG}
}
