In an iterative fashion, step by step, using one concept at a time, achieve 99.4% validation accuracy consistently on the MNIST dataset within 15 epochs and under 10,000 parameters, better still if it is below 8,000 parameters.
The target of 99.4% is achieved in 6 iterations.
- Parameters: Less than 10,000
- Data Augmentations: If required
- Regularization: If required
- LR Scheduler: If required
- Max No. of Epochs: 14
- Test Accuracy: 99.4%
The first step is to create a basic skeleton of the model and check its learning capability.
- Total Parameters used: 9,902
- Best Train Accuracy: 99.14%
- Best Test Accuracy: 99.00%
- Consistency: Did not consistently achieve a test accuracy greater than 99.4%
- Base network architecture design
With the base network architecture we achieved a test accuracy of 99%, which is well short of the target. We plan to improve the test accuracy by splitting the work into two networks that differ in their number of parameters:
- Keep the same network architecture, but remove the fully connected layers.
- Reduce the number of parameters in the model.
In each of these approaches, we performed the following changes to help reach the target test accuracy:
- Add Dropout layers
- Perform Data Augmentation (a sketch of the input pipeline follows this list)
- Use Learning Rate Schedulers
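
As a point of reference, the sketch below shows one way these ingredients are typically wired into a torchvision input pipeline for MNIST. The rotation range matches the ±15° used in the later iterations; the normalization constants and batch size are illustrative assumptions, not values fixed by this write-up.

```python
# Hypothetical input pipeline; rotation range follows the +/-15 degrees used
# in the later iterations, normalization constants and batch size are assumed.
import torch
from torchvision import datasets, transforms

train_transforms = transforms.Compose([
    transforms.RandomRotation(15),               # augmentation: rotate within +/-15 degrees
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # commonly used MNIST mean/std
])
test_transforms = transforms.Compose([           # no augmentation at test time
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=True, download=True, transform=train_transforms),
    batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=False, download=True, transform=test_transforms),
    batch_size=128, shuffle=False)
```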
Target | Results | Analysis |
---|---|---|
Remove FC layer | Parameters: 9,060<br>Train Accuracy: 99.45%<br>Test Accuracy: 99.33% | We removed the fully connected layers from the base network architecture. This increased the test accuracy by ~0.15%, but we still did not reach the target. Next we add dropout layers to the network to check whether they improve the test accuracy. |
Add Dropout Layer | Parameters: 9,060<br>Train Accuracy: 99.06%<br>Test Accuracy: 99.29% | Adding dropout layers with a dropout value of 0.043 helps regularize the network (the gap between train and test accuracy narrows), but we still did not reach the target test accuracy. Next we add data augmentation while training the network and check its effect on the test accuracy. |
Data Augmentation | Parameters: 9,060<br>Train Accuracy: 98.88%<br>Test Accuracy: 99.39% | Adding data augmentation (random rotation) increased the chances of reaching the target accuracy of 99.4%: we hit a validation accuracy of 99.39% in the last two epochs. We also observed the accuracy alternating between two ranges every 2 epochs; one reason could be that the network was not able to converge and reach a minimum. To hold a consistent 99.4% over a few more epochs, we next add a learning rate scheduler, which reduces the step size and gives a better chance of reaching a minimum. |
Learning Rate Scheduler | Parameters: 9,060<br>Train Accuracy: 99.12%<br>Test Accuracy: 99.47% | StepLR helped stabilize the learning by reducing the learning rate to 10% of its value every 6 epochs (see the scheduler sketch after this table). Why every 6 epochs? We observed the loss going up and down from the 6th/7th epoch, so reducing the LR at that point makes the training stable. |
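
The scheduler setting in the last row maps directly onto PyTorch's `StepLR`. Below is a minimal sketch assuming an SGD optimizer at lr=0.01 (the rate mentioned in the second approach); `model`, `train_one_epoch`, and `evaluate` stand in for the notebook's own components and are not defined here.

```python
# Minimal scheduler sketch; `model`, `train_one_epoch` and `evaluate` are
# placeholders for the notebook's own network and loops.
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # momentum value is an assumption
scheduler = StepLR(optimizer, step_size=6, gamma=0.1)             # cut LR to 10% every 6 epochs

for epoch in range(1, 15):                  # at most 14 epochs
    train_one_epoch(model, train_loader, optimizer)
    evaluate(model, test_loader)
    scheduler.step()                        # lr: 0.01 -> 0.001 after epoch 6, 0.0001 after epoch 12
```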
Target | Results | Analysis |
---|---|---|
Reduce number of parameters | Parameters: 7,826<br>Train Accuracy: 99.51%<br>Test Accuracy: 99.22% | We achieved the required parameter count (< 8,000). The model is still able to learn: the training accuracy reaches 99.5% even with fewer than 8,000 parameters. Since the network is overfitting, we next introduce regularization techniques such as dropout and data augmentation to improve the test accuracy. |
Add Dropout Layer | Parameters: 7,826<br>Train Accuracy: 99.25%<br>Test Accuracy: 99.29% | We add a dropout of 5%, i.e. 5 out of 100 activations are dropped after each layer during training. Overfitting no longer occurs, and the test accuracy is higher than the training accuracy, which shows that the model's learning capability can be pushed further for better test accuracy. In the confusion matrix and the incorrectly classified images (in the notebook; see also the confusion-matrix sketch after this table), we see that the model confuses 0 and 6, with many 6s predicted as 0s; the same behaviour is seen for 2 and 7. Next we add data augmentation in the form of RandomRotation of ±15° while training. |
Data Augmentation | Parameters: 7,826<br>Train Accuracy: 98.73%<br>Test Accuracy: 99.36% | Along with the 5% dropout after each layer, we enlarge the training data through augmentation, using RandomRotation to rotate images randomly within ±15°. The training accuracy decreases while the test accuracy increases, which is a good sign. However, the test loss is not stable: it moves between 0.21 and 0.25 and does not converge. The learning rate of 0.01 may be too large from around epoch 6/7 onwards, so the next step is to use a dynamic LR. |
Learning Rate Scheduler | Parameters: 7,826<br>Train Accuracy: 99.17%<br>Test Accuracy: 99.48% | StepLR helped stabilize the learning by reducing the learning rate to 10% of its value every 6 epochs. Why every 6 epochs? The loss was bouncing up and down from the 6th/7th epoch, so reducing the LR at that point makes the training stable. The test accuracy/loss bounces heavily in the first 6 epochs; after epoch 6 the learning rate is reduced to 10% (= 0.001) and training stabilizes. The loss and accuracy then converge slowly, from 99.39% at epoch 7 to 99.48% by the end of training. The LR scheduler therefore stabilizes the training process, but identifying the correct epoch at which to reduce the LR is challenging and needs manual tuning over multiple runs. We achieved the final target of getting 99.4% test accuracy consistently for half of the epochs, showing very good convergence of the model. |
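
The 6-vs-0 and 2-vs-7 confusions mentioned above come from the notebook's confusion matrix; a minimal way to compute such a matrix is sketched below, with `model` and `test_loader` as placeholders for the notebook's own objects.

```python
# Sketch: 10x10 confusion matrix over the test set, used to spot pairs such as
# 6 predicted as 0 or 2 predicted as 7.  `model` and `test_loader` are placeholders.
import torch

@torch.no_grad()
def confusion_matrix(model, loader, device="cpu"):
    cm = torch.zeros(10, 10, dtype=torch.long)
    model.eval()
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        for t, p in zip(labels, preds):
            cm[t, p] += 1
    return cm  # rows = true digit, columns = predicted digit

# e.g. cm = confusion_matrix(model, test_loader); cm[6, 0] counts 6s predicted as 0s
```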
Layer No. | Operation | Input Feat. | Kernel Size | Padding | Stride | Output Feat. | Jump in | Jump Out | RF in | RF Out |
---|---|---|---|---|---|---|---|---|---|---|
1 | Conv2d | 28 | 3 | 0 | 1 | 26 | 1 | 1 | 1 | 3 |
2 | Conv2d | 26 | 3 | 0 | 1 | 24 | 1 | 1 | 3 | 5 |
3 | MaxPool | 24 | 2 | 0 | 2 | 12 | 1 | 2 | 5 | 6 |
4 | Conv2d | 12 | 3 | 0 | 1 | 10 | 2 | 2 | 6 | 10 |
5 | Conv2d | 10 | 3 | 0 | 1 | 8 | 2 | 2 | 10 | 14 |
6 | Conv2d | 8 | 3 | 0 | 1 | 6 | 2 | 2 | 14 | 18 |
7 | Conv2d | 6 | 3 | 0 | 1 | 4 | 2 | 2 | 18 | 22 |
8 | GAP | 4 | 4 | 0 | 1 | 1 | 2 | 2 | 22 | 28 |
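
The Jump and RF columns above follow the standard receptive-field recurrence (RF_out = RF_in + (kernel − 1) × Jump_in, Jump_out = Jump_in × stride). The short sketch below reproduces this table; swapping in the layer list of the next table reproduces that one as well.

```python
# Reproduce the Output Feat. / Jump / RF columns of the table above.
# Each layer is (kernel, padding, stride); GAP is treated as a 4x4 kernel.
layers = [(3, 0, 1), (3, 0, 1), (2, 0, 2),           # conv, conv, maxpool
          (3, 0, 1), (3, 0, 1), (3, 0, 1), (3, 0, 1),
          (4, 0, 1)]                                  # GAP

size, jump, rf = 28, 1, 1                             # input size, Jump in, RF in
for i, (k, p, s) in enumerate(layers, start=1):
    rf = rf + (k - 1) * jump                          # RF Out
    size = (size + 2 * p - k) // s + 1                # Output Feat.
    jump = jump * s                                   # Jump Out
    print(f"layer {i}: out={size:2d}  jump={jump}  rf={rf}")
```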
Layer No. | Operation | Input Feat. | Kernel Size | Padding | Stride | Output Feat. | Jump in | Jump Out | RF in | RF Out |
---|---|---|---|---|---|---|---|---|---|---|
1 | Conv2d | 28 | 3 | 1 | 1 | 28 | 1 | 1 | 1 | 3 |
2 | Conv2d | 28 | 3 | 1 | 1 | 28 | 1 | 1 | 3 | 5 |
3 | MaxPool | 28 | 2 | 0 | 2 | 14 | 1 | 2 | 5 | 6 |
4 | Conv2d | 14 | 3 | 0 | 1 | 12 | 2 | 2 | 6 | 10 |
5 | Conv2d | 12 | 3 | 0 | 1 | 10 | 2 | 2 | 10 | 14 |
6 | Conv2d | 10 | 3 | 0 | 1 | 8 | 2 | 2 | 14 | 18 |
7 | Conv2d | 8 | 3 | 0 | 1 | 6 | 2 | 2 | 18 | 22 |
8 | Conv2d | 6 | 3 | 0 | 1 | 4 | 2 | 2 | 22 | 26 |
9 | GAP | 4 | 4 | 0 | 1 | 1 | 2 | 2 | 26 | 32 |
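
For orientation, a hypothetical PyTorch skeleton that follows the layer sequence of this second table is sketched below; the channel widths are placeholders and will not reproduce the parameter counts reported above. It mainly illustrates how global average pooling over the final 4×4 feature map replaces a fully connected classifier.

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """Hypothetical skeleton following the second RF table; the channel
    widths (8/16/10) are placeholders, not the network's real widths."""
    def __init__(self, drop=0.05):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.Dropout(drop),   # 28 -> 28
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.Dropout(drop),  # 28 -> 28
            nn.MaxPool2d(2, 2),                                           # 28 -> 14
            nn.Conv2d(16, 16, 3), nn.ReLU(), nn.Dropout(drop),            # 14 -> 12
            nn.Conv2d(16, 16, 3), nn.ReLU(), nn.Dropout(drop),            # 12 -> 10
            nn.Conv2d(16, 16, 3), nn.ReLU(), nn.Dropout(drop),            # 10 -> 8
            nn.Conv2d(16, 16, 3), nn.ReLU(), nn.Dropout(drop),            # 8 -> 6
            nn.Conv2d(16, 10, 3),                                         # 6 -> 4, one map per class
            nn.AvgPool2d(4),                                              # GAP: 4 -> 1
        )

    def forward(self, x):
        x = self.features(x).view(-1, 10)
        return F.log_softmax(x, dim=1)
```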