Simple examples of optimizer implementations in NumPy.
I based my implementations on examples from the Deep Learning Specialization (Coursera) and this tutorial.
A great overview of optimization algorithms can be found here.
Results:
From this simple example, we can conclude:
- Non-adaptive algorithms (SGD, Momentum, NAG) need a high learning rate for this task. With small learning rates, progress is slow.
- Adaptive algorithms (Adam, Adagrad, RMSProp) fail (diverge) with high learning rates.
- The best results are achieved by Adam, RMSProp, and Adagrad (depending on the learning rate); minimal update rules for each optimizer are sketched below.
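For reference, here is a minimal sketch of the per-step update rules in NumPy. This is not the exact code from this repo; the function names, signatures, and hyperparameter defaults are illustrative assumptions.

```python
import numpy as np

def sgd(w, grad, lr):
    return w - lr * grad

def momentum(w, grad, v, lr, beta=0.9):
    v = beta * v + lr * grad                        # exponentially decaying velocity
    return w - v, v

def nag(w, grad_fn, v, lr, beta=0.9):
    g = grad_fn(w - beta * v)                       # gradient at the look-ahead point
    v = beta * v + lr * g
    return w - v, v

def adagrad(w, grad, cache, lr, eps=1e-8):
    cache = cache + grad ** 2                       # per-parameter sum of squared gradients
    return w - lr * grad / (np.sqrt(cache) + eps), cache

def rmsprop(w, grad, cache, lr, beta=0.9, eps=1e-8):
    cache = beta * cache + (1 - beta) * grad ** 2   # decaying average instead of a sum
    return w - lr * grad / (np.sqrt(cache) + eps), cache

def adam(w, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad              # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2         # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                    # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy check (hypothetical setup): minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([5.0, -3.0])
m, v = np.zeros(2), np.zeros(2)
for t in range(1, 201):
    w, m, v = adam(w, 2 * w, m, v, t, lr=0.1)
print(w)  # should be near [0, 0]
```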