
Adaptive Softmax for Keras

Keras implementations (requiring the TensorFlow backend) of Adaptive Softmax [1] and a variation of Differentiated Softmax [1, 2]. These alternatives to the standard softmax exploit differences in word frequencies to substantially reduce neural language model training time.
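As a rough illustration of the idea (not this package's actual API), the sketch below wires a two-cluster adaptive softmax by hand in TensorFlow: frequent words get a full-size "head" softmax, while rare words share a single head slot and a lower-dimensional "tail" softmax. All names and sizes here are illustrative assumptions.

```python
import tensorflow as tf

VOCAB = 10000   # vocabulary size, sorted so ids 0..CUTOFF-1 are the most frequent
CUTOFF = 2000   # head/tail split point (a tuning choice, not taken from this repo)
D_MODEL = 512   # hidden size of the language model
D_TAIL = 128    # reduced projection size for the rare-word cluster

# Head softmax: frequent words plus one extra logit that stands for
# "the target is somewhere in the tail cluster".
head = tf.keras.layers.Dense(CUTOFF + 1)

# Tail softmax: project down to D_TAIL first, so the large rare-word matrix
# is (D_TAIL x tail_size) instead of (D_MODEL x tail_size).
tail = tf.keras.Sequential([
    tf.keras.layers.Dense(D_TAIL, use_bias=False),
    tf.keras.layers.Dense(VOCAB - CUTOFF),
])

def log_probs(hidden):
    """Full log-distribution over the vocabulary for hidden states [batch, D_MODEL]."""
    head_logp = tf.nn.log_softmax(head(hidden))   # [batch, CUTOFF+1]
    tail_logp = tf.nn.log_softmax(tail(hidden))   # [batch, VOCAB-CUTOFF]
    # log P(rare word) = log P(tail cluster | head) + log P(word | tail cluster)
    tail_logp = head_logp[:, -1:] + tail_logp
    return tf.concat([head_logp[:, :-1], tail_logp], axis=-1)  # [batch, VOCAB]

x = tf.random.normal([4, D_MODEL])
print(log_probs(x).shape)  # (4, 10000); each row sums to ~1 in probability space
```

The saving comes from the shapes: most targets fall in the small head, so the full `VOCAB`-wide matrix multiply is replaced by a `CUTOFF + 1`-wide one on most steps, and the rare-word matrix is shrunk by the `D_TAIL` projection.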

Installation

General Use

Run the following, ideally from a virtualenv:

pip install git+https://github.com/johntrimble/adaptive-softmax-keras.git#egg=adaptive-softmax-keras

Development

Run the following, ideally from a virtualenv:

git clone https://github.com/johntrimble/adaptive-softmax-keras.git
cd adaptive-softmax-keras
pip install --requirement requirements.txt
pip install --editable .
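After either install, a quick import check confirms the package is visible to Python. The module name adaptive_softmax below is an assumption; check the repository's setup.py for the actual import name.

```python
# Smoke test: the module name `adaptive_softmax` is assumed, not confirmed;
# see setup.py in the repository for the package's real import name.
import adaptive_softmax
print(adaptive_softmax.__file__)
```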

Performance Comparison

[Figure: perplexity vs. training time for full, adaptive, and differentiated softmax on text8]

The figure above compares perplexity against wall-clock training time for full, adaptive, and differentiated softmax on the text8 dataset over 10 training epochs; each point marks a completed epoch. Adaptive softmax reaches the same perplexity as full softmax in less than half the training time. See examples/text8_benchmark.py for further details.
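For reference, the perplexity plotted here is just the exponential of the model's mean per-token cross-entropy (in nats), so the curves track training loss directly. The loss value below is made up for illustration:

```python
import math

# Perplexity = exp(mean per-token cross-entropy in nats).
# The loss value here is illustrative, not taken from the benchmark output.
mean_cross_entropy = 4.7
print(math.exp(mean_cross_entropy))  # ~109.9
```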

References

  1. Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou. "Efficient softmax approximation for GPUs." ICML 2017. arXiv:1609.04309.

  2. Welin Chen, David Grangier, Michael Auli. "Strategies for Training Large Vocabulary Neural Language Models." ACL 2016. arXiv:1512.04906.
