# Adaptive Softmax for Keras

Keras implementations (requiring the TensorFlow backend) of Adaptive Softmax [1] and a variation of Differentiated Softmax [1, 2]. These alternatives to the standard softmax exploit the skewed distribution of word frequencies to substantially reduce neural language model training time.
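For intuition, here is a minimal NumPy sketch of the two-level computation behind adaptive softmax, following Grave et al. [1]. It is not this package's API; all names and sizes are illustrative. Frequent words get a full softmax in the "head", while infrequent words go into a "tail" cluster with a reduced projection dimension:

```python
# Illustrative sketch (not this package's API): two-level adaptive softmax
# with one frequent "head" and one infrequent "tail" cluster.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d_model, head_size, tail_size, d_tail = 64, 8, 32, 16  # hypothetical sizes

# Head: logits for the 8 most frequent words plus 1 extra "tail cluster" slot.
W_head = rng.normal(size=(d_model, head_size + 1))
# Tail: project the hidden state to a smaller dimension before its own softmax;
# this reduced projection is where most of the compute savings come from.
P_tail = rng.normal(size=(d_model, d_tail))
W_tail = rng.normal(size=(d_tail, tail_size))

h = rng.normal(size=(d_model,))          # hidden state from the language model
p_head = softmax(h @ W_head)             # over [frequent words..., tail cluster]
p_tail = p_head[-1] * softmax(h @ P_tail @ W_tail)  # tail words, scaled by cluster prob

p_full = np.concatenate([p_head[:-1], p_tail])
assert np.isclose(p_full.sum(), 1.0)     # still a valid distribution over the vocab
```

The smaller tail projection (`d_tail < d_model`) trades a little capacity on rare words for a large reduction in compute, which is the core idea both papers exploit.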

## Installation

### General Use

Run the following, ideally from a virtualenv:

```sh
pip install git+https://github.com/johntrimble/adaptive-softmax-keras.git#egg=adaptive-softmax-keras
```

### Development

Run the following, ideally from a virtualenv:

```sh
git clone https://github.com/johntrimble/adaptive-softmax-keras.git
cd adaptive-softmax-keras
pip install --requirement requirements.txt
pip install --editable .
```

## Performance Comparison

*(Figure: softmax comparison)*

The figure above compares perplexity over training time for full, adaptive, and differentiated softmax on the text8 dataset across 10 epochs, with each point marking a completed epoch. Note that adaptive softmax reaches the same perplexity as full softmax in less than half the training time. See `examples/text8_benchmark.py` for further details.
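For a rough sense of where the savings come from, the back-of-envelope comparison below counts output-layer multiply-adds per token for a full softmax versus a two-cluster adaptive softmax. All sizes are assumed for illustration and are not taken from the benchmark:

```python
# Illustrative cost comparison (assumed sizes, not measured from the benchmark):
# output-layer multiply-adds per token, full softmax vs. two-cluster adaptive.
d, V = 512, 44_000            # hidden size and vocabulary size (assumed)
head, d_tail = 2_000, 128     # head cluster size and reduced tail dim (assumed)

full_cost = d * V                                            # one d x V projection
adaptive_cost = d * (head + 1) + d * d_tail + d_tail * (V - head)
print(f"{full_cost / adaptive_cost:.1f}x")                   # ~3.5x with these sizes

# During training, the tail projection is only evaluated for the minority of
# target words that actually fall in the tail, so the realized speedup is larger.
```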

## References

1. Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou. "Efficient softmax approximation for GPUs."
2. Welin Chen, David Grangier, Michael Auli. "Strategies for Training Large Vocabulary Neural Language Models."