Keras implementations (requiring the TensorFlow backend) of Adaptive Softmax [1] and a variation of Differentiated Softmax [1, 2]. These alternatives to the standard softmax exploit differences in word frequencies to substantially reduce neural language model training time.
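Both methods rest on the same observation: a small number of frequent words accounts for most training targets, so the vocabulary can be split into a small, frequently queried head cluster and larger, rarely queried tail clusters. The sketch below illustrates that grouping step in plain Python; it is a conceptual illustration only, not this package's API, and the cumulative-frequency cut-offs are arbitrary example values.

```python
import numpy as np

def partition_vocab(word_counts, cutoffs=(0.8, 0.95)):
    """Split a vocabulary into frequency-ordered clusters.

    word_counts: dict mapping word -> corpus frequency.
    cutoffs: cumulative-probability boundaries between clusters
             (illustrative values, not taken from the papers).
    Returns a list of clusters, from the most frequent (head) cluster
    to the least frequent tail cluster.
    """
    words = sorted(word_counts, key=word_counts.get, reverse=True)
    total = float(sum(word_counts.values()))
    bounds = list(cutoffs) + [1.0]
    cumulative, clusters, current = 0.0, [], []
    for word in words:
        cumulative += word_counts[word] / total
        current.append(word)
        # Close the current cluster once its cumulative mass is reached.
        if cumulative >= bounds[len(clusters)] and len(clusters) < len(bounds) - 1:
            clusters.append(current)
            current = []
    clusters.append(current)
    return clusters

counts = {"the": 500, "of": 300, "cat": 40, "sat": 35, "zygote": 2, "quark": 1}
for i, cluster in enumerate(partition_vocab(counts)):
    print("cluster %d: %s" % (i, cluster))
```

Because most targets land in the small head cluster, the expensive tail clusters are evaluated only occasionally, which is where the training-time savings come from.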
To install as a package, run the following, ideally from a virtualenv:

```
pip install git+https://github.com/johntrimble/adaptive-softmax-keras.git#egg=adaptive-softmax-keras
```
Alternatively, to install for development, run the following, ideally from a virtualenv:

```
git clone https://github.com/johntrimble/adaptive-softmax-keras.git
cd adaptive-softmax-keras
pip install --requirement requirements.txt
pip install --editable .
```
The plot above compares perplexity over training time for full, adaptive, and differentiated softmax on the text8 dataset across 10 training epochs, with each point marking a completed epoch. Note that adaptive softmax reaches the same perplexity as full softmax in less than half the training time. See examples/text8_benchmark.py for further details.
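For reference, perplexity is simply the exponential of the mean per-token cross-entropy, so a model's reported loss (categorical cross-entropy in nats) converts to perplexity directly. The callback below is a small sketch of that conversion; the class name is illustrative and not part of this package.

```python
import math
import keras

class PerplexityLogger(keras.callbacks.Callback):
    """Print perplexity after each epoch, assuming the model's loss is
    the mean per-token categorical cross-entropy (in nats)."""

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        for key in ("loss", "val_loss"):
            if key in logs:
                print("epoch %d: %s perplexity = %.2f"
                      % (epoch, key, math.exp(logs[key])))
```

Passing an instance via `model.fit(..., callbacks=[PerplexityLogger()])` logs perplexity alongside the usual metrics.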
1. Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou, "Efficient softmax approximation for GPUs"
2. Welin Chen, David Grangier, Michael Auli, "Strategies for Training Large Vocabulary Neural Language Models"