Keras implementations (requiring the TensorFlow backend) of Adaptive Softmax [1] and a variation of Differentiated Softmax [1, 2]. These alternatives to the standard softmax exploit differences in word frequencies to substantially reduce neural language model training time.
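Both methods rest on the same observation: a small number of frequent words accounts for most training targets, so the vocabulary can be split into a small, frequently queried head cluster and larger, rarely queried tail clusters. The sketch below illustrates that grouping step in plain Python; it is a conceptual illustration only, not this package's API, and the cumulative-frequency cut-offs are arbitrary example values.

```python
import numpy as np

def partition_vocab(word_counts, cutoffs=(0.8, 0.95)):
    """Split a vocabulary into frequency-ordered clusters.

    word_counts: dict mapping word -> corpus frequency.
    cutoffs: cumulative-probability boundaries between clusters
             (illustrative values, not taken from the papers).
    Returns a list of clusters, from the most frequent (head) cluster
    to the least frequent tail cluster.
    """
    words = sorted(word_counts, key=word_counts.get, reverse=True)
    total = float(sum(word_counts.values()))
    bounds = list(cutoffs) + [1.0]
    cumulative, clusters, current = 0.0, [], []
    for word in words:
        cumulative += word_counts[word] / total
        current.append(word)
        # Close the current cluster once its cumulative mass is reached.
        if cumulative >= bounds[len(clusters)] and len(clusters) < len(bounds) - 1:
            clusters.append(current)
            current = []
    clusters.append(current)
    return clusters

counts = {"the": 500, "of": 300, "cat": 40, "sat": 35, "zygote": 2, "quark": 1}
for i, cluster in enumerate(partition_vocab(counts)):
    print("cluster %d: %s" % (i, cluster))
```

Because most targets land in the small head cluster, the expensive tail clusters are evaluated only occasionally, which is where the training-time savings come from.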
To install as a package, run the following, ideally from a virtualenv:

```
pip install git+https://github.com/johntrimble/adaptive-softmax-keras.git#egg=adaptive-softmax-keras
```
Alternatively, to install for development, run the following, ideally from a virtualenv:

```
git clone https://github.com/johntrimble/adaptive-softmax-keras.git
cd adaptive-softmax-keras
pip install --requirement requirements.txt
pip install --editable .
```
The plot above compares perplexity over training time for full, adaptive, and differentiated softmax on the text8 dataset across 10 training epochs, with each point marking a completed epoch. Note that adaptive softmax reaches the same perplexity as full softmax in less than half the training time. See examples/text8_benchmark.py for further details.
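For reference, perplexity is simply the exponential of the mean per-token cross-entropy, so a model's reported loss (categorical cross-entropy in nats) converts to perplexity directly. The callback below is a small sketch of that conversion; the class name is illustrative and not part of this package.

```python
import math
import keras

class PerplexityLogger(keras.callbacks.Callback):
    """Print perplexity after each epoch, assuming the model's loss is
    the mean per-token categorical cross-entropy (in nats)."""

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        for key in ("loss", "val_loss"):
            if key in logs:
                print("epoch %d: %s perplexity = %.2f"
                      % (epoch, key, math.exp(logs[key])))
```

Passing an instance via `model.fit(..., callbacks=[PerplexityLogger()])` logs perplexity alongside the usual metrics.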
1. Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou, "Efficient softmax approximation for GPUs"
2. Welin Chen, David Grangier, Michael Auli, "Strategies for Training Large Vocabulary Neural Language Models"