TL;DR: The authors show that a multilayer LSTM RNN (4 layers, 1000 cells per layer, 1000-dimensional embeddings, 160k source vocabulary, 80k target vocabulary) can achieve competitive results on Machine Translation tasks. They find that reversing the input sequence leads to significant improvements, most likely because it introduces short-term dependencies that are more easily captured by the gradients. Somewhat surprisingly, the LSTM did not have difficulties with long sentences. The model achieves competitive results by itself (34.8 BLEU) and comes close to the state of the art when coupled with an existing baseline system (36.5 BLEU).
- Reversing the input sequence leads to significant improvements (reversal sketch after these notes).
- Deep LSTM performs much better than shallow LSTM.
- Use different parameters for the encoder and decoder. This makes it possible to train decoders for multiple target languages in parallel.
- 4 layers, 1000 cells per layer, 1000-dimensional word embeddings, 160k source vocabulary, 80k target vocabulary. Trained on 12M sentences (652M words). SGD with a fixed learning rate of 0.7, halved every half epoch after 5 initial epochs. Gradient clipping. GPU parallelization leads to 6.3k words/sec (training sketch below).
- Batching sentences of approximately the same length leads to a 2x speedup (bucketing sketch below).
- A PCA projection of the sentence representations shows meaningful clusters that are invariant to passive/active voice, suggesting that the fixed-size vector representation captures meaning (PCA sketch below).
- "No complete explanation" for why the LSTM does so much better with the introduced short-range dependencies.
- A beam size of 1 already performs well, and a beam size of 2 is best in the deep model (beam search sketch below).
- Seems like the performance here is mostly due to the available computational resources and an optimized implementation. These models are pretty big by most standards, and other approaches (e.g. attention) may lead to better results if given more computational resources.
- Reversing the input still feels like a hack to me; there should be a more principled solution for dealing with long-range dependencies.
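
A minimal sketch of the input reversal trick. Only the source side is reversed; the target keeps its natural order. The example tokens are made up for illustration:

```python
def make_example(source_tokens, target_tokens):
    """Build one training pair with the source sequence reversed.

    Reversing the source puts the beginning of the source sentence next
    to the beginning of the target sentence, introducing the short-term
    dependencies that the paper credits for the improvement.
    """
    return list(reversed(source_tokens)), target_tokens

# The encoder now reads "sat cat the", so "the" sits right next to the
# decoder's first output "le".
src, tgt = make_example(["the", "cat", "sat"], ["le", "chat", "assis"])
```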
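The paper predates today's frameworks, so the following is only a rough PyTorch sketch of the reported setup (separate encoder/decoder parameters, SGD at 0.7, gradient norm clipping). The class and function names are mine, and the sizes are toy values so the sketch stays runnable; the paper's actual dimensions are in the comments:

```python
import torch
import torch.nn as nn

# Toy sizes; the paper uses 160k/80k vocabularies, 1000-dimensional
# embeddings, and 4 layers of 1000 cells.
SRC_VOCAB, TGT_VOCAB, EMB_DIM, HIDDEN, LAYERS = 1000, 800, 64, 64, 4

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder and decoder have separate parameters, as in the paper.
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB_DIM)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB_DIM)
        self.encoder = nn.LSTM(EMB_DIM, HIDDEN, num_layers=LAYERS)
        self.decoder = nn.LSTM(EMB_DIM, HIDDEN, num_layers=LAYERS)
        self.out = nn.Linear(HIDDEN, TGT_VOCAB)

    def forward(self, src, tgt_in):
        # src: (src_len, batch), already reversed; tgt_in: (tgt_len, batch).
        _, state = self.encoder(self.src_emb(src))   # fixed-size summary
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)
        return self.out(dec_out)

model = Seq2Seq()
# Schedule (not shown): lr fixed at 0.7 for 5 epochs, then halved
# every half epoch.
opt = torch.optim.SGD(model.parameters(), lr=0.7)
loss_fn = nn.CrossEntropyLoss()

def train_step(src, tgt_in, tgt_out):
    opt.zero_grad()
    logits = model(src, tgt_in)                      # (tgt_len, batch, vocab)
    loss = loss_fn(logits.reshape(-1, TGT_VOCAB), tgt_out.reshape(-1))
    loss.backward()
    # The paper rescales the gradient when its norm exceeds a threshold.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    opt.step()
    return loss.item()
```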
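The 2x speedup from batching comes from grouping sentences of similar length so that minibatches need little padding. These notes don't record the paper's exact bucketing scheme, so this is just one plausible implementation:

```python
import random

def length_bucketed_batches(pairs, batch_size):
    """Group (source, target) pairs of similar source length into batches.

    Sorting by length before slicing means each minibatch contains
    sentences of roughly equal length, minimizing wasted padding.
    """
    pairs = sorted(pairs, key=lambda p: len(p[0]))
    batches = [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]
    random.shuffle(batches)  # shuffle batch order, not sentence order
    return batches
```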
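The clustering observation is easy to reproduce once you have the encoder's final hidden state for each sentence. `encode()` below is a hypothetical stand-in for that step (random vectors keep the sketch runnable):

```python
import numpy as np
from sklearn.decomposition import PCA

def encode(sentence):
    # Hypothetical: in the real model this would be the encoder LSTM's
    # final hidden state for the sentence.
    rng = np.random.default_rng(abs(hash(sentence)) % 2**32)
    return rng.standard_normal(1000)

sentences = ["She gave me a card in the garden",
             "In the garden, she gave me a card",
             "I was given a card by her in the garden"]
coords = PCA(n_components=2).fit_transform(np.stack([encode(s) for s in sentences]))
# Plotting `coords` should place paraphrases (active vs. passive voice)
# close together if the fixed-size representation captures meaning.
```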
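For decoding, a minimal left-to-right beam search sketch. `step_fn` is a hypothetical callback returning (token, probability) pairs for the next position; with `beam_size=1` this degenerates to the greedy decoding that already performed well:

```python
import math

def beam_search(step_fn, start_token, end_token, beam_size=2, max_len=50):
    """Keep the beam_size highest-scoring prefixes at each step."""
    beams = [([start_token], 0.0)]           # (prefix, log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, prob in step_fn(prefix):
                candidates.append((prefix + [token], score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            # Hypotheses that emit the end token stop growing.
            (finished if prefix[-1] == end_token else beams).append((prefix, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])

# Toy next-token distribution: always 60% "a", 30% "b", 10% "</s>".
hyp, score = beam_search(lambda p: [("a", 0.6), ("b", 0.3), ("</s>", 0.1)],
                         "<s>", "</s>")
```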