From f7d55ffff3d585f0128634f906fb754ab87c7be0 Mon Sep 17 00:00:00 2001
From: Oxer11 <17300240035@fudan.edu.cn>
Date: Sun, 12 May 2019 13:40:46 +0800
Subject: [PATCH 1/2] some corrections

---
 docs/assignment-3/index.html | 66 ++++++++++++++++++------------------
 docs/assignment-3/index.md   | 10 +++---
 2 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/docs/assignment-3/index.html b/docs/assignment-3/index.html
index 153932e..2a6bf6c 100644
--- a/docs/assignment-3/index.html
+++ b/docs/assignment-3/index.html
@@ -4,21 +4,20 @@
In this assignment you are going to implement an RNN (namely an LSTM) for generating Tang poetry. This assignment description will outline the landscape so you know how to do it! You'll also get familiar with PyTorch and FastNLP by completing this assignment; their docs are highly recommended to get you started, and you could also try out some examples included in their code repositories.
In the previous assignment, you already implemented back-propagation of gradients with numpy, and you must have had a lot of fun playing with it. Nowadays autograd tools like TensorFlow and PyTorch are pervasive, and people rarely write deep neural networks without them, not only because they offer great convenience for gradient computation, but also because they can leverage GPUs for amazingly fast training. Still, knowing the details under the hood is very beneficial if you want to dive deeper into deep learning, and these details are frequently asked about in interviews.
In the course, we talked about the Recurrent Neural Network and one of its most widely used variants, the LSTM (Long Short-Term Memory [1]) network. To remind you how the LSTM works, the LSTM unit processes the input in the following manner under the hood:

$$
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}; x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}; x_t] + b_i\right) \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}; x_t] + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c \cdot [h_{t-1}; x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

where $\cdot$ stands for matrix multiplication, $\odot$ for element-wise product and $[\,\cdot\,;\,\cdot\,]$ for vector concatenation. Note that the $W$'s and $b$'s are parameters of the LSTM that are shared across all steps.
Also note that here the input $x_t$ is a single vector, while in your implementation you should use batched input, since matrix multiplication applied to a matrix of stacked inputs is the same as multiplying the individual vectors concatenated horizontally.
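To make the batched version concrete, here is a minimal sketch of a single LSTM step written out by hand in PyTorch, following the equations above. The shapes and names (`batch`, `x_dim`, `h_dim`, the stacked gate matrix `W`) are illustrative, not part of the assignment starter code.

```python
import torch

# One hand-written LSTM step for a whole batch.
batch, x_dim, h_dim = 32, 128, 256

W = torch.randn(4 * h_dim, h_dim + x_dim) * 0.01   # gates stacked as f, i, o, c~
b = torch.zeros(4 * h_dim)
x_t = torch.randn(batch, x_dim)                    # batched input at step t
h_prev = torch.zeros(batch, h_dim)                 # previous hidden state
c_prev = torch.zeros(batch, h_dim)                 # previous cell state

z = torch.cat([h_prev, x_t], dim=1) @ W.t() + b    # (batch, 4 * h_dim)
f, i, o, g = z.chunk(4, dim=1)
f, i, o, g = torch.sigmoid(f), torch.sigmoid(i), torch.sigmoid(o), torch.tanh(g)
c_t = f * c_prev + i * g                           # element-wise products
h_t = o * torch.tanh(c_t)
```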
For language modeling, we use the LSTM to predict the next word or character at each step. For example, if the inputs at each step are the words of a sentence $w_1, w_2, \dots, w_n$, the outputs at each step should be $w_2, \dots, w_n, \text{EOS}$, where EOS stands for end of sentence. To obtain a prediction from the LSTM, we first create a vocabulary, an ordered set containing all the words in your training dataset, to map each word to an integer; we then map each integer to a vector, which will be the input for the LSTM. At each step $t$ we rely on the hidden vector $h_t$ and apply a linear transformation $z_t = W h_t + b$, where $z_t$ is a vector whose size is the vocabulary size. Because a linear transformation produces unbounded values, to turn the prediction into a probability distribution we take the exponential and then normalize by the sum, i.e. take the softmax of $z_t$: $p_t = \mathrm{softmax}(z_t / \tau)$, where $\tau$ is a temperature term that is usually 1; you will encounter this term later. As we've learned from the previous assignment, we can use the cross-entropy loss to push the prediction toward the next word, and minimize the average loss to provide the training signal for the network.
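To make this pipeline concrete, here is a minimal PyTorch sketch of an LSTM language model (embedding → LSTM → linear projection to vocabulary logits → cross-entropy against the targets shifted by one step). The names and hyperparameters (`CharLM`, `vocab_size=5000`, etc.) are illustrative assumptions, not the assignment's starter code.

```python
import torch
import torch.nn as nn

class CharLM(nn.Module):
    """Embedding -> LSTM -> linear projection to vocabulary logits."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        emb = self.embed(tokens)               # (batch, seq_len, embed_dim)
        out, state = self.lstm(emb, state)     # (batch, seq_len, hidden_dim)
        return self.proj(out), state           # logits: (batch, seq_len, vocab_size)

# One training step: inputs are w_1..w_{n-1}, targets are w_2..w_n.
model = CharLM(vocab_size=5000)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

batch = torch.randint(0, 5000, (32, 21))       # fake batch of token ids
inputs, targets = batch[:, :-1], batch[:, 1:]
logits, _ = model(inputs)
loss = criterion(logits.reshape(-1, 5000), targets.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```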
Requirements
In this part you are going to implement an LSTM to build a language model to generate Tang poetry.
You are given a small dataset containing some Tang poems. First split the dataset into a training dataset and a development dataset; we would recommend an 80% / 20% split. Then create a vocabulary containing all the words (or characters, but we will stick to calling them words) in the training dataset. Be aware that you might want to insert a new word `EOS`
and a special token `OOV`
for unknown words (also known as out-of-vocabulary words). To process the dataset, you should transform the poems into a sequence of integers representing words in the vocabulary. Then you can randomly crop the sequence into batches of short sequences for training the LSTM. Note that at each step a single input to the LSTM should be a vector, so we need to create a mapping from integers to vectors; this step is also known as embedding in NLP. You are encouraged to use the vocabulary and dataset utilities from FastNLP to implement your vocabulary. A preprocessing sketch follows below.
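Here is a minimal preprocessing sketch in plain Python (without FastNLP), assuming the raw data is available as a list of poem strings named `poems`; the token spellings `<eos>` / `<oov>` and the 80/20 split are the choices described above.

```python
import random

EOS, OOV = "<eos>", "<oov>"

random.shuffle(poems)
split = int(0.8 * len(poems))
train_poems, dev_poems = poems[:split], poems[split:]

# Build the vocabulary from the *training* split only.
words = sorted({ch for poem in train_poems for ch in poem})
word2id = {w: i for i, w in enumerate([EOS, OOV] + words)}
id2word = {i: w for w, i in word2id.items()}

def encode(poem):
    # Map each character to its id, falling back to OOV, and append EOS.
    return [word2id.get(ch, word2id[OOV]) for ch in poem] + [word2id[EOS]]

# One long stream of ids, ready to be cropped into short training sequences.
train_ids = [i for poem in train_poems for i in encode(poem)]
dev_ids = [i for poem in dev_poems for i in encode(poem)]
```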
Following the previous discussion, we arrive at a loss function that provides gradients to the parameters and also to the embedding (you can either fix the embedding at its initialization or update it with the gradient).
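Whether the embedding is updated is simply a matter of whether its weight receives gradients; a small sketch, assuming the `CharLM` model from the earlier example:

```python
# Option 1: keep the embedding fixed at its initialization.
model.embed.weight.requires_grad_(False)

# Option 2 (default): leave requires_grad=True so the optimizer updates the
# embedding together with the LSTM and projection parameters.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```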
As the model is pretty clear by now, you should include the hyperparameters and training settings you are using in your report. They are:
The training of the model stops when it can no longer improve at predicting the next word on the development dataset, which can be evaluated by perplexity:

$$
\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{t=1}^{N}\log p\left(w_{t+1} \mid w_{\le t}\right)\right)
$$
The perplexity should be evaluated on the whole development dataset: split the dataset into segments of the same length as the sentence length used in the training stage, and then evaluate the average perplexity over all the resulting segments. Use early stopping when the perplexity stops improving. You should try trainer.py from FastNLP to this end, as early stopping is already implemented in it.
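The evaluation could look roughly like the sketch below, which splits the dev stream into fixed-length segments and averages the per-token loss before exponentiating. It assumes the `CharLM` model and the `dev_ids` list from the earlier sketches; the function name and defaults are illustrative.

```python
import math
import torch
import torch.nn as nn

@torch.no_grad()
def dev_perplexity(model, dev_ids, seq_len=20, vocab_size=5000):
    """Average per-token perplexity over the dev stream, split into seq_len chunks."""
    model.eval()
    criterion = nn.CrossEntropyLoss(reduction="sum")
    total_loss, total_tokens = 0.0, 0
    for start in range(0, len(dev_ids) - seq_len - 1, seq_len):
        chunk = torch.tensor(dev_ids[start:start + seq_len + 1]).unsqueeze(0)
        inputs, targets = chunk[:, :-1], chunk[:, 1:]
        logits, _ = model(inputs)
        total_loss += criterion(logits.reshape(-1, vocab_size), targets.reshape(-1)).item()
        total_tokens += targets.numel()
    return math.exp(total_loss / total_tokens)
```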
To generate a Tang poem once you have the model trained, you can first sample a word to start, feed it as input to the LSTM, then sample from the output of the LSTM and in turn feed the generated word back into the LSTM to generate the next word. To allow more variation, people sometimes use a temperature term $\tau$ in the softmax to control the diversity of the generation; for example, a temperature greater than 1 makes the samples more varied than a temperature of 1.
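A sampling loop with temperature could look like the following sketch, again assuming the `CharLM` model and the `word2id` / `id2word` mappings from the earlier examples; all names here are illustrative.

```python
import torch

@torch.no_grad()
def sample_poem(model, word2id, id2word, start_word, max_len=48, temperature=1.0):
    """Sample a poem by feeding each generated word back in as the next input."""
    model.eval()
    token = torch.tensor([[word2id[start_word]]])      # shape (1, 1)
    state, words = None, [start_word]
    for _ in range(max_len):
        logits, state = model(token, state)
        probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1).item()
        if id2word[next_id] == "<eos>":                # stop at end-of-sentence
            break
        words.append(id2word[next_id])
        token = torch.tensor([[next_id]])
    return "".join(words)
```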
Finally, you might find this article very helpful for understanding the task; its author implemented a vanilla RNN language model to generate not only poems, but also Linux kernel code.
Requirements
Clarification