The goal of this project is to predict the right ending of a four-sentence story, choosing between two alternative endings.
The following datasets were provided and are located in `nlu_project_2/data/`:
- training set: 88,161 five-sentence short stories, each including only the correct ending
- validation set: 1,871 stories, each with a correct and a wrong ending
- test set: stories with two candidate endings
- cloze test set: an additional test set with right-ending labels
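As a quick orientation, here is a minimal sketch of how the data can be inspected with pandas. The file names below are hypothetical placeholders, not necessarily the actual files shipped in `nlu_project_2/data/`:

```python
import pandas as pd

# Hypothetical file names -- adjust to the actual files in nlu_project_2/data/.
train = pd.read_csv("nlu_project_2/data/train_stories.csv")
val = pd.read_csv("nlu_project_2/data/cloze_test_val.csv")

# The training stories carry only the correct fifth sentence,
# while the validation stories carry two candidate endings plus a label.
print(train.columns.tolist())
print(val.columns.tolist())
print(val.head(3))
```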
To predict the right endings, we experimented with the following models:
- CNN LSTM: `cnn_lstm`, `cnn_lstm_val`
- Siamese LSTM: `siameseLSTM`
- Feed-forward neural network: `ffnn`, `ffnn_val`

The models with the suffix `_val` are the corresponding models that train on the validation data (split into 90% for training and 10% for validation).
For details on the models and methodology, please refer to the report in `nlu_project_2/report`.
Create a new conda virtual environment with the required packages:
conda env create -n nlu_project -f=/path/to/requirements.txt
Activate the virtual environment:
source activate nlu_project
You can generate the preprocessed data with POS tags (needed for some models) by running:
python preprocessing.py
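As an illustration of what POS-tag preprocessing produces, here is a minimal sketch assuming NLTK; the actual `preprocessing.py` may use a different tagger and output format:

```python
import nltk

# Resource names differ slightly across NLTK versions; download the common ones.
for resource in ("punkt", "punkt_tab",
                 "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(resource, quiet=True)

sentence = "Kate went to the store to buy milk."
tokens = nltk.word_tokenize(sentence)   # ['Kate', 'went', 'to', 'the', ...]
tagged = nltk.pos_tag(tokens)           # [('Kate', 'NNP'), ('went', 'VBD'), ...]
print(tagged)
```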
To augment the training data with wrong endings randomly sampled from the story context, run:
python negative_endings.py
This creates the file `nlu_project_2/data/train_set_sampled.csv`.
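The sampling idea, roughly (a minimal sketch with illustrative names; the actual `negative_endings.py` may apply additional processing):

```python
import random

def sample_wrong_ending(context_sentences):
    """Return a random sentence from the story context to serve as a wrong ending.

    Illustrative sketch of sampling a negative ending from the context;
    the real negative_endings.py may differ in details.
    """
    return random.choice(context_sentences)

context = ["Tom wanted a new bike.", "He saved money all summer.",
           "He worked extra hours at his job.", "Finally he had enough money."]
print(sample_wrong_ending(context))
```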
You can also download the preprocessed data from this link (or, alternatively, here). Then copy the files into `nlu_project_2/data/` to run the models directly.
For the feed-forward neural network, we used pre-trained skip-thoughts embeddings from ryankiros. You need to download the embedding files as specified in its README:
wget http://www.cs.toronto.edu/~rkiros/models/dictionary.txt
wget http://www.cs.toronto.edu/~rkiros/models/utable.npy
wget http://www.cs.toronto.edu/~rkiros/models/btable.npy
wget http://www.cs.toronto.edu/~rkiros/models/uni_skip.npz
wget http://www.cs.toronto.edu/~rkiros/models/uni_skip.npz.pkl
wget http://www.cs.toronto.edu/~rkiros/models/bi_skip.npz
wget http://www.cs.toronto.edu/~rkiros/models/bi_skip.npz.pkl
Then copy the downloaded files into `nlu_project_2/src/models/skip_thoughts/data/`.
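For example, assuming the files were downloaded into the current working directory:
cp dictionary.txt utable.npy btable.npy uni_skip.npz uni_skip.npz.pkl bi_skip.npz bi_skip.npz.pkl nlu_project_2/src/models/skip_thoughts/data/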
Please note that we modified the skip-thoughts code to make it work for our project.
To train our models, run:
python run.py -m model-name -t
where `model-name` refers to one of our models, namely `cnn_lstm`, `cnn_lstm_val`, `siameseLSTM`, `ffnn`, `ffnn_val`.
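For example, to train the CNN LSTM model:
python run.py -m cnn_lstm -t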
The models are saved after every epoch in `nlu_project_2/trained_models/model-name/date[hour]/model.h5`.
You can find the pretrained models and the prediction files for the test set in this folder.
Please note that evaluation and prediction retrieve the most recently trained model. If you would like to test our pretrained models, we suggest running prediction first, and then running training / evaluation again to verify the model.
To evaluate our trained models on the cloze test set, run:
python run.py -m model-name -e
To predict the endings for the given test set, run:
python run.py -m model-name -p
This generates a CSV file with the predicted right-ending labels in `nlu_project_2/trained_models/model-name/date[hour]` for the last trained model.
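For example, following the suggestion above, you can first predict with the pretrained `cnn_lstm` model and then retrain and evaluate it:
python run.py -m cnn_lstm -p
python run.py -m cnn_lstm -t
python run.py -m cnn_lstm -e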