Prediction of semantically equivalent queries

This is the repository for the final project of the course INF582 - Introduction to Text Mining and NLP (2018-2019).

Authors:

Fabrizio Indirli, Dor Polikar, Simon Klotz

Instructions:

The code can be run using the following steps:

Getting the data:

Copy the train.csv and test.csv files into the data folder
Generate or copy GloVe vectors:
a. If not already done, download the GloVe 840B-300d file from the Stanford NLP GloVe project page, put it in ./data/, and convert it to word2vec format:
```
python -m gensim.scripts.glove2word2vec --input ./data/glove.840B.300d.txt --output ./data/glove.840B.300d.w2vformat.txt
```
b. Otherwise, copy an already converted GloVe file (glove.840B.300d.txt) into ./data/ and rename it to glove.840B.300d.w2vformat.txt
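
For context, the conversion step mainly prepends the header line that the word2vec text format expects (vector count and dimensionality), which raw GloVe files lack. The following is a minimal standard-library sketch of that idea, not the gensim implementation. The function name `glove_to_word2vec` and the tiny sample file are illustrative assumptions and are not part of this repository; the gensim command above remains the step to use on the real file.

```python
import tempfile
from pathlib import Path


def glove_to_word2vec(glove_path, out_path):
    """Write a word2vec-style text file: a 'count dim' header, then the GloVe lines.

    Hypothetical helper for illustration; it mirrors the idea behind
    gensim's glove2word2vec script rather than reproducing it.
    """
    count, dim = 0, 0
    # First pass: count the vectors and read the dimensionality.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if count == 0:
                dim = len(line.rstrip("\n").split(" ")) - 1
            count += 1
    # Second pass: emit the header, then copy the vectors unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        dst.write(f"{count} {dim}\n")
        for line in src:
            dst.write(line)
    return count, dim


# Tiny made-up sample standing in for glove.840B.300d.txt.
workdir = Path(tempfile.mkdtemp())
sample = workdir / "glove.sample.txt"
sample.write_text("hello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n", encoding="utf-8")
converted = workdir / "glove.sample.w2vformat.txt"
shape = glove_to_word2vec(sample, converted)
header = converted.read_text(encoding="utf-8").splitlines()[0]
print(shape, header)  # prints: (2, 3) 2 3
```

The two-pass approach avoids holding the whole file in memory, which matters for a file the size of the 840B-300d vectors.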

Install required packages:

pip install -r requirements.txt

Computing the features:

Run: python ./build_features.py
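
Before running the feature step, it can help to confirm that the inputs listed under "Getting the data" are in place. The sketch below assumes the scripts read these files from ./data; the helper `missing_inputs` is hypothetical and is not part of this repository.

```python
from pathlib import Path

# File names taken from the steps above; the helper itself is hypothetical.
REQUIRED_FILES = ("train.csv", "test.csv", "glove.840B.300d.w2vformat.txt")


def missing_inputs(data_dir="./data", required=REQUIRED_FILES):
    """Return the required file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in required if not (root / name).is_file()]


missing = missing_inputs()
if missing:
    print("Missing from ./data:", ", ".join(missing))
else:
    print("All inputs found; build_features.py can be run.")
```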

Predicting:

To get the results using the LSTM:

Run: python ./lstm_model.py

The final submission is saved in the predictions folder as postprocessed_submission.csv.

To get the results using the ensemble (if the ensemble should include the LSTM, run lstm_model.py first):

Run: python ./cross_validation_ensemble.py

The final submission is saved in the predictions folder as postprocessed_submission.csv.
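
The order of the steps above can be sketched as a small driver. This is only an illustration: `pipeline_commands` and `run_pipeline` are hypothetical names, not part of this repository, and the sketch assumes the scripts are launched from the repository root with the same Python interpreter.

```python
import subprocess
import sys


def pipeline_commands(use_ensemble=False, include_lstm=True):
    """List the scripts to run, in the order described above."""
    commands = [[sys.executable, "./build_features.py"]]
    # The LSTM runs on its own, or before the ensemble when it is included.
    if include_lstm or not use_ensemble:
        commands.append([sys.executable, "./lstm_model.py"])
    if use_ensemble:
        commands.append([sys.executable, "./cross_validation_ensemble.py"])
    return commands


def run_pipeline(use_ensemble=False, include_lstm=True):
    """Run each step in turn, stopping at the first failure."""
    for command in pipeline_commands(use_ensemble, include_lstm):
        subprocess.run(command, check=True)


# Show the planned order without executing anything.
for command in pipeline_commands(use_ensemble=True):
    print(" ".join(command[1:]))
```

Calling `run_pipeline(use_ensemble=True)` would then execute the three scripts in sequence; `check=True` makes a failing step raise instead of letting later steps run on incomplete outputs.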
