fabrizio-indirli/similar-questions-detection

Prediction of semantically equivalent queries

This is the repository for the final project of the course INF582 - Introduction to Text Mining and NLP (2018-2019).

Authors:

Fabrizio Indirli, Dor Polikar, Simon Klotz

Instructions:

The code can be run using the following steps:

Getting the data:

  1. Copy the train.csv and test.csv files into the data folder
  2. Generate or copy GloVe vectors:
    a. If not already done, download the GloVe 840B-300d file (glove.840B.300d.txt) from the GloVe project page, put it in ./data/ and convert it to word2vec format:
    python -m gensim.scripts.glove2word2vec --input ./data/glove.840B.300d.txt --output ./data/glove.840B.300d.w2vformat.txt

    b. Otherwise, copy the already converted GloVe file glove.840B.300d.txt to ./data/ and rename it to glove.840B.300d.w2vformat.txt
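
For readers wondering what the conversion in step 2a changes: to the best of our understanding, the word2vec text format differs from the GloVe text format only by a leading header line holding the number of vectors and their dimensionality. The sketch below illustrates that idea with the standard library only. The function name `glove_to_word2vec` is hypothetical and not part of this repository, and the sketch is not a replacement for the gensim script above.

```python
from pathlib import Path


def glove_to_word2vec(src: Path, dst: Path) -> tuple[int, int]:
    """Copy a GloVe text file to dst, prefixed with a '<rows> <dims>' header.

    Hypothetical helper for illustration; the README itself uses
    gensim.scripts.glove2word2vec for this step.
    """
    # First pass: count the vectors and read the dimensionality from line one.
    rows, dims = 0, 0
    with src.open(encoding="utf-8") as fin:
        for line in fin:
            if rows == 0:
                # One token followed by `dims` space-separated floats.
                dims = len(line.rstrip("\n").split(" ")) - 1
            rows += 1
    # Second pass: write the header, then stream the vectors through unchanged.
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        fout.write(f"{rows} {dims}\n")
        for line in fin:
            fout.write(line)
    return rows, dims
```

Two passes keep memory use flat, which matters for a file as large as the 840B vectors.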

Install required packages:

pip install -r requirements.txt

Computing the features:

Run: python ./build_features.py
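
Before running the feature script, it can help to confirm that the input files from the data steps are in place. The following is a minimal sketch of such a check. The helper `missing_inputs` is hypothetical and not part of this repository; the file names are the ones listed in the data steps above.

```python
from pathlib import Path

# File names taken from the "Getting the data" steps of this README.
REQUIRED_FILES = ("train.csv", "test.csv", "glove.840B.300d.w2vformat.txt")


def missing_inputs(data_dir: Path) -> list[str]:
    """Return the required file names that are not present in data_dir."""
    # Hypothetical helper: only checks presence, not file contents.
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]
```

Calling `missing_inputs(Path("./data"))` from the repository root returns an empty list when everything is in place.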

Predicting:

To get the results using the LSTM:

Run: python ./lstm_model.py

The final submission is in the predictions folder and is called postprocessed_submission.csv.

To get the results using the ensemble (if the ensemble should include the LSTM, run lstm_model.py first):

Run: python ./cross_validation_ensemble.py

The final submission is in the predictions folder and is called postprocessed_submission.csv.
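
The run order described above can be sketched as a small driver. This is only an illustration under stated assumptions: the functions `pipeline_commands` and `run_pipeline` and their parameters are hypothetical, the scripts are assumed to take no arguments (as in the commands above), and the driver is assumed to be started from the repository root.

```python
import subprocess
import sys


def pipeline_commands(use_ensemble: bool, include_lstm: bool = True) -> list[list[str]]:
    """List the script invocations for one run, in the order given in this README."""
    steps = ["./build_features.py"]
    # The LSTM runs on its own, or before the ensemble when the ensemble uses it.
    if not use_ensemble or include_lstm:
        steps.append("./lstm_model.py")
    if use_ensemble:
        steps.append("./cross_validation_ensemble.py")
    # Reuse the current interpreter so the installed requirements are visible.
    return [[sys.executable, script] for script in steps]


def run_pipeline(use_ensemble: bool, include_lstm: bool = True) -> None:
    """Run each step in order, stopping at the first one that fails."""
    for command in pipeline_commands(use_ensemble, include_lstm):
        subprocess.run(command, check=True)
```

For example, `run_pipeline(use_ensemble=True)` would compute the features, train the LSTM, and then run the ensemble.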
