This is the repository for the final project of the course INF582 - Introduction to Text Mining and NLP (2018-2019)
Fabrizio Indirli, Dor Polikar, Simon Klotz
The code can be run using the following steps:
- Copy the train.csv and test.csv files into the data folder
- Generate or copy GloVe vectors:
  a. If not already done, download the GloVe 840B-300d file (glove.840B.300d.txt), put it in ./data/ and convert it to word2vec format:
     python -m gensim.scripts.glove2word2vec --input ./data/glove.840B.300d.txt --output ./data/glove.840B.300d.w2vformat.txt
  b. Otherwise, copy the already converted GloVe file glove.840B.300d.txt to ./data/ and rename it to glove.840B.300d.w2vformat.txt
- Install the requirements: pip install -r requirements.txt
- Run: python ./build_features.py
- Run: python ./lstm_model.py
The final submission is in the predictions folder and is called postprocessed_submission.csv.
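For readers wondering what the GloVe conversion step changes: the word2vec text format starts with a header line holding the number of vectors and their dimension, which the raw GloVe file lacks. The following is a minimal standard-library sketch of that idea, not gensim's implementation; the function name glove_to_word2vec is hypothetical, and the dimension is inferred from the first line only.

```python
from pathlib import Path


def glove_to_word2vec(glove_path, w2v_path):
    """Copy a GloVe text file to w2v_path, prepending the word2vec header line.

    Hypothetical helper for illustration; returns (num_vectors, dim).
    """
    glove_path, w2v_path = Path(glove_path), Path(w2v_path)

    # First pass: count the vectors and read the dimension from the first line.
    num_vectors = 0
    dim = 0
    with glove_path.open("r", encoding="utf-8") as src:
        for line in src:
            if num_vectors == 0:
                # Each line is "<token> <v1> ... <vd>", so d = fields - 1.
                dim = len(line.split()) - 1
            num_vectors += 1

    # Second pass: write "<num_vectors> <dim>", then the vectors unchanged.
    with glove_path.open("r", encoding="utf-8") as src, \
            w2v_path.open("w", encoding="utf-8") as dst:
        dst.write(f"{num_vectors} {dim}\n")
        for line in src:
            dst.write(line)
    return num_vectors, dim
```

Calling glove_to_word2vec("./data/glove.840B.300d.txt", "./data/glove.840B.300d.w2vformat.txt") would mirror the paths used in the steps above. The two-pass approach avoids holding the whole file in memory, at the cost of reading it twice.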
To get the results using the ensemble (if the ensemble should include the LSTM, first run lstm_model.py):
- Run: python ./cross_validation_ensemble.py
The final submission is in the predictions folder and is called postprocessed_submission.csv.
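The whole sequence can also be scripted. The sketch below checks that the input files are in place and then runs the scripts in the order described above. The helper names (missing_inputs, pipeline_commands, run_pipeline) are hypothetical, and two things are assumptions rather than facts from this repository: that data/ and predictions/ sit next to the scripts, and that the ensemble also needs build_features.py to run first.

```python
import subprocess
import sys
from pathlib import Path

# File and script names come from the steps above; the folder layout
# (data/ and predictions/ next to the scripts) is an assumption.
REQUIRED_INPUTS = [
    "data/train.csv",
    "data/test.csv",
    "data/glove.840B.300d.w2vformat.txt",
]
SUBMISSION = "predictions/postprocessed_submission.csv"


def missing_inputs(root="."):
    """Return the required input files that do not exist under root."""
    return [p for p in REQUIRED_INPUTS if not (Path(root) / p).is_file()]


def pipeline_commands(use_ensemble=False, include_lstm=True):
    """Return the script invocations in the order the steps describe."""
    # Assumption: feature building is needed for both the LSTM and the ensemble.
    commands = [[sys.executable, "./build_features.py"]]
    # Without the ensemble, the LSTM is the model that produces the submission.
    if include_lstm or not use_ensemble:
        commands.append([sys.executable, "./lstm_model.py"])
    if use_ensemble:
        commands.append([sys.executable, "./cross_validation_ensemble.py"])
    return commands


def run_pipeline(root=".", use_ensemble=False, include_lstm=True):
    """Check inputs, run each script, and return the expected submission path."""
    missing = missing_inputs(root)
    if missing:
        raise FileNotFoundError(f"missing input files: {missing}")
    for cmd in pipeline_commands(use_ensemble, include_lstm):
        # check=True stops at the first script that exits with an error.
        subprocess.run(cmd, cwd=root, check=True)
    return Path(root) / SUBMISSION
```

For example, run_pipeline(use_ensemble=True) would run the feature builder, the LSTM, and then the ensemble, while run_pipeline(use_ensemble=True, include_lstm=False) would skip the LSTM step.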