Fake News Challenge: FakeNewsChallenge.org.
This project starts from the baseline provided on GitHub.
Requirements: python >= 3.7.0 (tested with 3.7.2)
1. Install the required python packages:
   pip install -r requirements.txt --upgrade
2. Some Natural Language Toolkit (NLTK) data packages might need to be downloaded manually:
   python3 -c "import nltk; nltk.download('stopwords'); nltk.download('punkt'); nltk.download('wordnet')"
3. To reproduce the same results, please use our pre-generated features and models. If you want to regenerate them, delete all files in the features and models directories. If you keep them, you can skip to step 6.
4. To generate the named entity feature, you need to run a CoreNLP server, version 3.9.2: download Stanford CoreNLP, extract it anywhere, and execute the following command in the corenlp directory (generating this feature takes about 5 hours on our dev environment; see the query sketch below):
   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9020
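   A minimal sketch of how a named entity feature could query the running server, assuming the `requests` package is installed and the server from this step is listening on port 9020; the project's own feature code may differ:

```python
# Sketch: query the local CoreNLP server for named entities.
# Assumes the server started in step 4 is running on port 9020.
import json
import requests

CORENLP_URL = "http://localhost:9020"

def extract_entities(text):
    """Return (word, NER tag) pairs for all tokens CoreNLP tags as entities."""
    props = {"annotators": "tokenize,ssplit,ner", "outputFormat": "json"}
    resp = requests.post(
        CORENLP_URL,
        params={"properties": json.dumps(props)},
        data=text.encode("utf-8"),
        timeout=60,
    )
    resp.raise_for_status()
    annotation = resp.json()
    return [
        (tok["word"], tok["ner"])
        for sentence in annotation["sentences"]
        for tok in sentence["tokens"]
        if tok["ner"] != "O"
    ]

if __name__ == "__main__":
    print(extract_entities("Stanford University is located in California."))
```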
5. To generate the doc2vec feature, you need two paragraph vector models in the models directory, named h_d2v.model and b_d2v.model. You can generate them with doc2vecModelGenerator.py (a training sketch follows below) or use the ones we have already generated.
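   A minimal sketch of how such a paragraph vector model could be trained and saved with gensim; the hyperparameters are placeholders, and doc2vecModelGenerator.py remains the authoritative script:

```python
# Sketch: train a doc2vec (paragraph vector) model and save it under the
# expected file name. Hyperparameters below are illustrative only.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from nltk.tokenize import word_tokenize  # punkt is downloaded in step 2

def train_d2v(texts, out_path, vector_size=100, epochs=20):
    corpus = [
        TaggedDocument(words=word_tokenize(text.lower()), tags=[i])
        for i, text in enumerate(texts)
    ]
    model = Doc2Vec(vector_size=vector_size, min_count=2, workers=4, epochs=epochs)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    model.save(out_path)
    return model

# e.g. one model for headlines and one for article bodies:
# train_d2v(headlines, "models/h_d2v.model")
# train_d2v(bodies, "models/b_d2v.model")
```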
6. To run and generate the model (if the features or models do not exist, the script will generate them automatically):
   python3 FinalClassifier.py
7. XGBoostClassifier.py is the old version of the project, which classifies the 4 classes with a single XGBoost model (see the sketch below).
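   For reference, a minimal sketch of a single 4-class XGBoost classifier in that style; the feature matrix and hyperparameters below are placeholders, not the project's actual settings:

```python
# Sketch: one XGBoost model over the 4 FNC-1 stance classes.
# X and the hyperparameters are placeholders, not the project's real values.
import numpy as np
from xgboost import XGBClassifier

LABELS = ["agree", "disagree", "discuss", "unrelated"]  # FNC-1 stance classes

# X: one feature vector per headline/body pair; y: integer class labels.
X = np.random.rand(1000, 50)
y = np.random.randint(0, len(LABELS), size=1000)

clf = XGBClassifier(
    objective="multi:softmax",
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
)
clf.fit(X, y)
preds = clf.predict(X)
print([LABELS[int(p)] for p in preds[:5]])
```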