Fake News Challenge: FakeNewsChallenge.org.
This project starts from the baseline provided on GitHub.
Requirements: python >= 3.7.0 (tested with 3.7.2)
1. Install the required python packages:
   pip install -r requirements.txt --upgrade
2. Some Natural Language Toolkit (NLTK) data packages might need to be downloaded manually:
   python3 -c "import nltk; nltk.download('stopwords'); nltk.download('punkt'); nltk.download('wordnet')"
3. To reproduce the same results, please use our pre-generated features and models. If you want to regenerate them, delete all files in the features and models directories. If you keep them, you can skip to step 6.
4. To generate the named entity feature, you need to run a CoreNLP server, version 3.9.2: download Stanford CoreNLP, extract it anywhere, and execute the following command in the corenlp directory (generating this feature takes about 5 hours on our dev environment; see the query sketch below):
   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9020
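   A minimal sketch of how a named entity feature could query the running server, assuming the `requests` package is installed and the server from this step is listening on port 9020; the project's own feature code may differ:

```python
# Sketch: query the local CoreNLP server for named entities.
# Assumes the server started in step 4 is running on port 9020.
import json
import requests

CORENLP_URL = "http://localhost:9020"

def extract_entities(text):
    """Return (word, NER tag) pairs for all tokens CoreNLP tags as entities."""
    props = {"annotators": "tokenize,ssplit,ner", "outputFormat": "json"}
    resp = requests.post(
        CORENLP_URL,
        params={"properties": json.dumps(props)},
        data=text.encode("utf-8"),
        timeout=60,
    )
    resp.raise_for_status()
    annotation = resp.json()
    return [
        (tok["word"], tok["ner"])
        for sentence in annotation["sentences"]
        for tok in sentence["tokens"]
        if tok["ner"] != "O"
    ]

if __name__ == "__main__":
    print(extract_entities("Stanford University is located in California."))
```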
5. To generate the doc2vec feature, you need two paragraph vector models in the models directory, named h_d2v.model and b_d2v.model. You can generate them with doc2vecModelGenerator.py (a training sketch follows below) or use the ones we have already generated.
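   A minimal sketch of how such a paragraph vector model could be trained and saved with gensim; the hyperparameters are placeholders, and doc2vecModelGenerator.py remains the authoritative script:

```python
# Sketch: train a doc2vec (paragraph vector) model and save it under the
# expected file name. Hyperparameters below are illustrative only.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from nltk.tokenize import word_tokenize  # punkt is downloaded in step 2

def train_d2v(texts, out_path, vector_size=100, epochs=20):
    corpus = [
        TaggedDocument(words=word_tokenize(text.lower()), tags=[i])
        for i, text in enumerate(texts)
    ]
    model = Doc2Vec(vector_size=vector_size, min_count=2, workers=4, epochs=epochs)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    model.save(out_path)
    return model

# e.g. one model for headlines and one for article bodies:
# train_d2v(headlines, "models/h_d2v.model")
# train_d2v(bodies, "models/b_d2v.model")
```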
6. To run and generate the model (if the features or models do not exist, the script will generate them automatically):
   python3 FinalClassifier.py
7. XGBoostClassifier.py is the old version of the project, which classifies the 4 classes with a single XGBoost model (see the sketch below).
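   For reference, a minimal sketch of a single 4-class XGBoost classifier in that style; the feature matrix and hyperparameters below are placeholders, not the project's actual settings:

```python
# Sketch: one XGBoost model over the 4 FNC-1 stance classes.
# X and the hyperparameters are placeholders, not the project's real values.
import numpy as np
from xgboost import XGBClassifier

LABELS = ["agree", "disagree", "discuss", "unrelated"]  # FNC-1 stance classes

# X: one feature vector per headline/body pair; y: integer class labels.
X = np.random.rand(1000, 50)
y = np.random.randint(0, len(LABELS), size=1000)

clf = XGBClassifier(
    objective="multi:softmax",
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
)
clf.fit(X, y)
preds = clf.predict(X)
print([LABELS[int(p)] for p in preds[:5]])
```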