To run the code in this repository, Node (version 10 or later) is required. To run the predictor workers, you also need Python (version 3.5 or later) and the predictors' dependencies, which you can install by running `pip3 install -r workers/predictors-bayesian/requirements.txt` and `pip3 install -r workers/predictors-neural/requirements.txt`. The code in this repository has only been tested on Ubuntu 16.04.
Please run `npm install` after cloning the repository, and again whenever the dependencies have changed after a pull. We recommend Visual Studio Code as the IDE, since ESLint and Prettier are directly integrated there, so you can start working immediately.
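If you want to check the version requirements programmatically, the following Python sketch compares the installed versions against the minimums stated above (Node 10, Python 3.5). It is a hypothetical helper, not part of the repository: `parse_version`, `meets_minimum` and `check_prerequisites` are illustrative names, and it does not verify the pip dependencies.

```python
"""Hypothetical helper (not part of the repository): check Node >= 10 and
Python >= 3.5 as required above."""
import re
import shutil
import subprocess
import sys


def parse_version(text):
    """Extract (major, minor) from strings such as 'v10.24.1' or '3.5.2'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in {!r}".format(text))
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum):
    """Compare version tuples element by element, e.g. (3, 4) < (3, 5)."""
    return tuple(version) >= tuple(minimum)


def check_prerequisites():
    """Return a list of human-readable problems (empty if all is fine)."""
    problems = []
    if not meets_minimum(sys.version_info[:2], (3, 5)):
        problems.append("Python 3.5 or newer is required")
    node = shutil.which("node")
    if node is None:
        problems.append("Node is not on the PATH")
    else:
        # `node --version` prints something like "v10.24.1".
        output = subprocess.check_output([node, "--version"],
                                         universal_newlines=True)
        if not meets_minimum(parse_version(output), (10, 0)):
            problems.append("Node 10 or newer is required, found "
                            + output.strip())
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

The script avoids f-strings so that it also runs on the minimum supported Python 3.5.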
Note that this repository contains two different models, both of which aim to predict the likelihood of death or survival of GoT characters. Their usage is explained below.
The Bayesian model can be used as follows:
- If you need to, refetch the data by running `./refetch.sh` in `data/book` and `data/show`.
- Run `node workers/formatter-bayesean-book` and `node workers/formatter-bayesean-show`. They read the features used for training from the data and generate a JSON file in their own directory (`training_book_characters.json` or `training_show_characters.json`).
- Run the predictor scripts in `workers/predictors-bayesian/predictor-bayesean-book` and `workers/predictors-bayesian/predictor-bayesean-show`. This can be done either directly (`python3 workers/predictors-bayesian/predictor-bayesean-book/predictor.py`) or using Node (`node workers/predictors-bayesian/predictor-bayesean-book`).
- The predictors produce an output JSON file in their own directory (`book_predictor_output.json`, `show_predictor_output.json`). Run the postprocessors to filter out dead characters and unnecessary data: `node workers/postprocessor-bayesean-book`, `node workers/postprocessor-bayesean-show`.
- To upload the predictions to the website, use `node workers/uploader-predictions-bayesean`. To upload only the attributes used and their average influences, use `node workers/uploader-attributes-bayesean`.
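The Bayesian steps above can be chained in a small driver script. This is a minimal sketch under assumptions: it is not part of the repository, it must be started from the repository root, and it starts the show predictor with `node workers/predictors-bayesian/predictor-bayesean-show` by analogy with the book predictor. By default it only prints the commands (`dry_run=True`).

```python
"""Hypothetical driver (not part of the repository) for the Bayesian steps
above. Start it from the repository root."""
import subprocess

# (working directory, command) pairs, in the order given above.
REFETCH_STEPS = [
    ("data/book", ["./refetch.sh"]),
    ("data/show", ["./refetch.sh"]),
]
MODEL_STEPS = [
    (".", ["node", "workers/formatter-bayesean-book"]),
    (".", ["node", "workers/formatter-bayesean-show"]),
    (".", ["node", "workers/predictors-bayesian/predictor-bayesean-book"]),
    # Assumed to start the same way as the book predictor.
    (".", ["node", "workers/predictors-bayesian/predictor-bayesean-show"]),
    (".", ["node", "workers/postprocessor-bayesean-book"]),
    (".", ["node", "workers/postprocessor-bayesean-show"]),
]
UPLOAD_STEPS = [
    (".", ["node", "workers/uploader-predictions-bayesean"]),
]


def build_steps(refetch=False, upload=False):
    """Return the (cwd, command) pairs to execute, in order."""
    steps = []
    if refetch:
        steps.extend(REFETCH_STEPS)
    steps.extend(MODEL_STEPS)
    if upload:
        steps.extend(UPLOAD_STEPS)
    return steps


def run(steps, dry_run=True):
    """Print every step; execute it only when dry_run is False."""
    for cwd, command in steps:
        print("[{}] {}".format(cwd, " ".join(command)))
        if not dry_run:
            # check=True aborts on the first failure, so a later step never
            # reads a missing or stale JSON file from an earlier one.
            subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    run(build_steps())
```

Calling `run(build_steps(refetch=True, upload=True), dry_run=False)` would execute every step including the upload, so it is worth checking the printed commands with a dry run first.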
Creating the neural network's book predictions yourself takes several steps:
- Format the data into an intermediate JSON format by running `node workers/formatter`.
- Create a zlib-compressed chunk of neural network data by running `node workers/formatter-neural/index-v2.js`.
- Edit the file `workers/predictors-neural/predictor-neural-v1/predictor.py` so that line 28 reads `if True:`, then run it from within its directory using `./predictor.py`.
- Change that line back to `if False:`, then run the script again using `./predictor.py`. The final predictions can now be found in `workers/predictors-neural/predictor-neural-v2/output/predictions.json`.
- To upload the predictions to the website, use `node workers/uploader-predictions`.
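The manual edit of line 28 and the two predictor runs can be scripted, for example as below. This is a hypothetical helper, not part of the repository: it assumes line 28 still contains exactly `if True:` or `if False:` and refuses to touch the file otherwise, since line numbers shift as soon as `predictor.py` changes. It also assumes `predictor.py` is executable, as the `./predictor.py` invocation implies, and it leaves out the upload step.

```python
"""Hypothetical automation (not part of the repository) of the book steps
above, including the two runs of predictor.py with line 28 toggled."""
import subprocess

PREDICTOR_DIR = "workers/predictors-neural/predictor-neural-v1"
PREDICTOR = PREDICTOR_DIR + "/predictor.py"
FLAG_LINE = 28  # 1-based line number from the instructions above


def set_flag_in_lines(lines, value, line_number=FLAG_LINE):
    """Return a copy of `lines` whose line `line_number` reads `if <value>:`."""
    index = line_number - 1
    current = lines[index]
    if current.strip() not in ("if True:", "if False:"):
        # Fail loudly instead of overwriting an unrelated line.
        raise ValueError(
            "line {} is {!r}, expected 'if True:' or 'if False:'".format(
                line_number, current.strip()))
    # Keep the original indentation of the `if` statement.
    indent = current[: len(current) - len(current.lstrip())]
    updated = list(lines)
    updated[index] = "{}if {}:\n".format(indent, bool(value))
    return updated


def set_flag(path, value, line_number=FLAG_LINE):
    """Rewrite `path` in place with the flag line set to `value`."""
    with open(path) as handle:
        lines = handle.readlines()
    with open(path, "w") as handle:
        handle.writelines(set_flag_in_lines(lines, value, line_number))


def run_book_pipeline():
    """Run from the repository root; stops at the first failing step."""
    subprocess.run(["node", "workers/formatter"], check=True)
    subprocess.run(["node", "workers/formatter-neural/index-v2.js"], check=True)
    # First run with `if True:`, second run with `if False:`, which also
    # leaves the file in the state the instructions end with.
    for value in (True, False):
        set_flag(PREDICTOR, value)
        subprocess.run(["./predictor.py"], cwd=PREDICTOR_DIR, check=True)


if __name__ == "__main__":
    run_book_pipeline()
```

Splitting the pure `set_flag_in_lines` from the file-writing `set_flag` keeps the line rewrite testable without touching the real `predictor.py`.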
The process for creating the show predictions is almost identical; just use the `formatter-show`, `formatter-neural-show` and `predictors-neural/predictor-neural-show-v1` worker directories, in that order.
To create a new branch for your changes, run the following commands, replacing `my-new-branch` with the desired name of your branch:

```shell
git checkout master
git checkout -b my-new-branch
git push origin my-new-branch
git push --set-upstream origin my-new-branch
```
Statistics of the two neural network models:

- book
  - number of characters: 484
    - used for training (i.e. dead): 188, predicted on (i.e. alive): 296
  - number of training datapoints: 18800
    - used for training itself: 15040, used for validation: 3760
  - final training accuracy: 88.75%, final validation accuracy: 89.92% (from Keras log)
  - number of dimensions per datapoint: 1561
    - scalar values
      - male: 1, page rank (normalized): 1, number of relatives (normalized): 1
    - one-hot vectors
      - age: 100, culture: 57, house: 360, house region: 29
    - multiple-hot vectors
      - allegiances: 396, books: 19, locations: 82, titles: 515
  - number of output dimensions: 1
    - scalar value: 1.0 if alive, 0.0 otherwise
- show
  - number of characters: 146
    - used for training (i.e. dead): 82, predicted on (i.e. alive): 64
  - number of training datapoints: 7052
    - used for training itself: 6346, used for validation: 706
  - final training accuracy: 81.00%, final validation accuracy: 84.56% (from Keras log)
  - number of dimensions per datapoint: 413
    - scalar values
      - male: 1, is bastard: 1, page rank (normalized): 1, number of relatives (normalized): 1, number of commanded battles (normalized): 1
    - one-hot vectors
      - age: 86
    - multiple-hot vectors
      - allegiances: 130, appearances: 74, titles: 118
  - number of output dimensions: 1
    - scalar value: 1.0 if alive, 0.0 otherwise
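As a cross-check of the statistics above, the following Python sketch encodes a character into the listed input layout. It is an illustration, not the repository's formatter: the segment order, the input format of the hypothetical `encode` function and the helper names are assumptions, and scalar features are assumed to be normalized beforehand. The segment sizes add up to the stated 1561 (book) and 413 (show) dimensions, and `split_sizes`, which assumes a plain fractional hold-out, reproduces the 15040/3760 and 6346/706 splits with fractions of 0.2 and 0.1.

```python
"""Sketch (not the repository's code) of the input layout listed above:
scalar values, one-hot vectors and multiple-hot vectors are concatenated
into one flat vector per datapoint."""

# (segment name, kind, size); sizes are copied from the statistics above.
BOOK_LAYOUT = [
    ("male", "scalar", 1), ("page rank", "scalar", 1),
    ("number of relatives", "scalar", 1),
    ("age", "one-hot", 100), ("culture", "one-hot", 57),
    ("house", "one-hot", 360), ("house region", "one-hot", 29),
    ("allegiances", "multi-hot", 396), ("books", "multi-hot", 19),
    ("locations", "multi-hot", 82), ("titles", "multi-hot", 515),
]

SHOW_LAYOUT = [
    ("male", "scalar", 1), ("is bastard", "scalar", 1),
    ("page rank", "scalar", 1), ("number of relatives", "scalar", 1),
    ("number of commanded battles", "scalar", 1),
    ("age", "one-hot", 86),
    ("allegiances", "multi-hot", 130), ("appearances", "multi-hot", 74),
    ("titles", "multi-hot", 118),
]


def input_dimensions(layout):
    """Total length of the concatenated input vector."""
    return sum(size for _, _, size in layout)


def encode(layout, features):
    """Build one input vector.

    `features` maps a segment name to a float (scalar segments, assumed
    already normalized), a single index (one-hot segments) or a list of
    indices (multi-hot segments). Missing segments stay all-zero.
    """
    vector = []
    for name, kind, size in layout:
        value = features.get(name)
        if kind == "scalar":
            vector.append(0.0 if value is None else float(value))
            continue
        segment = [0.0] * size
        if value is None:
            indices = []
        elif kind == "one-hot":
            indices = [value]
        else:
            indices = value
        for index in indices:
            segment[index] = 1.0
        vector.extend(segment)
    return vector


def split_sizes(datapoints, validation_fraction):
    """Training/validation sizes when a fraction is held out for validation."""
    training = int(datapoints * (1.0 - validation_fraction))
    return training, datapoints - training
```

The model output for each datapoint is then a single scalar, 1.0 if alive and 0.0 otherwise, as listed above.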