Discussing things you care about can be difficult. The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions. Platforms struggle to effectively facilitate conversations, leading many communities to limit or completely shut down user comments.
The challenge is to build a multi-headed model capable of detecting different types of toxicity, such as threats, obscenity, insults, and identity-based hate. The model(s) will hopefully help online discussions become more productive and respectful.
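As a rough illustration of the task, the sketch below fits one independent binary classifier per label on TF-IDF features. The six label columns (toxic, severe_toxic, obscene, threat, insult, identity_hate) are the ones in the Kaggle train.csv; the pipeline itself is only a hypothetical baseline, not the models fitted by main.py.

    # Hypothetical multi-label baseline: one binary classifier per toxicity label.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline

    # Label columns as they appear in the Kaggle train.csv.
    LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

    train = pd.read_csv("train.csv")
    X, y = train["comment_text"], train[LABELS]

    model = make_pipeline(
        TfidfVectorizer(max_features=50000),
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    )
    model.fit(X, y)
    probs = model.predict_proba(X[:5])  # per-label probabilities, shape (5, 6)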
SETUP:
- Download data (train.csv) from https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data
- Download GloVe embeddings from http://nlp.stanford.edu/data/glove.6B.zip and unpack them (a loader sketch appears after this list)
- Set the paths to both files (DATA_FILE, GLOVE_FILE) in config.py (see the config sketch below)
- Install packages from requirements.txt
- To better understand the data, have a look at the exploratory data analysis notebook. In the command line:
jupyter notebook exploratory_data_analysis.ipynb
- To fit models, run the main script from the command line:
python main.py --choose-model=MODEL
where MODEL selects one of the following (see main.py for the accepted values; a hypothetical dispatch sketch follows this list):
- Bag of Words
- Latent Dirichlet Allocation
- Long Short-Term Memory
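A config.py along these lines is all the parametrization step asks for; the paths below are placeholders for wherever you stored the downloads.

    # config.py -- placeholder paths; point these at your local copies.
    DATA_FILE = "data/train.csv"            # the Kaggle train.csv
    GLOVE_FILE = "data/glove.6B.100d.txt"   # one of the unpacked glove.6B.*d.txt files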
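The unpacked GloVe files are plain text with one word and its vector per line, so a minimal loader (illustrative only, assuming the GLOVE_FILE path from config.py above) could look like:

    # Build a {word: vector} lookup from a GloVe text file.
    import numpy as np

    from config import GLOVE_FILE

    def load_glove(path):
        embeddings = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                word, *values = line.rstrip().split(" ")
                embeddings[word] = np.asarray(values, dtype="float32")
        return embeddings

    embeddings = load_glove(GLOVE_FILE)  # e.g. embeddings["toxic"] -> 100-d vector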
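The accepted MODEL values are defined in main.py; the snippet below only sketches how such a --choose-model dispatch typically looks with argparse, with hypothetical values (bow, lda, lstm) standing in for the real ones.

    # Hypothetical dispatch; substitute the values main.py actually accepts.
    import argparse

    def fit_bow():   # placeholder for the Bag of Words trainer
        pass

    def fit_lda():   # placeholder for the Latent Dirichlet Allocation trainer
        pass

    def fit_lstm():  # placeholder for the Long Short-Term Memory trainer
        pass

    MODELS = {"bow": fit_bow, "lda": fit_lda, "lstm": fit_lstm}

    parser = argparse.ArgumentParser()
    parser.add_argument("--choose-model", choices=MODELS, required=True)
    args = parser.parse_args()
    MODELS[args.choose_model]()  # argparse maps --choose-model to choose_model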