stumbleupon

Kaggle submission for the StumbleUpon Evergreen Classification Challenge, which placed 30th out of 625 entries: https://www.kaggle.com/c/stumbleupon

The competition's goal is to predict whether a given web page will stay popular over long periods of time (that is, whether it is "evergreen"). My approach uses only the text of the articles, as an exercise in NLP techniques. The score could likely be improved further by using the additional information provided in the dataset.

To train: clone the repository, run pip install -r requirements.txt to install the necessary libraries, then run python main.py in the terminal. This produces submission_final.csv, which can be scored by uploading it to https://www.kaggle.com/c/stumbleupon/submissions under "Make a Submission".

The model builds tf-idf features from stemmed words, applies Latent Semantic Analysis (LSA) to reduce them to a smaller set of features, and is then trained via a custom ModelEnsemble class. ModelEnsemble uses Stacked Generalization to weight the predictions of Logistic Regression and Gradient Boosted classifiers. It takes advantage of the different decision boundaries produced by the two algorithms in an attempt to maximize the competition's scoring metric, AUC.
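The steps above can be sketched with scikit-learn. This is a rough illustration, not the code in this repository: the README does not show how ModelEnsemble works internally, so scikit-learn's StackingClassifier is swapped in as a stand-in for it. The crude_stem function is a hypothetical placeholder for a real stemmer, and the toy documents, labels, and hyperparameters are all made up for the example.

```python
import re

import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline


def crude_stem(token):
    """Hypothetical stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stemmed_tokens(text):
    """Lowercase, split into alphabetic tokens, and stem each token."""
    return [crude_stem(tok) for tok in re.findall(r"[a-z]+", text.lower())]


def build_model(n_components=5, seed=0):
    """tf-idf on stemmed tokens -> LSA (truncated SVD) -> stacked ensemble."""
    base_models = [
        ("logreg", LogisticRegression(max_iter=1000)),
        ("gbm", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
    ]
    # Stand-in for ModelEnsemble: a meta-learner is fit on the out-of-fold
    # probabilities of the base models (stacked generalization).
    stack = StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(),
        stack_method="predict_proba",
        cv=3,
    )
    return make_pipeline(
        TfidfVectorizer(tokenizer=stemmed_tokens, token_pattern=None),
        TruncatedSVD(n_components=n_components, random_state=seed),
        stack,
    )


def demo_auc(seed=0):
    """Fit on made-up documents and return the AUC on that same toy data."""
    rng = np.random.default_rng(seed)
    evergreen_words = ["recipe", "baking", "cooking", "garden", "workout", "tips"]
    news_words = ["election", "score", "earnings", "breaking", "weather", "traffic"]

    def make_docs(words, n):
        return [" ".join(rng.choice(words, size=4)) for _ in range(n)]

    docs = make_docs(evergreen_words, 20) + make_docs(news_words, 20)
    labels = np.array([1] * 20 + [0] * 20)

    model = build_model(seed=seed)
    model.fit(docs, labels)
    scores = model.predict_proba(docs)[:, 1]
    # Scored on the training data, so this is a sanity check only,
    # not an estimate of leaderboard performance.
    return roc_auc_score(labels, scores)


print("toy AUC:", demo_auc())
```

Stacking fits the meta-learner on out-of-fold predictions, so the weights it gives each base model reflect how well that model generalizes rather than how well it memorizes the training set.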
