Ever been in the middle of implementing a research paper and wished for a few small scripts that would have made your life easier? This repository aims to collect those ML utilities (NLP, CV, etc.) in one place so you can reuse them whenever you need them, with a little tweaking. I have added some basic scripts and will add more over time.
-
tf-idf.py implements the standard tf-idf (term frequency-inverse document frequency) algorithm using sklearn's TfidfVectorizer. For better speed and scalability on large corpora, you can use HashingVectorizer instead (followed by TfidfTransformer if you still want the idf weighting).
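
As a rough illustration of the two options mentioned above, here is a minimal sketch and not the contents of tf-idf.py itself. The toy corpus and the `n_features` value are made up for the example; the script's real input and settings may differ.

```python
from sklearn.feature_extraction.text import (
    HashingVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)
from sklearn.pipeline import make_pipeline

# Toy corpus, invented purely for illustration.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]

# Option 1: TfidfVectorizer learns a vocabulary and the idf weights in one step.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)      # sparse matrix, shape (n_docs, n_terms)
vocab = sorted(tfidf.vocabulary_)  # the terms that were learned

# Option 2: HashingVectorizer keeps no vocabulary in memory, which helps on
# large corpora. It does not compute idf on its own, so a TfidfTransformer
# is chained after it. n_features=2**10 is an arbitrary choice for this demo.
hashed_tfidf = make_pipeline(
    HashingVectorizer(n_features=2**10, alternate_sign=False),
    TfidfTransformer(),
)
X_hashed = hashed_tfidf.fit_transform(docs)  # shape (n_docs, 1024)
```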
-
SVM.py trains a Support Vector Machine on the data in train.csv. The code first removes the unnecessary features, then converts the categorical/nominal features to numerical ones using one-hot encoding, and finally trains the model using LibSVM.
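
The three steps above can be sketched roughly as follows. This is an assumption-laden example, not the actual SVM.py: the small DataFrame stands in for train.csv, the column names (`id`, `color`, `size`, `label`) are hypothetical, and scikit-learn's `SVC` (which is built on LibSVM) stands in for whichever LibSVM interface the script really uses.

```python
import pandas as pd
from sklearn.svm import SVC

# Hypothetical stand-in for train.csv; the real file's columns are not shown
# in this README, so these names and values are invented.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "size": [1.0, 2.5, 1.2, 3.1, 2.7, 3.0],
    "label": [0, 1, 0, 1, 1, 1],
})

# Step 1: drop features that carry no signal (here, the row identifier)
# and separate the target column.
y = df["label"]
X = df.drop(columns=["id", "label"])

# Step 2: one-hot encode the categorical column into numeric 0/1 columns.
X = pd.get_dummies(X, columns=["color"], dtype=float)

# Step 3: train an SVM. The kernel and C value are arbitrary defaults here.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)
preds = clf.predict(X)
```

With a real dataset you would load it with `pd.read_csv("train.csv")` and evaluate on held-out rows rather than on the training data.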