In this project, I used a dataset collected from Azerbaijani news portals. The dataset consists of 50,000 news articles in 6 distinct categories. Using Natural Language Processing (NLP) methods, I first represented the texts as numerical data and then developed and optimized models that classify Azerbaijani-language news articles by category. In the first part of the project, I used the Bag of Words (BoW) method to vectorize the dataset and then trained several classification models with different Machine Learning (ML) algorithms. Because the dataset is labeled, I used supervised ML algorithms, namely Decision Tree Classifier, Naive Bayes, Support Vector Classifier (SVC), and an Artificial Neural Network. For comparison, I then vectorized the data with a second NLP technique, Term Frequency-Inverse Document Frequency (TF-IDF), and repeated the same modeling tasks.
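The BoW vs. TF-IDF comparison across the four classifier families can be sketched as below. This is a minimal illustration under stated assumptions, not the project's actual code: the README does not name a library, so the sketch assumes scikit-learn, and it uses `MLPClassifier` as a stand-in for the unspecified Artificial Neural Network. The sentences, the two category names (`idman`, `iqtisadiyyat`) and all hyperparameters are hypothetical placeholders, not taken from the 50,000-article, 6-category dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; the real project uses 6 categories, not 2.
TRAIN_TEXTS = [
    "komanda oyunda qol vurdu",
    "futbolçu matçda iki qol vurdu",
    "bank faiz dərəcəsini artırdı",
    "manatın məzənnəsi və bank faizi dəyişdi",
]
TRAIN_LABELS = ["idman", "idman", "iqtisadiyyat", "iqtisadiyyat"]
TEST_TEXTS = ["komanda matçda qol vurdu", "bank faizi artırdı"]
TEST_LABELS = ["idman", "iqtisadiyyat"]


def build_models():
    """Return fresh, untrained instances of the four classifier families."""
    return {
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "naive_bayes": MultinomialNB(),
        "svc": SVC(kernel="linear"),
        # Assumption: a small scikit-learn MLP stands in for the ANN,
        # whose real architecture is not described in the README.
        "ann": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
    }


def compare(train_texts, train_labels, test_texts, test_labels):
    """Train every (vectorizer, model) pair and return test-set accuracies."""
    vectorizers = {"bow": CountVectorizer, "tfidf": TfidfVectorizer}
    scores = {}
    for vec_name, vec_cls in vectorizers.items():
        # build_models() is called per vectorizer so no model is reused.
        for model_name, model in build_models().items():
            # The pipeline fits the vectorizer on the training texts only,
            # then reuses that vocabulary to transform the test texts.
            pipe = make_pipeline(vec_cls(), model)
            pipe.fit(train_texts, train_labels)
            predictions = pipe.predict(test_texts)
            scores[(vec_name, model_name)] = accuracy_score(test_labels, predictions)
    return scores


if __name__ == "__main__":
    for (vec_name, model_name), acc in compare(
        TRAIN_TEXTS, TRAIN_LABELS, TEST_TEXTS, TEST_LABELS
    ).items():
        print(f"{vec_name:>5} + {model_name:<13} accuracy = {acc:.2f}")
```

Wrapping each vectorizer and classifier in one pipeline keeps the comparison fair: both representations see exactly the same train/test split, and only the feature weighting (raw counts vs. TF-IDF weights) changes between the two runs.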
RovshanBayramRB/Azerbaijani-News-Classification