Multilabel Document Categorization #8

geegatomar · 2021-09-30T13:30:37Z

Description

The goal is to create a topic classifier using unsupervised machine learning techniques.

Details

Take the following example:
“One tyre went missing, so there was a delay to get the two tyres fitted. The garage I dealt with were fantastic.”
This review contains several insights, which we call “topics”. As the name suggests, a topic concisely describes what the text, or part of it, is about.
Listing the topics of the review above gives a clearer sense of what topics generally look like:
(incorrect tyres) (garage service) (wait time)

You are provided with a dataset of such reviews, and your task is to perform unsupervised topic modelling.

“Topic modeling is an unsupervised machine learning technique that’s capable of scanning a set of documents, detecting word and phrase patterns within them, and automatically clustering word groups and similar expressions that best characterize a set of documents.”
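A common first step toward “detecting word and phrase patterns” is simply counting which short phrases recur across reviews. The sketch below is only an illustration, not a prescribed approach: it counts adjacent word pairs (bigrams) after dropping a small, hypothetical stopword list, and the function names and sample reviews are made up for the example.

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list for illustration only; a real notebook
# would use a fuller list (for example from NLTK or spaCy).
STOPWORDS = {"a", "an", "and", "at", "but", "i", "is", "it", "of", "so",
             "the", "there", "to", "was", "were", "with"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def frequent_phrases(reviews, min_count=2):
    """Count adjacent word pairs (bigrams) across all reviews and return
    (phrase, count) pairs seen at least `min_count` times, most frequent first.
    Simplification: stopwords are removed before pairing, so a bigram can
    span a removed word."""
    counts = Counter()
    for review in reviews:
        tokens = tokenize(review)
        counts.update(zip(tokens, tokens[1:]))
    return [(" ".join(pair), n) for pair, n in counts.most_common() if n >= min_count]

# Made-up sample reviews.
reviews = [
    "Great service and a friendly garage.",
    "Friendly garage, great service, easy booking.",
    "Easy booking but the wait time was long.",
    "Long wait time at the garage.",
]
phrases = frequent_phrases(reviews)
print(phrases)
```

On these four reviews the repeated phrases (“great service”, “friendly garage”, “easy booking”, “wait time”) surface as candidates that a later step could group into topics.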

A few examples of the kinds of topics you could generate for the given dataset are: value for money, ease of booking, garage service, etc.

For example, if you have a phrase cluster like:
[great service, good garage service, professional service, friendly garage]
this cluster clearly describes the topic garage service, so it can be used to label every review containing any of these phrases as garage service.
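Once phrase clusters exist, labelling is a lookup, and because one review can match several clusters the result is naturally multilabel. A minimal sketch, assuming clusters are stored as a dict from topic name to phrases: the “garage service” phrases come from the cluster above, while the “wait time” cluster, the function name `label_review`, and the sample reviews are hypothetical.

```python
# Topic -> phrase cluster. "garage service" uses the phrases from the issue;
# "wait time" is a hypothetical cluster added for illustration. In the task,
# clusters would come from the unsupervised step, not be written by hand.
TOPIC_CLUSTERS = {
    "garage service": ["great service", "good garage service",
                       "professional service", "friendly garage"],
    "wait time": ["delay", "wait time", "waited"],
}

def label_review(review, clusters):
    """Return (sorted) every topic whose cluster has at least one phrase
    occurring in the review, using a case-insensitive substring match."""
    text = review.lower()
    return sorted(topic for topic, phrases in clusters.items()
                  if any(phrase in text for phrase in phrases))

sample_reviews = [
    "One tyre went missing, so there was a delay to get the two tyres fitted. "
    "The garage I dealt with were fantastic.",
    "Great service and a friendly garage, but a long wait time.",
    "Easy to book online.",
]
labels = [label_review(r, TOPIC_CLUSTERS) for r in sample_reviews]
print(labels)
```

Note that the first (original) review only gets “wait time”: “The garage I dealt with were fantastic” matches no phrase in the cluster, which shows why the clusters need good coverage. Plain substring matching is also naive (no word boundaries, no stemming or synonyms).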

Given the input dataset, you need to generate relevant topics using any unsupervised ML technique.
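The issue leaves the technique open. As one possibility (not a required method), non-negative matrix factorization (NMF) approximates a document-term count matrix V as W·H, where each row of H is a topic (weights over words) and each row of W gives a review’s weight on each topic. The sketch below uses the Lee-Seung multiplicative updates in pure Python so it runs without dependencies; the helper names, stopword list, sample reviews and the choice of k = 2 are all hypothetical.

```python
import random
import re
from collections import Counter

# Hypothetical stopword list for illustration only.
STOPWORDS = {"a", "and", "for", "the", "was", "but", "with"}

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply a (p x q) by b (q x r) as nested lists."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def doc_term_matrix(docs, stopwords):
    """Build a bag-of-words count matrix (documents x vocabulary)."""
    tokenized = [[w for w in re.findall(r"[a-z]+", d.lower()) if w not in stopwords]
                 for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in docs]
    for i, toks in enumerate(tokenized):
        for w, c in Counter(toks).items():
            matrix[i][index[w]] = float(c)
    return matrix, vocab

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x k) and H (k x m) with
    multiplicative updates that reduce the squared error ||V - WH||^2."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        num, den = matmul(transpose(W), V), matmul(matmul(transpose(W), W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        num, den = matmul(V, transpose(H)), matmul(W, matmul(H, transpose(H)))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H

def top_words(H, vocab, n_words=3):
    """For each topic (row of H), return its highest-weighted terms."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n_words]]
            for row in H]

reviews = [
    "Great service and a friendly garage.",
    "Friendly garage staff, great service.",
    "Easy booking online, good value for money.",
    "Booking was easy and the price was good value.",
]
V, vocab = doc_term_matrix(reviews, STOPWORDS)
W, H = nmf(V, k=2)
topics = top_words(H, vocab)
# Dominant topic index for each review.
dominant = [max(range(len(row)), key=lambda a: row[a]) for row in W]
print(topics, dominant)
```

The top words per topic are what a human would then name (for example “garage service” or “value for money”). In a real notebook, scikit-learn’s `TfidfVectorizer` together with `NMF` or `LatentDirichletAllocation` would be the more practical choice; the hand-written version only shows the mechanics.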

Issue requirements / progress

You are expected to submit the code in either a Jupyter notebook or a Colab notebook.
The solution will be judged on Data Analysis, Approach (NLP Techniques used), Code (Following good coding practices and mentioning), and any supporting documents submitted.

Resources

Reading up about NLP techniques, data cleaning and analysis, and various unsupervised techniques for topic modeling would be useful.

Directory Structure

Create a directory under the Intelligence folder and submit the Jupyter notebook or Colab notebook, along with any supporting documents (summary, explanation, motivation of approach).

Note

Please claim the issue first by commenting here before starting to work on it.
Once you are done with the task and have created a Pull Request, please tag @geegatomar to request a review.

ikjot-2605 added the labels Hacktoberfest, Intelligence (problem statements which deal with Machine Learning/AI), and intermediate on Sep 30, 2021.
