Multilabel Document Categorization #8

Open
geegatomar opened this issue Sep 30, 2021 · 0 comments
Labels
Hacktoberfest, Intelligence (Problem statements which deal with Machine Learning/AI), intermediate

@geegatomar

Description

The goal is to create a topic classifier using unsupervised machine learning techniques.

Details

Take the following example:
“One tyre went missing, so there was a delay to get the two tyres fitted. The garage I dealt with were fantastic.”
This review contains several insights, which we call “topics”. A topic is a short label that concisely describes what a text (or part of it) is about, and a single review can carry several topics at once.
The topics of the review above give a clearer sense of what topics generally look like:
(incorrect tyres) (garage service) (wait time)

You are provided with a dataset of such reviews, and your task is to perform unsupervised topic modelling on it.

“Topic modeling is an unsupervised machine learning technique that’s capable of scanning a set of documents, detecting word and phrase patterns within them, and automatically clustering word groups and similar expressions that best characterize a set of documents.”
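As a rough illustration of the “detecting word and phrase patterns” step, the sketch below counts the most frequent two-word phrases (bigrams) across a set of reviews after lowercasing and dropping stopwords. It is a minimal, standard-library-only sketch, not a prescribed approach: the stopword list, the function names `tokenize` and `frequent_bigrams`, and the sample reviews are all hypothetical, since this issue does not specify the dataset's format.

```python
import re
from collections import Counter

# Hypothetical, hand-picked stopword list; a real notebook would likely use a
# fuller list (for example NLTK's or spaCy's).
STOPWORDS = {"a", "an", "and", "the", "to", "was", "were", "i", "with",
             "so", "there", "of", "is", "it", "for", "very"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def frequent_bigrams(docs, top_n=5):
    """Count adjacent token pairs across all documents; return the top_n as (phrase, count)."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        # zip(tokens, tokens[1:]) yields each pair of neighbouring tokens.
        # Pairs are formed after stopword removal, so they may bridge a removed word.
        counts.update(zip(tokens, tokens[1:]))
    return [(" ".join(pair), n) for pair, n in counts.most_common(top_n)]


# Made-up reviews, not taken from the real dataset.
reviews = [
    "Great garage service and fair price.",
    "Good garage service, easy booking.",
    "Easy booking online and great value for money.",
]
print(frequent_bigrams(reviews, top_n=3))
```

On these three made-up reviews, “garage service” and “easy booking” each occur twice. Frequent phrases are not topics by themselves; they are raw material for grouping into phrase clusters.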

A few examples of the kinds of topics you could generate for the given dataset are: value for money, ease of booking, and garage service.

For example, suppose you have a phrase cluster like:
[great service, good garage service, professional service, friendly garage]
This cluster clearly describes the topic “garage service”, so it can be used to label every review containing any of these phrases as garage service.
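A minimal sketch of this labelling rule, assuming the phrase clusters have already been found. `TOPIC_CLUSTERS` and `label_review` are hypothetical names, the “wait time” phrases are invented for illustration, and matching is plain case-insensitive substring search. Because each review is checked against every cluster, it can receive several topics, which is what makes this a multilabel categorization.

```python
# Hypothetical phrase clusters. The "garage service" phrases come from the
# example above; the "wait time" phrases are invented for illustration. In a
# real solution these would come out of the unsupervised step.
TOPIC_CLUSTERS = {
    "garage service": ["great service", "good garage service",
                       "professional service", "friendly garage"],
    "wait time": ["delay", "waiting", "took ages"],
}


def label_review(review, clusters=TOPIC_CLUSTERS):
    """Return, sorted, every topic with at least one cluster phrase in the review.

    Matching is case-insensitive substring search, so a review can receive
    zero, one, or several topics (multilabel).
    """
    text = review.lower()
    return sorted(
        topic
        for topic, phrases in clusters.items()
        if any(phrase in text for phrase in phrases)
    )


print(label_review("Friendly garage, but there was a delay fitting the tyres."))
# ['garage service', 'wait time']
```

Substring matching is deliberately naive: it misses paraphrases (a review saying “the mechanics were helpful” gets no garage-service label) and can match inside longer words, so the clusters need to cover how reviewers actually phrase things.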

Given the input dataset, you need to generate relevant topics using any unsupervised ML technique.
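The issue leaves the choice of unsupervised technique open. As one deliberately simple possibility (a swapped-in technique, not one the issue prescribes), the sketch below groups candidate phrases with single-linkage clustering on word-overlap (Jaccard) similarity: two phrases land in the same cluster whenever a chain of pairs with similarity at or above a threshold connects them. The function names, the 0.25 threshold, and the extra input phrases are assumptions for illustration; in practice, LDA, or k-means or hierarchical clustering over TF-IDF or sentence-embedding vectors, would be more typical choices.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-overlap similarity between two phrases, from 0.0 to 1.0."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cluster_phrases(phrases, threshold=0.25):
    """Single-linkage clustering: connected components of the graph whose
    edges join phrase pairs with Jaccard similarity >= threshold."""
    parent = list(range(len(phrases)))  # union-find forest over phrase indices

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(phrases)), 2):
        if jaccard(phrases[i], phrases[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, phrase in enumerate(phrases):
        clusters.setdefault(find(i), []).append(phrase)
    return list(clusters.values())


# Phrases from the example above plus a few hypothetical ones.
phrases = ["great service", "good garage service", "professional service",
           "friendly garage", "easy booking", "online booking", "value for money"]
for cluster in cluster_phrases(phrases):
    print(cluster)
```

Here the four service/garage phrases form one cluster, “easy booking” and “online booking” another, and “value for money” stays on its own. Pure word overlap cannot tell that, say, “fantastic mechanic” and “friendly garage” are related, which is where vector representations help.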

Issue requirements / progress

You are expected to submit the code in either a Jupyter notebook or a Colab notebook.
The solution will be judged on Data Analysis, Approach (NLP Techniques used), Code (Following good coding practices and mentioning), and any supporting documents submitted.

Resources

Reading up on NLP techniques, data cleaning and analysis, and the various unsupervised techniques for topic modeling would be useful.

Directory Structure

Create a directory under the Intelligence folder and submit the Jupyter notebook or Colab notebook, along with any supporting documents (summary, explanation, motivation of approach).

Note

  1. Please claim the issue first by commenting here before starting to work on it.
  2. Once you are done with the task and have created a Pull Request, please tag @geegatomar to request a review.
@ikjot-2605 added the Hacktoberfest, Intelligence and intermediate labels on Sep 30, 2021