The goal is to create a topic classifier using unsupervised machine learning techniques.
Details
Take the following example:
“One tyre went missing, so there was a delay to get the two tyres fitted. The garage I dealt with were fantastic.”
This review contains several distinct insights, which we call “topics”. A topic, as the name suggests, is a concise description of what part of the text is about.
The topics of the review above give a clearer sense of what these look like:
(incorrect tyres) (garage service) (wait time)
You are provided with a dataset of such reviews, and your task is to perform unsupervised topic modeling.
“Topic modeling is an unsupervised machine learning technique that’s capable of scanning a set of documents, detecting word and phrase patterns within them, and automatically clustering word groups and similar expressions that best characterize a set of documents.”
A few examples of the kinds of topics you can generate for the given dataset are: value for money, ease of booking, garage service, etc.
For example, if you have a phrase cluster like:
[great service, good garage service, professional service, friendly garage]
then the cluster closely matches the topic “garage service”, and any review containing one of these phrases can be labelled as garage service.
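The labelling step described above could be sketched roughly as follows. This is only an illustration, not a required approach: the function name `label_reviews`, the `topic_phrases` dictionary, and the two sample reviews are all made up for this example, and plain case-insensitive substring matching is assumed.

```python
def label_reviews(reviews, topic_phrases):
    """For each review, return the sorted list of topics whose phrases it contains."""
    labels = []
    for review in reviews:
        text = review.lower()
        matched = sorted(
            topic
            for topic, phrases in topic_phrases.items()
            # A topic matches if any phrase of its cluster occurs in the review.
            if any(phrase in text for phrase in phrases)
        )
        labels.append(matched)
    return labels


# Phrase cluster taken from the example above, keyed by its topic name.
topic_phrases = {
    "garage service": [
        "great service",
        "good garage service",
        "professional service",
        "friendly garage",
    ],
}

# Hypothetical reviews, invented for illustration only.
reviews = [
    "Great service from a friendly garage.",
    "The tyres arrived two days late.",
]

labels = label_reviews(reviews, topic_phrases)
print(labels)
```

A review can receive several topics or none at all, so the result is a list of topic lists rather than a single label per review.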
Given the input dataset, you need to generate relevant topics using any unsupervised ML technique.
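As a possible starting point for finding candidate phrases, frequent word pairs can be counted across reviews before any clustering or topic model is applied. The sketch below is an assumption about how one might begin, not a topic model in itself: the stopword list, the helper names, and the sample reviews are hypothetical, and only the Python standard library is used.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real solution would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "was", "were", "i", "to", "so", "with", "there", "very"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequent_bigrams(reviews, min_count=2):
    """Return (bigram, number of reviews containing it) pairs, most frequent first."""
    counts = Counter()
    for review in reviews:
        # Stopwords are dropped first, so a bigram may join words that were
        # separated by a stopword in the original sentence.
        tokens = [t for t in tokenize(review) if t not in STOPWORDS]
        # Use a set so each bigram is counted at most once per review.
        counts.update(set(zip(tokens, tokens[1:])))
    return [(" ".join(pair), n) for pair, n in counts.most_common() if n >= min_count]


# Hypothetical reviews, invented for illustration only.
sample_reviews = [
    "The garage service was great and the fitting was quick.",
    "Great garage service, very friendly staff.",
    "Booking was easy and the garage service was professional.",
]

candidates = frequent_bigrams(sample_reviews)
print(candidates)
```

Phrases that recur across many reviews are natural candidates for grouping into clusters; the grouping itself would still need an unsupervised technique of your choice.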
Issue requirements / progress
You are expected to submit the code in either a Jupyter notebook or a Colab notebook.
The solution will be judged on Data Analysis, Approach (NLP Techniques used), Code (Following good coding practices and mentioning), and any supporting documents submitted.
Resources
Reading up on NLP techniques, data cleaning and analysis, and the various unsupervised techniques for topic modeling would be useful.
Directory Structure
Create a directory under the Intelligence folder and submit the Jupyter notebook or Colab notebook, along with any supporting documents (summary, explanation, motivation of approach).
Note
Please claim the issue first by commenting here before starting to work on it.
Once you are done with the task and have created a Pull Request, please tag @geegatomar to request a review.