Most widely used approaches rely on topic modelling, e.g. LDA or NMF, to find keywords in a large corpus of documents. However, extracting more meaningful, descriptive, precise and fine-grained keywords from a single product description or review document requires a more specific approach.
Dataset Used: https://www.kaggle.com/ak47bluestack/amazonphonedataset
Extracting Search Engine Appropriate Keywords and Key Selling Points from a Product's description in E-Commerce Websites
Problem Statement: From a dataset of mobile phone descriptions, extract:
1. Search Engine Appropriate Keywords
2. Key Selling Points
Search engine appropriate keywords can be:
a. Specific: proper nouns, e.g. Samsung Galaxy, Flip cover, Redmi Note 3, etc.
b. General: features (nouns other than proper nouns), e.g. smartphone, earphone, tangle free earphone
For each document:
Extract the nouns and proper nouns from the preprocessed text (stopword removal + lemmatization).
Join the extracted words to form a new document (pos_filtered_doc).
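The per-document filtering step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the text has already been lemmatized and tagged by some POS tagger that emits Penn Treebank tags, and the function name `pos_filter` and the hand-tagged sample are hypothetical.

```python
# Penn Treebank tags for common nouns (NN, NNS) and proper nouns (NNP, NNPS).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def pos_filter(tagged_tokens, stopwords=frozenset()):
    """Keep only noun / proper-noun tokens and join them into one string."""
    kept = [
        word.lower()
        for word, tag in tagged_tokens
        if tag in NOUN_TAGS and word.lower() not in stopwords
    ]
    return " ".join(kept)


# Hypothetical, hand-tagged sample; a real pipeline would obtain these
# (word, tag) pairs from a POS tagger run on the lemmatized text.
tagged = [
    ("Samsung", "NNP"), ("Galaxy", "NNP"), ("comes", "VBZ"),
    ("with", "IN"), ("a", "DT"), ("tangle", "NN"), ("free", "JJ"),
    ("earphone", "NN"),
]
pos_filtered_doc = pos_filter(tagged)
```

Taking pre-tagged tokens as input keeps the sketch independent of any particular tagger; lowercasing the kept words is a choice made here so that later counting treats "Samsung" and "samsung" as the same term.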
On the list of pos_filtered_docs:
Apply an n-gram TfidfVectorizer and, for each document, extract the terms with the top-n TF-IDF values.
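To make the ranking idea concrete, here is a small hand-rolled TF-IDF over word n-grams using only the standard library. It is a stand-in for the vectorizer named above, not a reproduction of it: the weights use relative term frequency with a smoothed IDF and no vector normalization, so the scores will differ from a library's. The names `ngrams` and `top_n_tfidf`, the unigram-plus-bigram range, and the sample documents are all assumptions for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def top_n_tfidf(docs, top_n=3, n_min=1, n_max=2):
    """For each document, return its top_n n-grams ranked by TF-IDF."""
    term_counts = [Counter(ngrams(doc.split(), n_min, n_max)) for doc in docs]
    # Document frequency: number of documents containing each n-gram.
    doc_freq = Counter(term for counts in term_counts for term in counts)
    n_docs = len(docs)
    results = []
    for counts in term_counts:
        total = sum(counts.values())
        scores = {
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_n])
    return results


# Hypothetical pos_filtered_docs, one string per product description.
pos_filtered_docs = [
    "samsung galaxy flip cover",
    "redmi note earphone",
    "samsung galaxy earphone",
]
keywords = top_n_tfidf(pos_filtered_docs, top_n=3)
```

Terms that appear in fewer documents receive a higher IDF, so n-grams specific to one product description rise to the top of that document's list.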
Key selling points can be treated as general features that are slightly more descriptive and quantified, e.g. 1.3ghz processor, bezel less screen, 24 hr battery backup, octa core processor.
Extract the noun phrases from the original document that correspond to the keywords found in the previous step (i.e. the keyword extraction phase).
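One possible way to do this matching is sketched below. It assumes the original document has been POS-tagged with Penn Treebank tags and uses a deliberately simple chunking rule (a run of numbers, adjectives and nouns that ends in a noun); a dedicated noun-phrase chunker would likely do better. The function names, the chunking rule and the hand-tagged sample are hypothetical.

```python
# Tags allowed inside a noun phrase: numbers, adjectives, and nouns.
MODIFIER_TAGS = {"CD", "JJ", "JJR", "JJS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_phrases(tagged_tokens):
    """Chunk maximal runs of modifiers/nouns, trimmed to end in a noun."""
    phrases, current = [], []
    # The sentinel token flushes a phrase that runs to the end of the input.
    for word, tag in list(tagged_tokens) + [("", "END")]:
        if tag in MODIFIER_TAGS or tag in NOUN_TAGS:
            current.append((word, tag))
            continue
        # Run ended: drop trailing modifiers so the phrase ends in a noun.
        while current and current[-1][1] not in NOUN_TAGS:
            current.pop()
        if current:
            phrases.append(" ".join(w.lower() for w, _ in current))
        current = []
    return phrases


def selling_points(tagged_tokens, keywords):
    """Keep noun phrases that contain at least one keyword as whole words."""
    return [
        phrase
        for phrase in noun_phrases(tagged_tokens)
        if any(f" {kw.lower()} " in f" {phrase} " for kw in keywords)
    ]


# Hypothetical, hand-tagged sentence from an original (unfiltered) description.
tagged_doc = [
    ("Powered", "VBN"), ("by", "IN"), ("a", "DT"), ("1.3ghz", "CD"),
    ("octa", "JJ"), ("core", "NN"), ("processor", "NN"), ("and", "CC"),
    ("24", "CD"), ("hr", "NN"), ("battery", "NN"), ("backup", "NN"),
    (".", "."),
]
points = selling_points(tagged_doc, ["processor", "battery backup"])
```

Padding both the phrase and the keyword with spaces makes the containment check match whole words only, so a multi-word keyword such as "battery backup" matches as a unit.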