
bansi36/NLP-fb_comments-scraper

facebook-comments-scraper & Natural Language Processing

  • Facebook comments are scraped with BeautifulSoup (bs4).
  • First, add your credentials and the profile URLs to the main scraper script, facebooksdata_scraper.
  • Run the facebooksdata_scraper script.
  • The scraper writes its output to profile_data.json. Convert that file to CSV with json_to_csv_conv.py to get profile_data.csv.
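The JSON-to-CSV step can be sketched with the standard library alone. This is not the repository's json_to_csv_conv.py. It assumes, hypothetically, that profile_data.json holds a JSON array of flat objects; the function name json_to_csv and the field names in the comments are illustrative only.

```python
import csv
import json
from pathlib import Path


def json_to_csv(json_path: str, csv_path: str) -> int:
    """Convert a JSON array of flat objects into a CSV file; return the row count."""
    # Assumption (hypothetical): the JSON file holds a list of dicts such as
    # [{"name": ..., "comment": ...}, ...]. Nested values are not flattened.
    records = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # Union of all keys in first-seen order, so records that lack a field
    # still line up under the right column (missing cells become "").
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

Usage: `json_to_csv("profile_data.json", "profile_data.csv")`. Taking the union of keys keeps the sketch from failing when some scraped records are missing a field.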

Natural Language Processing

  • The text-processing stage then performs tokenization, part-of-speech tagging, stop-word removal, stemming, and lemmatization.
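Three of these steps (tokenization, stop-word removal, and stemming) can be sketched without any dependencies. The README does not say which library the repository uses, so everything here is an assumption: the stop-word list, the suffix list, and the function names are hypothetical, and naive_stem is a crude suffix stripper, not the Porter algorithm. Part-of-speech tagging and lemmatization need a trained tagger and a lexicon (for example NLTK's pos_tag and WordNetLemmatizer), so they are left out.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real pipelines use a
# curated list such as NLTK's stopwords corpus.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "and", "or",
              "to", "of", "this", "it", "in", "on", "for"}

# Crude suffixes, longest first, so "amazing" loses "ing" rather than nothing.
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def tokenize(text: str) -> list[str]:
    """Lowercase, then keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters.

    A toy, not Porter: "studies" -> "studi", "loved" -> "lov".
    """
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]
```

For a comment such as "The movie was amazing and the actors loved it!", preprocess returns ["movie", "amaz", "actor", "lov"]. The stems are not real words, which is the usual trade-off of stemming compared with lemmatization.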

-> Feature Extraction

  • Raw text, a sequence of symbols, cannot be fed directly to most machine-learning algorithms: they expect numerical feature vectors of a fixed size, not raw text documents of variable length.

  • scikit-learn provides utilities for the most common ways to extract numerical features from text content, namely:

    • tokenizing strings and assigning an integer id to each possible token, for instance by using whitespace and punctuation as token separators.

    • counting the occurrences of tokens in each document.

    • normalizing, and down-weighting tokens that occur in most samples/documents.

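  • Each individual token occurrence frequency (normalized or not) is treated as a feature.

  • Vectorization is the general process of turning a collection of text documents into numerical feature vectors.

  • To re-weight the count features into floating-point values suitable for a classifier, use the tf–idf transform (TfidfTransformer, or TfidfVectorizer to tokenize, count, and re-weight in one step).

  • Tf means term frequency, while tf–idf means term frequency times inverse document frequency: tf-idf(t,d) = tf(t,d) × idf(t). With scikit-learn's default smooth_idf=True, idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents that contain term t.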
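The feature-extraction steps above (tokenize, assign integer ids, count, then re-weight with tf–idf) can be sketched in plain Python to show what the scikit-learn utilities compute. This is an illustrative sketch, not the repository's code; count_vectorize and tfidf are hypothetical names. It assumes scikit-learn's defaults: lowercasing, the token pattern `(?u)\b\w\w+\b`, smoothed idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2 row normalization. Under those assumptions, `TfidfVectorizer().fit_transform(docs)` should give the same weights as a sparse matrix.

```python
import math
import re
from collections import Counter

# Same default token pattern as scikit-learn's CountVectorizer:
# runs of two or more word characters (applied after lowercasing).
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def count_vectorize(docs):
    """Bag of words: a sorted vocabulary and a document-term count matrix."""
    tokenized = [TOKEN_RE.findall(doc.lower()) for doc in docs]
    # Integer id for each distinct token, assigned in alphabetical order.
    vocab = {tok: i for i, tok in enumerate(sorted({t for toks in tokenized for t in toks}))}
    counts = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok, c in Counter(toks).items():
            row[vocab[tok]] = c
        counts.append(row)
    return vocab, counts


def tfidf(counts):
    """Re-weight raw counts as tf(t,d) * idf(t), then L2-normalize each row.

    idf(t) = ln((1 + n) / (1 + df(t))) + 1 is the smoothed variant that
    scikit-learn uses when smooth_idf=True (its default).
    """
    n = len(counts)
    n_terms = len(counts[0]) if counts else 0
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    weighted = []
    for row in counts:
        w = [tf * idf[j] for j, tf in enumerate(row)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0  # guard against empty documents
        weighted.append([x / norm for x in w])
    return weighted
```

For docs = ["great post", "great great video", "bad video"], count_vectorize gives the vocabulary {"bad": 0, "great": 1, "post": 2, "video": 3}, and in the first row of tfidf(counts) "post" outweighs "great" because "post" appears in fewer documents.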
