facebook-comments-scraper & Natural Language Processing

Facebook comment scraping using BeautifulSoup (bs4).
First, specify your credentials and profile URLs for the main scraper script, facebooksdata_scraper.py.
Run facebooksdata_scraper.py.
The scraper writes profile_data.json. Convert it to CSV with json_to_csv_conv.py to get profile_data.csv.
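
The conversion step can be sketched as follows. This is a minimal sketch, not the contents of json_to_csv_conv.py: it assumes profile_data.json holds a list of flat objects (one per scraped record), and the function name `json_to_csv` is hypothetical.

```python
import csv
import json


def json_to_csv(json_path, csv_path):
    """Convert a JSON list of flat objects into a CSV file.

    Assumption: the JSON file contains a list of dicts with scalar values.
    Returns the number of records written.
    """
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)

    # Collect column names in first-seen order so that records with
    # different keys still fit into one header row.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # missing keys are written as empty cells

    return len(records)


# Example call (file names taken from the steps above):
# json_to_csv("profile_data.json", "profile_data.csv")
```

If the scraper output is nested (for example, a list of comments per profile), the records would need to be flattened before this step.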

Natural Language Processing

The text processing step performs tokenization, part-of-speech tagging, stop word removal, stemming, and lemmatization.
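
Three of these steps (tokenization, stop word removal, and stemming) can be illustrated with the standard library alone. This is a toy sketch, not the notebook's code: the stop word list is a small hypothetical sample and `crude_stem` is a naive suffix stripper standing in for a real stemmer. Part-of-speech tagging and lemmatization need trained models or dictionaries, so they are left out here.

```python
import re

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "for", "this"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Toy stemmer: strip one common suffix if at least 3 characters remain.

    This is NOT the Porter algorithm, only an illustration of the idea.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run tokenization, stop word removal, and stemming in sequence."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The dogs are barking in the yard"))
# prints ['dog', 'bark', 'yard']
```

In practice a library such as NLTK or spaCy would replace each of these helper functions.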

-> Feature Extraction

Raw text, a sequence of symbols, cannot be fed directly to the algorithms, because most of them expect fixed-size numerical feature vectors rather than raw text documents of variable length.
scikit-learn provides utilities for the most common ways to extract numerical features from text content, namely:
- tokenizing strings and giving an integer id for each possible token, for instance by using white-spaces and punctuation as token separators.
- counting the occurrences of tokens in each document.
- normalizing and weighting tokens, giving diminishing importance to those that occur in the majority of samples / documents.
Each individual token occurrence frequency (normalized or not) is treated as a feature.
Vectorization is the general process of turning a collection of text documents into numerical feature vectors.
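
Tokenizing and counting can be sketched with scikit-learn's `CountVectorizer`. The three example comments are made up for illustration and do not come from the scraped data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical example comments, not real scraped data.
docs = [
    "great post thanks for sharing",
    "thanks for the great photo",
    "this post is spam",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)  # sparse matrix: one row per document

# vocabulary_ maps each token to the integer id (column index) assigned to it.
vocabulary = vectorizer.vocabulary_
dense = counts.toarray()

print(counts.shape)  # (number of documents, number of distinct tokens)
print(dense[2][vocabulary["spam"]])  # count of "spam" in the third document
```

Each row of `dense` is the fixed-size feature vector for one document, regardless of how long the original comment was.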
To re-weight the count features into floating-point values suitable for use by a classifier, use the tf-idf vectorizer.
tf means term frequency, while tf-idf means term frequency times inverse document frequency: tf-idf(t,d) = tf(t,d) × idf(t)
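
The formula can be checked by hand against scikit-learn's `TfidfVectorizer`. This is a sketch with made-up comments. It sets `norm=None` so the raw tf × idf product is visible (the default additionally L2-normalizes each row), and it relies on scikit-learn's default smoothed definition idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) the number of documents containing t. The helper `manual_tfidf` is hypothetical.

```python
import math

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical example comments, not real scraped data.
docs = [
    "great post thanks for sharing",
    "thanks for the great photo",
    "this post is spam",
]

# norm=None keeps the unnormalized tf * idf values so they can be verified.
vectorizer = TfidfVectorizer(norm=None)
tfidf = vectorizer.fit_transform(docs).toarray()


def manual_tfidf(term, doc_index):
    """Compute tf(t,d) * idf(t) using scikit-learn's default smoothed idf."""
    n_docs = len(docs)
    tf = docs[doc_index].split().count(term)
    df = sum(1 for d in docs if term in d.split())
    idf = math.log((1 + n_docs) / (1 + df)) + 1
    return tf * idf


spam_col = vectorizer.vocabulary_["spam"]
great_col = vectorizer.vocabulary_["great"]

# "spam" occurs in one document, "great" in two, so "spam" gets the larger weight.
print(tfidf[2][spam_col], manual_tfidf("spam", 2))
print(tfidf[0][great_col], manual_tfidf("great", 0))
```

Tokens that appear in many documents receive a smaller idf, which is the diminishing importance described in the list above.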
