These notebooks retrieve tweets related to arboviruses (dengue, zika, chikungunya) from the Twitter API. The retrieval is done through streaming (in real time) and the retrieved tweets are stored in a MongoDB collection.
- twitter_retriever.ipynb: captures data from Twitter via the streaming API and saves it into a MongoDB collection (a minimal sketch is shown after this list).
- data_update_and_analysis.ipynb: updates the geolocation data, which is poorly organized in the source data. It also compares databases when data is collected on different machines, since retrieval from Twitter may be asynchronous.
- twitter_geolocation_eda.ipynb: evaluates the geolocation variables in the Twitter data, for instance by comparing the proportion of total tweets that carry each variable.
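Below is a minimal sketch of the streaming capture step, assuming tweepy 3.x and a local MongoDB instance; the credentials, the database/collection names (`arbovirus`, `tweets`), and the Portuguese language filter are illustrative placeholders, not the project's actual configuration.

```python
import json

import tweepy
from pymongo import MongoClient

# Placeholder Twitter API credentials: replace with your own keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

# Illustrative database/collection names on a local MongoDB instance.
collection = MongoClient("mongodb://localhost:27017")["arbovirus"]["tweets"]

class ArbovirusListener(tweepy.StreamListener):
    """Stores every incoming tweet (a JSON document) in MongoDB."""

    def on_data(self, raw_data):
        collection.insert_one(json.loads(raw_data))
        return True  # keep the stream open

    def on_error(self, status_code):
        if status_code == 420:  # rate limited: disconnect
            return False

stream = tweepy.Stream(auth=auth, listener=ArbovirusListener())
stream.filter(track=["dengue", "zika", "chikungunya"], languages=["pt"])
```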
MongoDB is a NoSQL database program that stores JSON-like documents (the same format returned by the Twitter API). You can browse a database with the MongoDB Compass GUI or with the Python library pymongo.
- Tutorial
- collection level operations
- Tools for connecting. To connect to a cluster: go to your profile at cloud.mongodb.com >>> click Clusters and select your cluster >>> click Connect >>> Connect your application >>> find the Connection String URI Format (see the sketch below).
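As a hedged example, the connection string can be used with pymongo like this; the URI placeholders and the database/collection names are illustrative, not the project's actual values.

```python
from pymongo import MongoClient

# Paste the Connection String URI obtained from cloud.mongodb.com.
# User, password, and cluster address are placeholders.
uri = "mongodb+srv://<user>:<password>@<cluster-address>/?retryWrites=true&w=majority"
client = MongoClient(uri)

# Illustrative database/collection names.
tweets = client["arbovirus"]["tweets"]

# A few collection-level operations.
print(tweets.count_documents({}))             # number of stored tweets
print(tweets.find_one({}, {"text": 1}))       # peek at one document
print(tweets.distinct("place.country_code"))  # values of a geolocation field
```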
To copy a collection to another machine, first export it with mongodump:
mongodump --db DataBaseName --collection collection -o "path/to/folder"
Copy the generated dump/DataBaseName folder to the new machine. Then, import using mongorestore:
mongorestore --db DataBaseName /path/to/DataBaseName
- the dump consists of the uncompressed .bson files
Note that /path/to/DataBaseName should be a directory filled with .json and .bson representations of your data.
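After restoring a dump collected on another machine, the two collections can be compared and merged by tweet id; the sketch below assumes the restored data sits in a separate collection and uses illustrative database/collection names throughout.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["arbovirus"]  # illustrative database name

# Tweet ids already present in the local collection (illustrative name).
local_ids = {doc["id"] for doc in db["tweets"].find({}, {"id": 1, "_id": 0})}

# Collection restored from the other machine's dump (illustrative name).
restored = db["tweets_machine2"]

# Keep only the tweets that are not present locally, then merge them in.
missing = [doc for doc in restored.find() if doc["id"] not in local_ids]
print(f"{len(missing)} tweets found only in the restored collection")
if missing:
    db["tweets"].insert_many(missing)
```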
You can also compress the dump on the fly with --gzip:
mongodump --db somedb --collection somecollection --out "path/to/folder" --gzip
Fiocruz suggested building some redundancy into the data capture. One option is AWS, which offers services such as Lambda (FaaS) and EC2 (IaaS).