dkakkar edited this page Mar 23, 2020 · 23 revisions

Copying GDELT to FASRC

  • Copy all GDELT files that need to be analyzed to a folder on FASRC, and place the unzipped CSV files in that folder. For initial testing, start with one month of unzipped CSV files and 64 GB of RAM.
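If the GDELT files were downloaded as .zip archives, the unzip step can be scripted. The following is a minimal sketch, assuming all archives sit in one directory; the function name `unzip_gdelt_archives` and the directory names are hypothetical placeholders, not part of the original workflow.

```python
import zipfile
from pathlib import Path


def unzip_gdelt_archives(zip_dir, csv_dir):
    """Extract every .zip archive found in zip_dir into csv_dir.

    Returns the sorted list of extracted member names.
    """
    zip_dir = Path(zip_dir)
    csv_dir = Path(csv_dir)
    # Create the target folder if it does not exist yet.
    csv_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(zip_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(csv_dir)
            extracted.extend(zf.namelist())
    return sorted(extracted)
```

Calling `unzip_gdelt_archives("gdelt_zips", "gdelt_csv")` (both paths hypothetical) would leave the unzipped CSV files in `gdelt_csv`, which can then serve as the folder path given to the script.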

Creating the Conda environment

  • Log in (via ssh) to the node where OmniSci is running and run the following commands on the command line:
module load Anaconda3/5.0.1-fasrc02
conda create -n gdelt python=3.6
source activate gdelt
pip install pymapd
pip install pandas
pip install pyarrow
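Before running the script, it can help to confirm that the three packages are visible from the activated environment. This is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The three packages installed with pip in the steps above.
    missing = missing_packages(["pymapd", "pandas", "pyarrow"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it with the `gdelt` environment active; if any package is reported missing, repeat the corresponding `pip install` command.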

Running the script

  • Copy the script from '/n/holyscratch01/cga/dkakkar/scripts/gdelt.py' to your home directory.

  • Edit the script to set your folder path (the folder containing the GDELT files) and the OmniSci port in the connection string.

  • The connection string needs the backend port. To find the backend port number, follow the steps below:

    • Click on the session ID of your OmniSci session
    • Open the output.log file
    • Look for "Backend TCP" in the file and copy the port number from there
  • After activating the conda environment, run the script using:

python3 gdelt.py
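The backend port lookup described above can also be done programmatically. The sketch below is not the contents of gdelt.py, which is not reproduced here. It assumes the port number appears on the same line as the text "Backend TCP" in output.log (the exact log format is an assumption), and it shows how that port could be passed to `pymapd.connect`. The helper names and the credential parameters are hypothetical; supply the values for your own OmniSci session.

```python
import re


def find_backend_port(log_text):
    """Return the first integer following "Backend TCP" on the same line, or None.

    Assumption: the log line looks roughly like "Backend TCP: <port>".
    """
    match = re.search(r"Backend TCP[^\d\n]*(\d+)", log_text)
    return int(match.group(1)) if match else None


def read_backend_port(log_path):
    """Read a log file (for example output.log) and extract the backend port."""
    with open(log_path) as fh:
        return find_backend_port(fh.read())


def connect_to_omnisci(port, user, password, dbname, host="localhost"):
    """Open a pymapd connection on the given backend port.

    All arguments are placeholders for your own session details.
    """
    import pymapd  # imported lazily so the port lookup works without pymapd

    return pymapd.connect(
        user=user, password=password, host=host, dbname=dbname, port=port
    )
```

With a connection in hand, a pandas DataFrame read from one of the CSV files could be loaded with pymapd's `load_table` method, though the table name and column layout depend on how gdelt.py defines them.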