diff --git a/CHANGELOG.md b/CHANGELOG.md index f8306cd2..95ef511d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -28,6 +28,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Matchup returns numSecondary and numPrimary counts rather than insitu/gridded - SDAP-402: Changed matchup matchOnce logic to match multiple points if same time/space - Bumped ingress timeout in Helm chart to reflect AWS gateway timeout +- SDAP-399: Updated quickstart guide for standalone docker deployment of SDAP. +- SDAP-399: Updated quickstart Jupyter notebook ### Deprecated ### Removed - removed dropdown from matchup doms endpoint secondary param diff --git a/docker/jupyter/Dockerfile b/docker/jupyter/Dockerfile index cbd207cc..db7d263b 100644 --- a/docker/jupyter/Dockerfile +++ b/docker/jupyter/Dockerfile @@ -37,9 +37,9 @@ RUN mkdir -p /home/jovyan/Quickstart && \ git init && \ git remote add -f origin ${APACHE_NEXUS} && \ git config core.sparseCheckout true && \ - echo "client" >> .git/info/sparse-checkout && \ + echo "integrations/python-client" >> .git/info/sparse-checkout && \ git pull origin ${APACHE_NEXUS_BRANCH} && \ - cd client && \ + cd integrations/python-client && \ python setup.py install -COPY ["Time Series Example.ipynb", "/home/jovyan/Quickstart/Time Series Example.ipynb"] +COPY ["Time Series Example.ipynb", "/home/jovyan/Quickstart/Time Series Example.ipynb"] \ No newline at end of file diff --git a/docker/jupyter/Time Series Example.ipynb b/docker/jupyter/Time Series Example.ipynb index 9fe5ee6f..9ff2c2fd 100644 --- a/docker/jupyter/Time Series Example.ipynb +++ b/docker/jupyter/Time Series Example.ipynb @@ -2,7 +2,11 @@ "cells": [ { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, "source": [ "# Start Here\n", "\n", @@ -12,7 +16,34 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "os.environ[\"PROJ_LIB\"] = \"/opt/conda/share/proj\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, + "pycharm": { + "name": "#%%\n" + } + }, "outputs": [], "source": [ "%matplotlib inline\n", @@ -89,7 +120,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, "source": [ "# Run Time Series and Plot\n", "\n", @@ -103,7 +138,11 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, "outputs": [], "source": [ "import time\n", @@ -139,7 +178,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -153,9 +192,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.5" + "version": "3.9.7" } }, "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/docker/jupyter/requirements.txt b/docker/jupyter/requirements.txt index e4f500a5..5749b680 100644 --- a/docker/jupyter/requirements.txt +++ b/docker/jupyter/requirements.txt @@ -1,4 +1,3 @@ -shapely -requests -numpy -cassandra-driver==3.9.0 +shapely==1.6.4.post2 +requests==2.21.0 +numpy>=1.13.3 diff --git a/docs/quickstart.rst b/docs/quickstart.rst index 
0b3be553..f9dc3cfc 100644 --- a/docs/quickstart.rst +++ b/docs/quickstart.rst @@ -1,7 +1,7 @@ .. _quickstart: ***************** -Quickstart Guide +Quickstart Guide - Docker ***************** This quickstart will take approximately 45 minutes to complete. @@ -18,43 +18,58 @@ This quickstart guide will walk you through how to install and run NEXUS on your Prerequisites ============== -* Docker (tested on v18.03.1-ce) +* Docker (tested on v20.10.17) * Internet Connection -* bash +* bash or zsh * cURL -* 500 MB of disk space +* 8.5 GB of disk space Prepare ======== -Start downloading the Docker images and data files. +Start downloading the Docker images and set up the Docker bridge network. .. _quickstart-step1: Pull Docker Images ------------------- -Pull the necessary Docker images from the `SDAP repository `_ on Docker Hub. Please check the repository for the latest version tag. +Pull the necessary Docker images from the `NEXUS JPL repository `_ on Docker Hub. Please check the repository for the latest version tag. .. code-block:: bash - export VERSION=1.0.0-rc1 + export CASSANDRA_VERSION=3.11.6-debian-10-r138 + export RMQ_VERSION=3.8.9-debian-10-r37 + export COLLECTION_MANAGER_VERSION=0.1.6a14 + export GRANULE_INGESTER_VERSION=0.1.6a30 + export WEBAPP_VERSION=distributed.0.4.5a54 + export SOLR_VERSION=8.11.1 + export SOLR_CLOUD_INIT_VERSION=1.0.2 + export ZK_VERSION=3.5.5 + + export JUPYTER_VERSION=1.0.0-rc2 .. code-block:: bash - docker pull sdap/ningester:${VERSION} - docker pull sdap/solr-singlenode:${VERSION} - docker pull sdap/cassandra:${VERSION} - docker pull sdap/nexus-webapp:standalone.${VERSION} + docker pull bitnami/cassandra:${CASSANDRA_VERSION} + docker pull bitnami/rabbitmq:${RMQ_VERSION} + docker pull nexusjpl/collection-manager:${COLLECTION_MANAGER_VERSION} + docker pull nexusjpl/granule-ingester:${GRANULE_INGESTER_VERSION} + docker pull nexusjpl/nexus-webapp:${WEBAPP_VERSION} + docker pull nexusjpl/solr:${SOLR_VERSION} + docker pull nexusjpl/solr-cloud-init:${SOLR_CLOUD_INIT_VERSION} + docker pull zookeeper:${ZK_VERSION} + + docker pull nexusjpl/jupyter:${JUPYTER_VERSION} .. _quickstart-step2: Create a new Docker Bridge Network ------------------------------------ -This quickstart constsists of launching several Docker containers that need to communicate with one another. To facilitate this communication, we want to be able to reference containers via hostname instead of IP address. The default bridge network used by Docker only supports this by using the ``--link`` option wich is now considered to be `deprecated `_. +This quickstart consists of launching several Docker containers that need to communicate with one another. To facilitate this communication, we want to be able to reference containers via hostname instead of IP address. The default bridge network used by Docker only supports this by using the ``--link`` option which is now considered to be `deprecated `_. -The currently recommended way to acheive what we want is to use a `user defined bridge network `_ and launch all of the containers into that network. +The currently recommended way to achieve what we want is to use a `user defined bridge network `_ and launch all of the containers into that network. The network we will be using for this quickstart will be called ``sdap-net``. Create it using the following command: @@ -64,181 +79,268 @@ The network we will be using for this quickstart will be called ``sdap-net``. Cr .. 
_quickstart-step3: -Download Sample Data ---------------------- +Start Core Components +====================== -The data we will be downloading is part of the `AVHRR OI dataset `_ which measures sea surface temperature. We will download 1 month of data and ingest it into a local Solr and Cassandra instance. +NEXUS relies on Apache Solr and Apache Cassandra to store tile metadata and science data, so let's start those first. -Choose a location that is mountable by Docker (typically needs to be under the User's home directory) to download the data files to. +Start Zookeeper +--------------- -.. code-block:: bash - - export DATA_DIRECTORY=~/nexus-quickstart/data/avhrr-granules - mkdir -p ${DATA_DIRECTORY} - -Then go ahead and download 1 month worth of AVHRR netCDF files. +In order to run Solr in cloud mode, we must first run Zookeeper. .. code-block:: bash - cd $DATA_DIRECTORY - - export URL_LIST="https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/305/20151101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/306/20151102120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/307/20151103120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/308/20151104120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/309/20151105120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/310/20151106120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/311/20151107120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/312/20151108120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/313/20151109120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/314/20151110120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/315/20151111120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/316/20151112120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/317/20151113120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/318/20151114120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/319/20151115120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc 
https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/320/20151116120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/321/20151117120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/322/20151118120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/323/20151119120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/324/20151120120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/325/20151121120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/326/20151122120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/327/20151123120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/328/20151124120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/329/20151125120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/330/20151126120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/331/20151127120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/332/20151128120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/333/20151129120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/334/20151130120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc" - - for url in ${URL_LIST}; do - curl -O "${url}" - done + docker run --name zookeeper -dp 2181:2181 zookeeper:${ZK_VERSION} -You should now have 30 files downloaded to your data directory, one for each day in November 2015. +We then need to ensure the ``/solr`` znode is present. -Start Data Storage Containers -============================== +.. code-block:: bash -We will use Solr and Cassandra to store the tile metadata and data respectively. + docker exec zookeeper bash -c "bin/zkCli.sh create /solr" .. _quickstart-step4: Start Solr ----------- -SDAP is tested with Solr version 7.x with the JTS topology suite add-on installed. The SDAP docker image is based off of the official Solr image and simply adds the JTS topology suite and the nexustiles core. +SDAP is tested with Solr version 8.11.1. -.. 
note:: Mounting a volume is optional but if you choose to do it, you can start and stop the Solr container without having to reingest your data every time. If you do not mount a volume, every time you stop your Solr container the data will be lost. +.. note:: Mounting a volume is optional but if you choose to do it, you can start and stop the Solr container without having to reingest your data every time. If you do not mount a volume, every time you stop your Solr container the data will be lost. If you don't want a volume, leave off the ``-v`` option in the following ``docker run`` command. To start Solr using a volume mount and expose the admin webapp on port 8983: .. code-block:: bash export SOLR_DATA=~/nexus-quickstart/solr - docker run --name solr --network sdap-net -v ${SOLR_DATA}:/opt/solr/server/solr/nexustiles/data -p 8983:8983 -d sdap/solr-singlenode:${VERSION} + mkdir -p ${SOLR_DATA} + docker run --name solr --network sdap-net -v ${SOLR_DATA}/:/opt/solr/server/solr/nexustiles/data -p 8983:8983 -e ZK_HOST="host.docker.internal:2181/solr" -d nexusjpl/solr:${SOLR_VERSION} + +This will start an instance of Solr. To initialize it, we need to run the ``solr-cloud-init`` image. -If you don't want to use a volume, leave off the ``-v`` option. +.. code-block:: bash + docker run -it --rm --name solr-init --network sdap-net -e SDAP_ZK_SOLR="host.docker.internal:2181/solr" -e SDAP_SOLR_URL="http://host.docker.internal:8983/solr/" -e CREATE_COLLECTION_PARAMS="name=nexustiles&numShards=1&waitForFinalState=true" nexusjpl/solr-cloud-init:${SOLR_CLOUD_INIT_VERSION} + +When the init script finishes, kill the container by typing ``Ctrl + C``. .. _quickstart-step5: -Start Cassandra ---------------- +Starting Cassandra +------------------- -SDAP is tested with Cassandra version 2.2.x. The SDAP docker image is based off of the official Cassandra image and simply mounts the schema DDL script into the container for easy initialization. +SDAP is tested with Cassandra version 3.11.6. -.. note:: Similar to the Solr container, using a volume is recommended but not required. +.. note:: Similar to the Solr container, using a volume is recommended but not required. Be aware that the second ``-v`` option, which mounts the init script, is required. -To start cassandra using a volume mount and expose the connection port 9042: +Before starting Cassandra, we need to prepare a script to initialize the database. + +.. code-block:: bash + + export CASSANDRA_INIT=~/nexus-quickstart/init + mkdir -p ${CASSANDRA_INIT} + cat << EOF >> ${CASSANDRA_INIT}/initdb.cql + CREATE KEYSPACE IF NOT EXISTS nexustiles WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 1 }; + + CREATE TABLE IF NOT EXISTS nexustiles.sea_surface_temp ( + tile_id uuid PRIMARY KEY, + tile_blob blob + ); + EOF + +Now we can start the image and run the initialization script. .. code-block:: bash export CASSANDRA_DATA=~/nexus-quickstart/cassandra - docker run --name cassandra --network sdap-net -p 9042:9042 -v ${CASSANDRA_DATA}:/var/lib/cassandra -d sdap/cassandra:${VERSION} + mkdir -p ${CASSANDRA_DATA} + docker run --name cassandra --network sdap-net -p 9042:9042 -v ${CASSANDRA_DATA}/cassandra/:/var/lib/cassandra -v "${CASSANDRA_INIT}/initdb.cql:/scripts/initdb.cql" -d bitnami/cassandra:${CASSANDRA_VERSION} + +Wait a few moments for the database to start. + +.. 
code-block:: bash + + docker exec cassandra bash -c "cqlsh -u cassandra -p cassandra -f /scripts/initdb.cql" + +With Solr and Cassandra started and initialized, we can now start the collection manager and granule ingester(s). .. _quickstart-step6: -Ingest Data -============ +Start the Ingester +=================== -Now that Solr and Cassandra have both been started and configured, we can ingest some data. NEXUS ingests data using the ningester docker image. This image is designed to read configuration and data from volume mounts and then tile the data and save it to the datastores. More information can be found in the :ref:`ningester` section. +In this section, we will start the components for the ingester. These components are: -Ningester needs 3 things to run: +* one or more granule ingesters which process data granules into NEXUS tiles; +* the collection manager which watches for new granules and tells the ingesters about them and how they should be processed; and +* RabbitMQ which handles communication between the collection manager and ingesters. -#. Tiling configuration. How should the dataset be tiled? What is the dataset called? Are there any transformations that need to happen (e.g. kelvin to celsius conversion)? etc... -#. Connection configuration. What should be used for metadata storage and where can it be found? What should be used for data storage and where can it be found? -#. Data files. The data that will be ingested. +We will also be downloading a number of NetCDF files containing science data for use in this demo. -Tiling configuration ---------------------- +Create Data Directory +------------------------ + +Let's start by creating the directory to hold the science data to ingest. -For this quickstart we will use the AVHRR tiling configuration from the test job in the Apache project. It can be found here: `AvhrrJobTest.yml `_. Download that file into a temporary location on your laptop that can be mounted by Docker. +Choose a location that is mountable by Docker (typically needs to be under the user's home directory) to download the data files to. .. code-block:: bash - export NINGESTER_CONFIG=~/nexus-quickstart/ningester/config - mkdir -p ${NINGESTER_CONFIG} - cd ${NINGESTER_CONFIG} - curl -O https://raw.githubusercontent.com/apache/incubator-sdap-ningester/bc596c2749a7a2b44a01558b60428f6d008f4f45/src/testJobs/resources/testjobs/AvhrrJobTest.yml - -Connection configuration -------------------------- - -We want ningester to use Solr for its metadata store and Cassandra for its data store. We also want it to connect to the Solr and Cassandra instances we started earlier. In order to do this we need a connection configuration file that specifies how the application should connect to Solr and Cassandra. It looks like this: - -.. code-block:: yaml - - # Tile writer configuration - ningester: - tile_writer: - data_store: cassandraStore - metadata_store: solrStore - --- - # Connection settings for the docker profile - spring: - profiles: - - docker - data: - cassandra: - keyspaceName: nexustiles - contactPoints: cassandra - solr: - host: http://solr:8983/solr/ - - datasource: - solrStore: - collection: nexustiles - -Save this configuration to a file on your local laptop that can be mounted into a Docker container: + export DATA_DIRECTORY=~/nexus-quickstart/data/avhrr-granules + mkdir -p ${DATA_DIRECTORY} + +.. _quickstart-step7: + +Start RabbitMQ +---------------- + +The collection manager and granule ingester(s) use RabbitMQ to communicate, so we need to start that up first. 
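The RabbitMQ management UI will be reachable on port 15672 once the ``docker run`` command below completes. If you want to confirm the broker is ready before continuing, a minimal sanity check is shown here; it assumes the Bitnami image's default ``user``/``bitnami`` credentials and the port mappings used in this guide:

.. code-block:: bash

    # Query the RabbitMQ management API; a JSON response means the broker is up.
    curl -u user:bitnami http://localhost:15672/api/overview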
.. code-block:: bash - touch ${NINGESTER_CONFIG}/connectionsettings.yml - cat << EOF >> ${NINGESTER_CONFIG}/connectionsettings.yml - # Tile writer configuration - ningester: - tile_writer: - data_store: cassandraStore - metadata_store: solrStore - --- - # Connection settings for the docker profile - spring: - profiles: - - docker - data: - cassandra: - keyspaceName: nexustiles - contactPoints: cassandra - solr: - host: http://solr:8983/solr/ - datasource: - solrStore: - collection: nexustiles + docker run -dp 5672:5672 -p 15672:15672 --name rmq --network sdap-net bitnami/rabbitmq:${RMQ_VERSION} + +.. _quickstart-step8: + +Start the Granule Ingester(s) +----------------------------- + +The granule ingester(s) read new granules from the message queue and process them into tiles. For the set of granules we will be using in this guide, we recommend using two ingester containers to speed up the process. + +.. code-block:: bash + + cat << EOF >> granule-ingester.env + RABBITMQ_HOST=host.docker.internal:5672 + RABBITMQ_USERNAME=user + RABBITMQ_PASSWORD=bitnami + CASSANDRA_CONTACT_POINTS=host.docker.internal + CASSANDRA_USERNAME=cassandra + CASSANDRA_PASSWORD=cassandra + SOLR_HOST_AND_PORT=http://host.docker.internal:8983 EOF + docker run --name granule-ingester-1 --network sdap-net -d --env-file granule-ingester.env \ + -v ${DATA_DIRECTORY}:/data/granules/ nexusjpl/granule-ingester:${GRANULE_INGESTER_VERSION} -Data files ----------- + docker run --name granule-ingester-2 --network sdap-net -d --env-file granule-ingester.env \ + -v ${DATA_DIRECTORY}:/data/granules/ nexusjpl/granule-ingester:${GRANULE_INGESTER_VERSION} -We already downloaded the datafiles to ``${DATA_DIRECTORY}`` in :ref:`quickstart-step2` so we are ready to start ingesting. +.. _quickstart-optional-step: -Launch Ningester ------------------- +[OPTIONAL] Run Message Queue Monitor +------------------------------------- + +The granule ingestion process can take some time. To monitor its progress, we wrote a simple Python script to monitor the message queue. It will wait until some granules show up and then will exit once they have all been ingested. -The ningester docker image runs a batch job that will ingest one granule. Here, we do a quick for loop to cycle through each data file and run ingestion on it. +The script only needs the ``requests`` module, which can be installed by running ``pip install requests`` if you do not have it. -.. note:: Ingestion takes about 60 seconds per file. Depending on how powerful your laptop is and what other programs you have running, you can choose to ingest more than one file at a time. If you use this example, we will be ingesting 1 file at a time. So, for 30 files this will take roughly 30 minutes. You can speed this up by reducing the time spent sleeping by changing ``sleep 60`` to something like ``sleep 30``. +To download the script: + +.. code-block:: bash + + curl -O https://raw.githubusercontent.com/apache/incubator-sdap-nexus/master/tools/rmqmonitor/monitor.py + +And then run it in a separate shell: + +.. code-block:: bash + + python monitor.py + +.. _quickstart-step9: + +Download Sample Data +--------------------- + +The data we will be downloading is part of the `AVHRR OI dataset `_, which measures sea surface temperature. We will download 1 month of data and ingest it into a local Solr and Cassandra instance. + +Then go ahead and download 1 month's worth of AVHRR netCDF files. .. 
code-block:: bash - for g in `ls ${DATA_DIRECTORY} | awk "{print $1}"` - do - docker run -d --name $(echo avhrr_$g | cut -d'-' -f 1) --network sdap-net -v ${NINGESTER_CONFIG}:/home/ningester/config/ -v ${DATA_DIRECTORY}/${g}:/home/ningester/data/${g} sdap/ningester:${VERSION} docker,solr,cassandra - sleep 60 + cd $DATA_DIRECTORY + + export URL_LIST="https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/305/20151101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/306/20151102120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/307/20151103120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/308/20151104120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/309/20151105120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/310/20151106120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/311/20151107120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/312/20151108120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/313/20151109120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/314/20151110120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/315/20151111120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/316/20151112120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/317/20151113120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/318/20151114120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/319/20151115120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/320/20151116120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/321/20151117120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/322/20151118120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc 
https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/323/20151119120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/324/20151120120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/325/20151121120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/326/20151122120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/327/20151123120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/328/20151124120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/329/20151125120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/330/20151126120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/331/20151127120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/332/20151128120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/333/20151129120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/334/20151130120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc" + + for url in ${URL_LIST}; do + curl -O "${url}" + done + +.. note:: + + The dataset is pending migration from PO.DAAC to the Earthdata Cloud in AWS. Following this migration, the above links may not work (though an updated download method will soon follow). + + For reference: there are 30 granules, one for every day in November 2015. + +You should now have 30 files downloaded to your data directory, one for each day in November 2015. +.. _quickstart-step10: -.. 
_quickstart-step7: +Create Collection Configuration +-------------------------------- + +The collection configuration is a ``.yml`` file that tells the collection manager what datasets it is managing, where the granules are stored, and how they are to be tiled. + +.. code-block:: bash + + export CONFIG_DIR=~/nexus-quickstart/ingester/config + mkdir -p ${CONFIG_DIR} + cat << EOF >> ${CONFIG_DIR}/collectionConfig.yml + collections: + - id: AVHRR_OI_L4_GHRSST_NCEI + path: /data/granules/*.nc + priority: 1 + forward-processing-priority: 5 + projection: Grid + dimensionNames: + latitude: lat + longitude: lon + time: time + variable: analysed_sst + slices: + lat: 100 + lon: 100 + time: 1 + EOF + +.. note:: + + The values under ``slices`` determine the tile sizes. We used the configuration above for faster ingestion time, but be aware there is a tradeoff between ingestion time and analysis time. Larger tile sizes yield faster ingestion times but slower analysis times and vice versa. + + Feel free to edit the tile size in the configuration we just created, but keep the aforementioned tradeoff in mind. + +.. _quickstart-step11: + +Start the Collection Manager +----------------------------- + +Now we can start the collection manager. + +.. code-block:: bash + + docker run --name collection-manager --network sdap-net -v ${DATA_DIRECTORY}:/data/granules/ -v ${CONFIG_DIR}:/home/ingester/config/ -e COLLECTIONS_PATH="/home/ingester/config/collectionConfig.yml" -e HISTORY_URL="http://host.docker.internal:8983/" -e RABBITMQ_HOST="host.docker.internal:5672" -e RABBITMQ_USERNAME="user" -e RABBITMQ_PASSWORD="bitnami" -d nexusjpl/collection-manager:${COLLECTION_MANAGER_VERSION} + +.. _quickstart-step12: + +When it starts, it will publish messages for the downloaded granules to RabbitMQ, and the ingesters will automatically begin processing the data (it may take a few moments for this to kick in). You can monitor the progress of the ingestion in several ways: + +* You can use the above-mentioned monitor script. Ingestion is complete when the script exits. +* You can tail the ingester containers' logs with a command like ``docker logs -f <container name>`` and wait for activity to cease. +* You can monitor the message queue at http://localhost:15672/#/queues/%2F/nexus. Use username ``user`` and password ``bitnami``. Ingestion is complete when the 'Ready', 'Unacked', and 'Total' message counts are all zero. + +.. note:: + + There are known issues that can occur during the ingestion process; you can find more information on them in the 'Known Issues' section at the end of this document. + +.. note:: + + It is recommended you do not download new granules to the data directory, as doing so can result in duplicate messages being published, because the collection manager may flag both the partially downloaded file and the completed granule as new granules. + + To work around this: + + #. Download granules to a separate directory and move them to the data directory. + #. Use a temporary filename, then rename: ``curl -o <filename>.tmp <url> && mv <filename>.tmp <filename>`` + +.. _quickstart-step13: Start the Webapp ================= @@ -247,9 +349,9 @@ Now that the data is being (has been) ingested, we need to start the webapp that .. 
code-block:: bash - docker run -d --name nexus-webapp --network sdap-net -p 8083:8083 -e SPARK_LOCAL_IP=127.0.0.1 -e MASTER=local[4] -e CASSANDRA_CONTACT_POINTS=cassandra -e SOLR_URL_PORT=solr:8983 sdap/nexus-webapp:standalone.${VERSION} + docker run -d --name nexus-webapp --network sdap-net -p 8083:8083 nexusjpl/nexus-webapp:${WEBAPP_VERSION} python3 /incubator-sdap-nexus/analysis/webservice/webapp.py --solr_host="http://host.docker.internal:8983" --cassandra_host=host.docker.internal --cassandra_username=cassandra --cassandra_password=cassandra -.. note:: If you see a messasge like ``docker: invalid reference format`` it likely means you need to re-export the ``VERSION`` environment variable again. This can happen when you open a new terminal window or tab. +.. note:: If you see a message like ``docker: invalid reference format`` it likely means you need to re-export the ``WEBAPP_VERSION`` environment variable again. This can happen when you open a new terminal window or tab. This command starts the NEXUS webservice and connects it to the Solr and Cassandra containers. @@ -259,11 +361,14 @@ After running this command you should be able to access the NEXUS webservice by curl -X GET http://localhost:8083/list +.. note:: -.. _quickstart-step8: + You may need to wait a few moments before the webservice is available. -Launch Jupyter -================ +.. _quickstart-step14: + +Launch Jupyter And Run The Demo Notebook +======================================== At this point NEXUS is running and you can interact with the different API endpoints. However, there is a Python client library called ``nexuscli`` which facilitates interacting with the webservice through the Python programming language. The easiest way to use this library is to start the `Jupyter notebook `_ docker image from the SDAP repository. This image is based off of the ``jupyter/scipy-notebook`` docker image but comes pre-installed with the ``nexuscli`` module and an example notebook. To launch the Jupyter notebook, use the following command: .. code-block:: bash - docker run -it --rm --name jupyter --network sdap-net -p 8888:8888 nexusjpl/jupyter:${VERSION} start-notebook.sh --NotebookApp.password='sha1:a0d7f85e5fc4:0c173bb35c7dc0445b13865a38d25263db592938' + docker run -it --rm --name jupyter --network sdap-net -p 8888:8888 nexusjpl/jupyter:${JUPYTER_VERSION} start-notebook.sh --NotebookApp.password='sha1:a0d7f85e5fc4:0c173bb35c7dc0445b13865a38d25263db592938' This command launches a Jupyter container and exposes it on port 8888. @@ -283,19 +388,11 @@ Once the container starts, navigate to http://localhost:8888/. You will be promp Click on the ``Quickstart`` directory to open it. You should see a notebook called ``Time Series Example``: -Add a cell at the top of the notebook: - -.. code-block:: Python - - import os - os.environ["PROJ_LIB"] = "/opt/conda/share/proj" - - .. image:: images/Jupyter_Quickstart.png Click on the ``Time Series Example`` notebook to start it. This will open the notebook and allow you to run the cells and execute a Time Series command against your local instance of NEXUS. -.. _quickstart-step9: +.. _quickstart-finished: Finished! ================ Congratulations, you have completed the quickstart! In this example you: #. Learned how to start the NEXUS webservice #. Learned how to start a Jupyter Notebook #. 
Ran a time series analysis on 1 month of AVHRR OI data and plotted the result + +Cleanup +======== + +To shut down the Solr container cleanly, run the following command: + +.. code-block:: bash + + docker exec solr /opt/bitnami/solr/bin/solr stop -p 8983 + +The remaining containers can safely be stopped using Docker Desktop or by running: + +.. code-block:: bash + + docker stop <container name> + +.. _issues: + +Known Issues +============= + +This section contains a list of issues that may be encountered while running this guide, along with their causes and solutions. + +Granule Ingester Containers Crash +--------------------------------- + +While ingesting data, the granule ingester containers may crash. You can tell this has happened if: + +* The status of one or more of the ingester containers is not 'running' +* The monitor script output shows a number of in-progress tasks less than the number of ingesters and a nonzero number of waiting tasks +* The browser interface shows a number of 'unacked' messages less than the number of ingesters and a nonzero number of 'ready' messages + +The cause of these crashes seems to be a loss of connection to the Solr container. + +There are two solutions to this issue: + +* Restart the container(s) with the command ``docker restart <container name>`` or through Docker Desktop +* Try running only one ingester container. + +Collection Manager Messages Not Publishing +------------------------------------------- + +RabbitMQ may not receive the messages published by the Collection Manager. When this happens, new granules added to monitored collections will not be processed by the ingester(s). + +This issue seems to be caused by the RMQ container having limited resources, which causes message publication to block indefinitely. + +To solve this, first figure out which resource is causing issues by navigating to http://localhost:15672/#/ and signing in with username ``user`` and password ``bitnami``. View the 'Nodes' section. Insufficient resources will be shown in red. Allocate more of those resources in Docker and restart the Docker daemon. + diff --git a/integrations/python-client/nexuscli/__init__.py b/integrations/python-client/nexuscli/__init__.py index 8db88e5e..d9162feb 100644 --- a/integrations/python-client/nexuscli/__init__.py +++ b/integrations/python-client/nexuscli/__init__.py @@ -13,12 +13,12 @@ # See the License for the specific language governing permissions and # limitations under the License. -from .nexuscli.nexuscli import TimeSeries -from .nexuscli.nexuscli import set_target -from .nexuscli.nexuscli import time_series -from .nexuscli.nexuscli import dataset_list -from .nexuscli.nexuscli import daily_difference_average -from .nexuscli.nexuscli import subset -from .nexuscli.nexuscli_ow import set_target -from .nexuscli.nexuscli_ow import run_file -from .nexuscli.nexuscli_ow import run_str +from nexuscli.nexuscli import TimeSeries +from nexuscli.nexuscli import set_target +from nexuscli.nexuscli import time_series +from nexuscli.nexuscli import dataset_list +from nexuscli.nexuscli import daily_difference_average +from nexuscli.nexuscli import subset +from nexuscli.nexuscli_ow import set_target +from nexuscli.nexuscli_ow import run_file +from nexuscli.nexuscli_ow import run_str
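The ``__init__.py`` change above replaces the doubly-nested ``.nexuscli.nexuscli`` imports with absolute ones, so the installed package actually exposes its public functions. A quick smoke test of the fix, as a sketch; it assumes the client and its dependencies were installed with ``python setup.py install`` from ``integrations/python-client``, as in the Dockerfile change above:

.. code-block:: bash

    # Every name re-exported in __init__.py should resolve on the installed package.
    python -c "import nexuscli; print(all(hasattr(nexuscli, n) for n in ('TimeSeries', 'set_target', 'time_series', 'dataset_list', 'subset', 'run_file', 'run_str')))"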
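One last note on the new Cleanup section: it stops the quickstart's containers but leaves the containers themselves, the ``sdap-net`` bridge network, and the data under ``~/nexus-quickstart`` in place. A fuller teardown might look like the following sketch, assuming the container names used throughout this guide and that you no longer need the ingested data:

.. code-block:: bash

    # Remove the stopped containers (jupyter and solr-init were run with --rm and clean up after themselves).
    docker rm zookeeper solr cassandra rmq granule-ingester-1 granule-ingester-2 collection-manager nexus-webapp
    # Remove the user-defined bridge network and the quickstart's local data.
    docker network rm sdap-net
    rm -rf ~/nexus-quickstart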