The anomaly-detection library is part of the TOS2CA Project. For more information, visit the TOS2CA website at https://nasa-jpl.github.io/tos2ca-documentation/.
This Python library is responsible for:
- Taking user input about an inequality, variables, temporal bounds, and geospatial bounds (see the example specification after this list)
- Retrieving subsetted data matching that user input
- Converting that data to a binary format in a time-ordered sequence
- Passing the data to ForTraCC
- Using the masks produced by ForTraCC to retrieve curated data of interest to the user
- Providing access to the TOS2CA data dictionaries
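As a loose illustration of the user input described above, a job specification might look like the following dict. The field names and values here are hypothetical, not the library's actual schema.

```python
# Hypothetical job specification; field names and values are illustrative only,
# not the actual schema used by the TOS2CA database or readers.
job_spec = {
    "dataset": "GPM_3IMERGHH",           # assumed dataset short name
    "variable": "precipitationCal",      # variable the inequality is applied to
    "inequality": ">=",                  # operator
    "threshold": 10.0,                   # value the variable is compared against
    "temporal_bounds": ("2021-06-01T00:00:00Z", "2021-06-07T23:59:59Z"),
    "geospatial_bounds": {"min_lat": 25.0, "max_lat": 45.0,
                          "min_lon": -110.0, "max_lon": -80.0},
}
```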
The library has the following dependencies and access requirements:
- ForTraCC
- See the requirements.txt file for required Python libraries
- Access to an S3 bucket where you can read and write data
- Access to a MySQL database that stores user input
- Access to a Redis/Elasticache memory store to temporarily house data that's being read/curated
- Access to AWS Secrets Manager to retrieve items such as credentials and tokens (a sketch follows this list)
- A NASA Earthdata login to use any DAAC tools/applications
- Access to the us-west-2 AWS region to read any NASA DAAC data over S3
- The code currently runs on an EC2 instance; future work includes containerizing the code and running it in chunks on AWS Fargate
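As an example of the AWS Secrets Manager dependency, a minimal sketch of pulling credentials with boto3 might look like the following. The secret name and region are placeholders, not the project's actual configuration.

```python
import json

import boto3

def get_db_credentials(secret_id="tos2ca/mysql", region="us-west-2"):
    """Fetch a JSON secret from AWS Secrets Manager and return it as a dict.
    The secret ID "tos2ca/mysql" is a hypothetical example."""
    client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])  # e.g. {"host": ..., "user": ..., "password": ...}
```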
Running the phenomenon definition (PhDef) stage in an end-to-end fashion requires the following steps (a rough code sketch follows the list):
- Read the job information from the database, which includes all the information about the temporal bounds, spatial bounds, operator, dataset, and variable requested. Initially, jobs will be in 'pending' status in the database.
- Once you have the job information, choose the appropriate reader for the dataset/variable in question and mark the job as 'running' in the database.
- Request and/or read the data, returning a subset based on the user input.
- Format the data into a dict type and convert it to binary.
- Store the read data in Elasticache.
- Call the ForTraCC operator class that will start the ForTraCC job, reading the data from Elasticache and converting it back from binary to dict.
- Deposit the ForTraCC output into the S3 bucket.
- Create plots and GeoJSON polygons of the anomalies.
- Upload the plots and GeoJSON to S3.
- E-mail the user that their job is complete and send them the location of the S3 bucket with their job directory.
- Mark the job as 'complete' in the database. The user can continue on to data curation or exit the system here.
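A rough sketch of how these steps could be wired together is shown below. The table and column names, Redis key format, and the `read_subset`/`run_fortracc` callables are placeholders for illustration; they are not the library's actual API.

```python
import pickle

import boto3
import redis

def run_phdef_job(job_id, db_conn, creds, read_subset, run_fortracc):
    """Illustrative PhDef driver; read_subset and run_fortracc are caller-supplied
    placeholders standing in for the library's reader and ForTraCC operator."""
    with db_conn.cursor() as cur:
        # 1. Pull the 'pending' job definition and mark it 'running'
        cur.execute("SELECT dataset, variable, operator, start_time, end_time, bbox "
                    "FROM jobs WHERE id=%s AND status='pending'", (job_id,))
        job = cur.fetchone()
        cur.execute("UPDATE jobs SET status='running' WHERE id=%s", (job_id,))
    db_conn.commit()

    # 2. Read/subset the data, then serialize the dict to a binary blob
    data = read_subset(job)
    blob = pickle.dumps(data)

    # 3. Stage the binary payload in Redis/Elasticache for ForTraCC to consume
    cache = redis.Redis(host=creds["redis_host"], port=6379)
    cache.set(f"job:{job_id}:input", blob)

    # 4. Run ForTraCC, then deposit its output (masks, plots, GeoJSON) in S3
    output_files = run_fortracc(job_id)
    s3 = boto3.client("s3")
    for path in output_files:
        s3.upload_file(path, creds["bucket"], f"{job_id}/{path}")

    # 5. Close the job out so the user can move on to data curation
    with db_conn.cursor() as cur:
        cur.execute("UPDATE jobs SET status='complete' WHERE id=%s", (job_id,))
    db_conn.commit()
```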
Running the data curation stage in an end-to-end fashion requires the following steps:
- Read the job information from the database, which includes information about which PhDef to run against, along with the dataset and variable information.
- Once you have the job information, choose the appropriate curator for the dataset/variable in question and mark the job as 'running' in the database. Initially, jobs will be in 'pending' status in the database.
- Run the curator, which will output a netCDF-4 file with the data for each anomaly at each time step. Note that incongruities may exist between the grids and timesteps of the data used in PhDef and the requested curator data; see the metadata in the output curated data file for additional information on this.
- Upload the curated data file and JSON hierarchy file to S3.
- Run the curated file through the interpolator, which puts the curated data on the same temporal and spatial resolution as the original mask data so the user can compare them more easily. This also generates statistics in the metadata of the interpolated file. The interpolated file will also be stored in S3.
- E-mail the user that their job is complete and send them the location of the S3 bucket with their job directory.
- Mark the job as 'complete' in the database.
- The user can make plots of individual anomalies at specific timestamps using the interpolated file, as in the sketch below. The user can continue on to visualization tools, download the data, or exit the system here.
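For example, a plot of one anomaly at one timestamp could be made from the interpolated file along these lines. The file name and group/variable path are guesses, so check the actual file's structure and metadata first.

```python
import matplotlib.pyplot as plt
from netCDF4 import Dataset

# Illustrative only: the group/variable path below is a placeholder; inspect the
# interpolated file's groups and metadata to find the real layout.
with Dataset("interpolated_output.nc4") as nc:
    field = nc["/anomaly_0001/timestamp_00/data"][:]   # 2-D slice for one anomaly/time

plt.imshow(field, origin="lower")
plt.colorbar(label="curated variable")
plt.title("Anomaly 0001 at timestamp 00 (hypothetical layout)")
plt.savefig("anomaly_0001_t00.png")
```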