
Backend JSON API - Anomaly Detection and Recommender System for Zenodo. Scala, Docker, Java Spark, SBT


alastairparagas/ZenodoAddon



ZenodoAddon (Backend)

A Keyword Recommendation and Anomaly Detection Service for Zenodo, an open-source document repository funded and powered by CERN. A presentation about the project is available at https://cds.cern.ch/record/2280009, and the presentation slides are at https://drive.google.com/file/d/0BzW6zUEv_wl9Nmh2cTUyRmYxYlE/view?usp=sharing.

Tech Stack

  • 🐍 Scala 2.12 - JVM language with near-complete interoperability with Java
  • 🗒️ Postgres - Full-text search capability for finding origin vertices
  • 🐛 Redis - Caching computation results and serving them on-demand
  • 🐋 Docker - Containerization and easy dev/prod setup
  • 🎓 Stanford CoreNLP - POS (part-of-speech) and NER (named-entity recognition) sample for the Graph Transformer
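
Redis is listed as caching computation results and serving them on demand, which the API exposes as the optional cache addon. The following is a minimal cache-aside sketch of that idea, assuming an in-memory mutable map stands in for Redis and that a cache key can be built from the request parameters. ResultCache and getOrCompute are hypothetical names, not the project's code.

```scala
import scala.collection.mutable

// Hypothetical cache-aside helper (not ZenodoAddon's code). A mutable map stands
// in for Redis; computeFn runs only on a cache miss, and its result is stored.
final class ResultCache[K, V] {
  private val store = mutable.Map.empty[K, V]
  private var misses = 0

  def getOrCompute(key: K)(computeFn: => V): V =
    store.get(key) match {
      case Some(cached) => cached // hit: serve the stored result
      case None =>
        misses += 1
        val fresh = computeFn // miss: compute once, then remember it
        store.update(key, fresh)
        fresh
    }

  def missCount: Int = misses
}
```

Keying on something like (keywords, ranker) means a repeated request skips the expensive graph computation; a real Redis-backed version would also need serialization and an expiry policy.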

Getting Started

  • Install SBT (download links at https://scala-lang.org/download/)
  • Install Docker and Docker Compose (Community Edition)
  • git clone this project
  • cd into the project's root
  • Spin up the Docker containers with docker-compose up -d
    • Note: The Postgres container must be prefilled with tsvectors, i.e. the keyword_raw table must already contain data. To fill it, run the Zenodo-Filescript script with --fulltext_db_username, --fulltext_db_password, --fulltext_db_name, --fulltext_db_host, and --fulltext_db_port pointing at this Postgres Docker container. The script prepopulates the table automatically from the source database (in this case, Zenodo's DB servers).
  • While still in the project's root, copy .env_sample to a new .env file and fill in every environment variable listed in it
  • While still in the project's root, run env $(tr "\\n" " " < .env) sbt run
    • env $(tr "\\n" " " < .env) turns each line of the .env file into an environment variable the program can see (values must not contain spaces, because the shell splits on them)
    • sbt run then compiles and runs the program
    • With the port set through the PORT variable in .env, you can now make POST requests to localhost:[SPECIFIED_PORT_NUMBER]/recommendation

You can now make requests to the microservice!
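
The env $(tr "\\n" " " < .env) step turns KEY=VALUE lines into process environment variables. A rough Scala equivalent of that parsing is sketched below, assuming one KEY=VALUE pair per line. DotEnv, parse, and port are hypothetical helpers, not part of the project; unlike the shell one-liner, this parser keeps spaces inside values.

```scala
import scala.util.Try

// Hypothetical helper (not part of ZenodoAddon): parses .env-style text into a
// map, assuming one KEY=VALUE pair per line. Lines without '=' are ignored.
object DotEnv {
  def parse(contents: String): Map[String, String] =
    contents
      .split("\n")
      .map(_.trim)
      .filter(_.contains("="))
      .map { line =>
        val idx = line.indexOf('=') // split on the first '=' only
        line.substring(0, idx).trim -> line.substring(idx + 1).trim
      }
      .toMap

  // Reads PORT (the variable that sets the service port) as an Int, if valid.
  def port(env: Map[String, String]): Option[Int] =
    env.get("PORT").flatMap(p => Try(p.toInt).toOption)
}
```

Once the variables are actually exported, DotEnv.port(sys.env) reads the same value from the running process's environment.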

DB Schema

The Postgres database provided through the Docker container uses the following schema, which is created automatically. Each table is listed with its columns and their types, as used for full-text search:

  • keyword_raw - id:uuid, keyword:text, keyword_vector:tsvector

Two functions are also created automatically when the database is bootstrapped: create_keyword(keyword_text) and search_keyword_matches(keyword_text). The former stores the provided keyword_text in the keyword_raw table; the latter searches the database for keywords that closely match keyword_text.
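
To make the roles of create_keyword and search_keyword_matches concrete, here is a toy in-memory analogue in Scala. It is a sketch under strong assumptions, not the project's SQL: a lowercase token set stands in for the tsvector column, and a "close match" is approximated as sharing at least one token, ranked by overlap. Real Postgres full-text search also applies stemming, stop-word removal, and its own ranking. InMemoryKeywordStore and its methods are hypothetical names.

```scala
import java.util.UUID

// Toy stand-in for a keyword_raw row; keywordVector approximates the tsvector.
final case class KeywordRow(id: UUID, keyword: String, keywordVector: Set[String])

// Hypothetical in-memory analogue of the two DB functions. It does NOT reproduce
// Postgres tsvector/tsquery semantics: no stemming, stop words, or ts_rank.
final class InMemoryKeywordStore {
  private var rows = Vector.empty[KeywordRow]

  private def tokenize(text: String): Set[String] =
    text.toLowerCase.split("[^\\p{L}\\p{N}]+").filter(_.nonEmpty).toSet

  // Analogue of create_keyword(keyword_text): store the keyword and its tokens.
  def createKeyword(keywordText: String): KeywordRow = {
    val row = KeywordRow(UUID.randomUUID(), keywordText, tokenize(keywordText))
    rows = rows :+ row
    row
  }

  // Analogue of search_keyword_matches(keyword_text): stored keywords sharing at
  // least one token with the query, most shared tokens first, then alphabetical.
  def searchKeywordMatches(keywordText: String): Seq[String] = {
    val query = tokenize(keywordText)
    rows
      .map(row => row -> (row.keywordVector intersect query).size)
      .filter(_._2 > 0)
      .sortBy { case (row, overlap) => (-overlap, row.keyword) }
      .map(_._1.keyword)
  }
}
```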

API Endpoint

  • POST /recommendation/
    • Request: JSON, sent as the request body
       {
         "keyword": ["ghost", "electricity"], // Keyword tags supplied by the user
         "ranker": "distance", // One of: distance, ppr, pprMean
         "vertexFinder": "fulltext", // One of: fulltext, plain
         "count": 20,
         "addons": ["cache"] // Optional; cache is currently the only available addon
       }
    • Response: JSON, emitted by the microservice
       {
         "success": true, // false if the HTTP status code is >= 400
         "message": "result message",
         "data": {
           "addons": {}, // Maps each addon to its execution metadata
           "results": [] // The list of keyword recommendations
         }
       }
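
A hypothetical Scala model of the request and response bodies can make the accepted values explicit. Field names mirror the JSON above, while the types, the validation rules (for example rejecting an empty keyword list or a non-positive count), and the Map[String, String] type for addon metadata are assumptions rather than the service's actual code.

```scala
// Hypothetical request/response model; names mirror the README's JSON, but the
// types and validation rules are assumptions, not ZenodoAddon's implementation.
final case class RecommendationRequest(
    keyword: List[String],
    ranker: String,
    vertexFinder: String,
    count: Int,
    addons: List[String] = Nil // optional, as in the README
)

final case class ResponseEnvelope(
    success: Boolean,
    message: String,
    addons: Map[String, String], // assumed shape of per-addon execution metadata
    results: List[String]
)

object RecommendationApi {
  val Rankers = Set("distance", "ppr", "pprMean")
  val VertexFinders = Set("fulltext", "plain")
  val Addons = Set("cache")

  // Returns the problems found; an empty list means the request looks valid.
  def validate(req: RecommendationRequest): List[String] = {
    val errors = List.newBuilder[String]
    if (req.keyword.isEmpty) errors += "keyword must contain at least one tag"
    if (!Rankers(req.ranker)) errors += s"unknown ranker: ${req.ranker}"
    if (!VertexFinders(req.vertexFinder)) errors += s"unknown vertexFinder: ${req.vertexFinder}"
    if (req.count <= 0) errors += "count must be positive"
    req.addons.filterNot(Addons).foreach(a => errors += s"unknown addon: $a")
    errors.result()
  }

  // success mirrors the HTTP status: false whenever the status is 400 or above.
  def envelope(status: Int, message: String, results: List[String] = Nil): ResponseEnvelope =
    ResponseEnvelope(success = status < 400, message = message, addons = Map.empty, results = results)
}
```

Validating against closed sets like Rankers keeps a typo such as "pagerank" from silently falling back to a default ranker; a handler would return envelope(400, ...) with the collected errors in that case.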
