Terra Scientific Pipelines Service

Overview

Terra Scientific Pipelines Service, or Teaspoons, runs a defined set of scientific pipelines on behalf of users who can't run those pipelines themselves in Terra. The most common reason is that a pipeline uses proprietary data that users are not allowed to access directly, but that may still be used as, for example, a reference panel for imputation.

Supported pipelines

Currently supported pipelines:

[in development] Imputation (TODO add link/info)

Architecture

WIP architecture doc
Linked LucidChart

Development

This codebase is in initial development.

Requirements

This service is written in Java 17, and uses Postgres 13.

To run locally, you'll also need:

jq - install with brew install jq
vault - see DSP's setup instructions here
- Note that for Step 7, "Create a GitHub Personal Access Token", you'll want to choose the "Tokens (classic)" option, not the fine-grained access token option.
Java 17 - install it manually, or let IntelliJ install it for you when you import the project
Postgres 13 - any installation works; as long as a Postgres instance is running on localhost:5432, the local app will connect to it. Options include:
- Postgres.app https://postgresapp.com/
- Brew https://formulae.brew.sh/formula/postgresql@13
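
A quick way to confirm that something is accepting connections on localhost:5432 before starting the app is sketched below. The function name port_is_open is hypothetical (it is not part of this repo), and the check relies on bash's /dev/tcp redirection, so it needs bash rather than a plain POSIX sh. It only shows that the port is open, not that the listener is Postgres 13.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): succeeds if something accepts
# TCP connections on the given host and port.
# Relies on bash's /dev/tcp redirection, so run it with bash.
port_is_open() {
  local host="$1"
  local port="$2"
  # Opening the pseudo-file attempts a TCP connection. The subshell closes
  # the descriptor again when it exits, and its error output is discarded.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

if port_is_open localhost 5432; then
  echo "Something is listening on localhost:5432"
else
  echo "Nothing is listening on localhost:5432; start Postgres first"
fi
```

If the Postgres client tools are installed, pg_isready -h localhost -p 5432 is a more specific check, since it speaks the Postgres protocol rather than only testing the port.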

Tech stack

Java 17 (Temurin)
Postgres 13.1
Gradle - build automation tool
SonarQube - static code security and coverage
Trivy - security scanner for docker images
Jib - docker image builder for Java

Local development

To run locally:

Make sure you have the requirements installed from above. We recommend IntelliJ as an IDE.
Clone the repo. (If you see broken imports, build the project to get the generated sources.)
Run the commands in scripts/postgres-init.sql in your local postgres instance. You will need to be authenticated to access Vault.
Run scripts/write-config.sh
Run ./gradlew bootRun to spin up the server.
Navigate to http://localhost:8080/#
If this is your first time deploying to any environment, be sure to use the admin endpoint /api/admin/v1/updatePipelineWorkspaceId/{pipelineName}/{workspaceId} to set your pipeline's workspace id. The workspace id can be found on the Terra UI workspace dashboard or through the Rawls GET workspace endpoint.
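
As a rough sketch of how the admin endpoint path from the last step is filled in, the helper below only builds and prints the URL. The function name admin_workspace_url, the default base URL, and the example pipeline name and workspace id are illustrative assumptions. This README does not state the HTTP method or the authentication the endpoint expects, so the request itself is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): builds the URL for the
# updatePipelineWorkspaceId admin endpoint described above.
# Usage: admin_workspace_url <pipelineName> <workspaceId> [baseUrl]
admin_workspace_url() {
  local pipeline_name="$1"
  local workspace_id="$2"
  # Assumption: default to the local server started by ./gradlew bootRun.
  local base_url="${3:-http://localhost:8080}"

  if [ -z "$pipeline_name" ] || [ -z "$workspace_id" ]; then
    echo "usage: admin_workspace_url <pipelineName> <workspaceId> [baseUrl]" >&2
    return 1
  fi

  # Strip one trailing slash so the joined URL has no double slash.
  base_url="${base_url%/}"
  printf '%s/api/admin/v1/updatePipelineWorkspaceId/%s/%s\n' \
    "$base_url" "$pipeline_name" "$workspace_id"
}
```

For example, admin_workspace_url imputation my-workspace-id prints the URL for the local server, where both arguments are placeholder values. Check the service's API documentation for the method and credentials before calling it.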

Local development with debugging

If you use IntelliJ (the only IDE the team uses), you can run the server with a debugger. Follow the steps above, but instead of running ./gradlew bootRun to spin up the server, run (debug) the App.java class through IntelliJ and set breakpoints in the code. Be sure to set GOOGLE_APPLICATION_CREDENTIALS=config/tsps-sa.json in the Environment Variables field of the Run/Debug configuration.

Running Tests/Linter Locally

Testing
- Run ./gradlew service:test to run tests
Linting
- Run ./gradlew spotlessCheck to run linter checks
- Run ./gradlew :service:spotlessApply to fix any issues the linter finds

(Optional) Install pre-commit hooks

[scripts/git-hooks/pre-commit] has been provided to help ensure all submitted changes are formatted correctly. To install all hooks in [scripts/git-hooks], run:

git config core.hooksPath scripts/git-hooks

Running SonarQube locally

SonarQube is a static analysis tool that scans code for a wide range of issues, including maintainability problems and possible bugs. For more information, see the DSP SonarQube Docs.

If you get a build failure due to SonarQube and want to debug the problem locally, you need to get the sonar token from vault before running the gradle task.

export SONAR_TOKEN=$(vault read -field=sonar_token secret/secops/ci/sonarcloud/tsps)
./gradlew sonarqube

Running this task produces no output unless your project has errors. To generate a report, run using --info:

./gradlew sonarqube --info
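
Because the Gradle task depends on SONAR_TOKEN being exported first, a small guard can catch a missing token before the build starts. This is a minimal sketch; the function name require_sonar_token is hypothetical and not part of this repo.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of the repo): fails with a message when
# SONAR_TOKEN is unset or empty, so the Gradle task is not started without it.
require_sonar_token() {
  if [ -z "${SONAR_TOKEN:-}" ]; then
    echo "SONAR_TOKEN is not set; read it from Vault and export it first" >&2
    return 1
  fi
}

# Intended use (left commented out so that sourcing this file runs nothing):
#   require_sonar_token && ./gradlew sonarqube --info
```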

Connecting to the database

A script in dsp-scripts does all the setup needed to connect to the Teaspoons database. Clone that repo and make sure you're either on Broad Internal wifi or connected to the VPN. Then run the following command:

./db/psql-connect.sh dev tsps

Deploying to dev

Upon merging to main, the dev environment will be automatically deployed via the GitHub Action Bump, Tag, Publish, and Deploy (that workflow is defined here).

The two tasks report-to-sherlock and set-version-in-dev will prompt Sherlock to deploy the new version to dev. You can check the status of the deployment in Beehive and in ArgoCD.

For more information about deployment to dev, check out DevOps' excellent documentation.

Tracing

We use OpenTelemetry for tracing, so that every request has a tracing span that can be viewed in Google Cloud Trace. (This is not yet fully set up here; it is to be done in TSPS-107.) See this DSP blog post for more info.

Running the end-to-end tests

The end-to-end test is specified in .github/workflows/run-e2e-tests.yaml. It calls the test script defined in the dsp-reusable-workflows repo.

The end-to-end test is automatically run nightly on the dev environment.

To run the test against a specific feature branch:

1. Grab the image tag for your feature branch. If you've opened a PR, you can find the image tag as follows:
- Go to the Bump, Tag, Publish, and Deploy workflow that's triggered each time you push to your branch.
- From there, go to the tag-publish-docker-deploy task.
- Expand the "Construct docker image name and tag" step.
- The first line should contain the image tag, something like "0.0.81-6761487".
2. Navigate to the e2e-test GHA workflow.
3. Click on the "Run workflow" button and select your branch from the dropdown.
- Enter the image tag from step 1 in the "Custom image tag" field.
- If you've updated the end-to-end test in the dsp-reusable-workflows repo, enter either a commit hash or your git branch name. If you don't need to change the test, leave the default as main.
4. Click the green "Run workflow" button.
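
Before pasting an image tag into the "Custom image tag" field, it can help to check that the copied string has the expected shape. The sketch below assumes tags always look like the example "0.0.81-6761487", that is, a three-part version followed by a hyphen and seven lowercase hexadecimal characters. That pattern is inferred from the single example above and is not confirmed by this README. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repo) for sanity-checking an image tag.
# Assumed format, inferred from the example "0.0.81-6761487":
#   <major>.<minor>.<patch>-<7 lowercase hex characters>

# Succeeds if the argument matches the assumed tag format.
is_image_tag() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+-[0-9a-f]{7}$'
}

# Prints the version part of the tag (everything before the first hyphen).
image_tag_version() {
  printf '%s\n' "${1%%-*}"
}
```

For example, is_image_tag "0.0.81-6761487" succeeds, while a bare version such as "0.0.81" is rejected.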

Python clients

We publish a "thin", auto-generated Python client that wraps the Teaspoons APIs. This client is published to PyPI and can be installed with pip install teaspoons_client, although it is not meant to be user-facing. The thin API client is generated from the OpenAPI spec in the openapi directory.

Publishing occurs automatically when a new version of the service is deployed, via the release-python-client GHA.

We also have a user-facing, "thick" CLI whose code lives in a separate repository: DataBiosphere/terra-scientific-pipelines-service-cli.
