Skip to content
This repository has been archived by the owner on Mar 29, 2022. It is now read-only.

Latest commit

 

History

History
370 lines (344 loc) · 10.4 KB

02.find_application.md

File metadata and controls

370 lines (344 loc) · 10.4 KB
layout title tagline
page
Find an Application

The SD2E catalog has several pre-built public applications (or "apps" for short) that are available to run right now. Each public application is tied to a public HPC system.


#### List applications

The apps-list command can be used to list all applications you have access to:

% apps-list

fastqc-0.11.5u2
fcs-etl-0.3.3u3
fcs-tasbe-0.2.0u6
hello-agave-cli-0.1.0u1
hello-agave-hpc-0.1.0u1
hello-container-0.1.0u1
kallisto-0.43.1u3
lcms-pyquant-0.1.0u1
lcms-pyquant-0.1.1u1
lcms-wrangler-0.1.0u2
lcms-wrangler-0.1.0u3
msf-0.1.0u3
msf-wrangler-0.1.0u3
rnaseq-0.1.1u1
rnaseq-0.1.2u1
rnaseq-broad-0.1.1u1
sailfish-0.10.1u4
sortmerna-0.0.1u1
ta3-api-0.0.2u1

This can be modified with -P or -Q flags to show only public or private apps, respectively. Apps with a suffix of u1, u2, u3...etc are updates or revisions of previous apps.


#### Search for applications by name

Use the apps-search command to see if an app you are interested in is available in the SD2E catalog:

% apps-search -h    # show syntax for searching
% apps-search public=true name.like=*kallisto*
kallisto-0.43.1u3

In this case, the apps-search command finds one match and returns the newest revision of the kallisto app.


#### Determine the application function

Many applications you will find in the catalog can be used in different ways. For example, depending on how kallisto is called on the command line, it can be used to build an index, quantify abundances of transcripts from RNA-Seq data, run a pseudoalignment, or convert HDF5-formatted results to plain text. It is up to the developer of the SD2E app to decide which function(s) this particular instance of the app will perform, and to appropriately document it in the app description.

To see the app description, perform:

% apps-list -v kallisto-0.43.1u3

The output of this command is a json description of the app including metadata, inputs, parameters, and outputs.


#### The `metadata` section

The metadata section of the app json file includes information about the app availability, runtime resources required, description, and much more:

  "available": true,
  "checkpointable": false,
  "defaultMaxRunTime": "00:30:00",
  "defaultMemoryPerNode": 256,
  "defaultNodeCount": 1,
  "defaultProcessorsPerNode": 20,
  "defaultQueue": "normal",
  "deploymentPath": "/.public-apps/kallisto-0.43.1u3.zip",
  "deploymentSystem": "data-sd2e-projects-users",
  "executionSystem": "hpc-tacc-wrangler",
  "executionType": "HPC",
  "helpURI": "https://pachterlab.github.io/kallisto/",
  "icon": null,
  "id": "kallisto-0.43.1u3",
  "isPublic": true,
  "label": "Kallisto",
  "lastModified": "2017-10-10T14:12:09.000-05:00",
  "longDescription": "A program for quantifying abundances of transcripts from RNA-Seq data",
  "modules": [
      "load tacc-singularity"
  ],
  "name": "kallisto",
  "ontology": [
      "http://edamontology.org/operation_3800",
      "http://edamontology.org/topic_3053"
  ],
  "owner": "sd2eadm",
  "parallelism": "SERIAL",
  "revision": 3,
  "shortDescription": "Quantifies RNA-Seq transcripts",
  "tags": [
      "RNA-Seq",
      "Quantification",
      "Abundance",
      "docker://index.docker.io/sd2e/kallisto:0.43.1--hdf51.8.17_0"
  ],
  "templatePath": "runner-template.sh",
  "testPath": "tester.sh",
  "uuid": "1631007230668172825-242ac115-0001-005",
  "version": "0.43.1"

Some key information in this section includes the identity of the HPC system (executionSystem) on which the app runs, which is hpc-tacc-wrangler. Also, the longDescription suggests that the function of this particular app is to quantify abundances of transcripts from RNA-Seq data.


#### The `inputs` section

The inputs section describe what input data are required for running this app:

  "inputs": [
      {
          "details": {
              "argument": null,
              "description": "FASTA file of transcript sequences",
              "label": "transcripts",
              "repeatArgument": false,
              "showArgument": false
          },
          "id": "transcripts",
          "semantics": {
              "fileTypes": [],
              "maxCardinality": 1,
              "minCardinality": 1,
              "ontology": [
                  "http://edamontology.org/format_1929"
              ]
          },
          "value": {
              "default": "",
              "enquote": false,
              "order": 0,
              "required": true,
              "validator": null,
              "visible": true
          }
      },
      {
          "details": {
              "argument": null,
              "description": null,
              "label": "First reads file of a FASTQ pair (or single)",
              "repeatArgument": false,
              "showArgument": false
          },
          "id": "fastq1",
          "semantics": {
              "fileTypes": [],
              "maxCardinality": 1,
              "minCardinality": 1,
              "ontology": [
                  "http://edamontology.org/format_1930"
              ]
          },
          "value": {
              "default": "read1.fastq",
              "enquote": false,
              "order": 0,
              "required": true,
              "validator": null,
              "visible": true
          }
      },
      {
          "details": {
              "argument": null,
              "description": null,
              "label": "Second reads file of a FASTQ pair",
              "repeatArgument": false,
              "showArgument": false
          },
          "id": "fastq2",
          "semantics": {
              "fileTypes": [],
              "maxCardinality": 1,
              "minCardinality": 1,
              "ontology": [
                  "hhttp://edamontology.org/format_1930"
              ]
          },
          "value": {
              "default": "",
              "enquote": false,
              "order": 0,
              "required": true,
              "validator": null,
              "visible": true
          }
      }
  ],

This app is expecting three files as inputs, identified by the id field: transcripts, fastq1, and fastq2. Each has a short description, and all three are required ("required": true) to run this app.


#### The `parameters` section

Next, inspect the parameters section of the app json file:

  "parameters": [
      {
          "details": {
              "argument": "-b ",
              "description": "Number of bootstrap samples (default: 0)",
              "label": "bootstrap",
              "repeatArgument": false,
              "showArgument": true
          },
          "id": "bootstrap",
          "semantics": {
              "maxCardinality": 1,
              "minCardinality": 0,
              "ontology": [
                  "xs:integer"
              ]
          },
          "value": {
              "default": 0,
              "enquote": false,
              "order": 0,
              "required": false,
              "type": "number",
              "validator": "",
              "visible": true
          }
      },
      {
          "details": {
              "argument": "--seed=",
              "description": "Seed for the bootstrap sampling (default: 42)",
              "label": "seed",
              "repeatArgument": false,
              "showArgument": true
          },
          "id": "seed",
          "semantics": {
              "maxCardinality": 1,
              "minCardinality": 0,
              "ontology": [
                  "xs:integer"
              ]
          },
          "value": {
              "default": 42,
              "enquote": false,
              "order": 0,
              "required": false,
              "type": "number",
              "validator": "",
              "visible": true
          }
      },
      {
          "details": {
              "argument": "-o ",
              "description": "Output directory name",
              "label": "output",
              "repeatArgument": false,
              "showArgument": true
          },
          "id": "output",
          "semantics": {
              "maxCardinality": 1,
              "minCardinality": 1,
              "ontology": [
                  "xs:string"
              ]
          },
          "value": {
              "default": "output",
              "enquote": false,
              "order": 1,
              "required": true,
              "type": "string",
              "validator": "",
              "visible": true
          }
      }
  ],

It is also expecting three parameters, identified by the id field: bootstrap, seed, and output. This time, only the output parameter is requried. The purpose and acceptable values for each parameter are provided in the description.


#### The `outputs` section

The last remaining section is the outputs section:

  "outputs": [
      {
          "details": {
              "description": null,
              "label": "Abundances tabular file"
          },
          "id": "abundance",
          "semantics": {
              "fileTypes": [],
              "maxCardinality": 1,
              "minCardinality": 1,
              "ontology": [
                  "http://edamontology.org/format_3475"
              ]
          },
          "value": {
              "default": "output/abundance.tsv",
              "order": 0,
              "validator": null
          }
      },
      {
          "details": {
              "description": null,
              "label": "Metadata about the Kallisto run"
          },
          "id": "run_info",
          "semantics": {
              "fileTypes": [],
              "maxCardinality": 1,
              "minCardinality": 1,
              "ontology": [
                  "http://edamontology.org/format_3464"
              ]
          },
          "value": {
              "default": "output/run_info.json",
              "order": 0,
              "validator": null
          }
      }
  ],

Here, we see two outputs are created when running this app: abundance and run_info. Neither must be specified by the user, as they are output by default. However, this is an important clue (along with the inputs, parameters, and app description) to indicate what is the function of this app.

Once the appropriate app has been identified, the next steps are to stage the input data, and to prepare and submit a job.


Return to the API Documentation Overview