How to Submit Data (API)

Rosi Bajari edited this page Aug 25, 2021 · 13 revisions

Submitting through the API

Data can be submitted to the API programmatically, for example from a script.

API Documentation

You can explore the data submission API through the Swagger API Docs.

Getting an Access token to submit data

To submit data to the API, you need an access token. To retrieve your personal token:

  1. Log in to the Virus-Seq portal and open the Profile and Token page from the user navigation bar at the top right.
  2. Click the Copy button next to your personal token.

!! NOTE !!

  • Your access token is associated with your user credentials and should NEVER be shared with anyone.
  • Your access token is valid for only 24 hours.

Creating a submission

After preparing your data files, you can use the API and your personal token to make submission requests.

For example, using curl:

curl --location --request POST 'https://muse.virusseq-dataportal.ca/submissions' \
--header 'Authorization: Bearer <token goes here>' \
--form 'files=@"/path/to/fasta/file-or-files/L00212401.fasta"' \
--form 'files=@"/path/to/metadata/file/metadata.tsv"'

If your files are formatted correctly, you will receive a submission ID in the response:

{
    "submissionId": "a941f97f-6408-4886-b9ca-d852606e3072"
}
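The same request can be approximated from Python with the standard library alone. The sketch below is only an illustration: `build_submission_request` is a hypothetical helper (not part of any portal client), and sending every part as `application/octet-stream` is an assumption about what the server accepts. Each file is sent as a repeated `files` form field, mirroring the curl example.

```python
import uuid
from urllib.request import Request

SUBMISSIONS_URL = "https://muse.virusseq-dataportal.ca/submissions"


def build_submission_request(token, files):
    """Build a multipart POST; files is a list of (filename, bytes) pairs.

    Hypothetical helper: every pair becomes one 'files' form part.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for filename, content in files:
        part_header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            # Assumption: the server accepts a generic content type per part.
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        parts.append(part_header + content + b"\r\n")
    body = b"".join(parts) + f"--{boundary}--\r\n".encode("utf-8")
    return Request(
        SUBMISSIONS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# To send the request (needs a valid token and network access):
# import json
# from urllib.request import urlopen
# with open("/path/to/metadata/file/metadata.tsv", "rb") as tsv:
#     req = build_submission_request("<token goes here>", [("metadata.tsv", tsv.read())])
# with urlopen(req) as resp:
#     print(json.load(resp)["submissionId"])
```

Building the request separately from sending it makes it possible to inspect the headers and body before any data leaves your machine.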

If there is an issue with the format of the files, or if the number of viral genomes in the metadata TSV does not match the number of viral genomes in the submitted FASTA files, you will receive an error. For example:

{
    "status": "BAD_REQUEST",
    "message": "Headers are incorrect!",
    "errorInfo": {
        "unknownHeaders": [],
        "missingHeaders": [
            "GISAID accession",
            "diagnostic pcr Ct value null reason"
        ]
    }
}
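In a script, an error like this can be condensed into a short message. The sketch below relies only on the fields visible in the example above (`status`, `message`, `errorInfo.missingHeaders`, `errorInfo.unknownHeaders`); `describe_header_error` is a hypothetical helper, and other error types may use a different shape.

```python
def describe_header_error(payload):
    """Summarize a header error payload as readable text (hypothetical helper)."""
    info = payload.get("errorInfo") or {}
    lines = [f'{payload.get("status", "UNKNOWN")}: {payload.get("message", "")}']
    for key, label in (("missingHeaders", "Missing"), ("unknownHeaders", "Unknown")):
        headers = info.get(key) or []
        if headers:  # skip empty lists so the summary stays short
            lines.append(f"{label} headers: " + ", ".join(headers))
    return "\n".join(lines)


example = {
    "status": "BAD_REQUEST",
    "message": "Headers are incorrect!",
    "errorInfo": {
        "unknownHeaders": [],
        "missingHeaders": [
            "GISAID accession",
            "diagnostic pcr Ct value null reason",
        ],
    },
}
print(describe_header_error(example))
```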

Troubleshoot the issues with the files until the upload proceeds. Common things to check include:

  • Make sure all the required headers are present. The latest example TSV can be found here.
  • Make sure the samples listed in the metadata file match the samples in the provided FASTA file(s).
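These checks can be run locally before uploading. The following is a minimal sketch under stated assumptions: the list of required headers is supplied by you (for instance, copied from the example TSV), and `sample_column` is a hypothetical parameter naming whichever TSV column holds the exact FASTA header text. Neither is defined by this page.

```python
import csv
import io


def fasta_ids(fasta_text):
    """Return the text after '>' for each FASTA header line."""
    return [line[1:].strip() for line in fasta_text.splitlines() if line.startswith(">")]


def preflight(fasta_text, tsv_text, required_headers, sample_column):
    """Return a list of problems; an empty list means the checks passed."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    header = reader.fieldnames or []
    problems = []

    missing = [h for h in required_headers if h not in header]
    if missing:
        problems.append("missing headers: " + ", ".join(missing))

    ids = fasta_ids(fasta_text)
    if len(ids) != len(rows):
        problems.append(f"{len(rows)} metadata rows but {len(ids)} FASTA records")

    # Assumption: sample_column holds the exact FASTA header text.
    if sample_column in header:
        in_tsv = {(row[sample_column] or "") for row in rows}
        unmatched = sorted(set(ids) ^ in_tsv)
        if unmatched:
            problems.append("unmatched samples: " + ", ".join(unmatched))
    return problems
```

For example, `preflight(open("L00212401.fasta").read(), open("metadata.tsv").read(), required, "sample")` returns an empty list when the counts and names agree, where `required` and `"sample"` stand in for your own header list and column name.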

Checking submission progress

curl --location --request GET 'https://muse.virusseq-dataportal.ca/uploads?page=0&size=100&sortDirection=DESC&sortField=createdAt&submissionId={submission id goes here}' \
--header 'Authorization: Bearer <token goes here>'
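The same query can be assembled in Python, which avoids hand-editing the query string. This is a sketch only: `build_status_request` is a hypothetical helper, and the parameter values simply mirror the curl command above.

```python
from urllib.parse import urlencode
from urllib.request import Request

UPLOADS_URL = "https://muse.virusseq-dataportal.ca/uploads"


def build_status_request(token, submission_id, page=0, size=100):
    """Build the GET request that lists uploads for one submission."""
    query = urlencode({
        "page": page,
        "size": size,
        "sortDirection": "DESC",
        "sortField": "createdAt",
        "submissionId": submission_id,
    })
    return Request(
        f"{UPLOADS_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


# To send it (needs a valid token and network access):
# import json
# from urllib.request import urlopen
# with urlopen(build_status_request("<token goes here>", "<submission id goes here>")) as resp:
#     uploads = json.load(resp)
```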

Each viral genome included in the submission is returned as an object in the data array. Each object has:

  • a status
  • an error message, if any
  • a unique ID, analysisId

The example below shows an ERROR upload (something was wrong with the submitted data) followed by a successful COMPLETE upload.

    "data": [
        {
            "submissionId": "be938d36-3614-410c-baae-b514daf1c4ab",
            "studyId": "DRGN-INTL",
            "submitterSampleId": "DRGN_45596",
            "status": "ERROR",
            "originalFilePair": [
                "DRGNtest.fasta",
                "DRGN_metadata.tsv"
            ],
            "analysisId": null,
            "error": "400 BAD_REQUEST - [SubmitService::schema.violation] - #/host/host_age_unit: years is not a valid enum value"
        },
        {
            "submissionId": "be938d36-3614-410c-baae-b514daf1c4ab",
            "studyId": "DRGN-INTL",
            "submitterSampleId": "DRGN_45601",
            "status": "COMPLETE",
            "originalFilePair": [
                "DRGNtest.fasta",
                "DRGN_metadata.tsv"
            ],
            "analysisId": "610db281-393f-4c29-8db2-81393fcc29b0",
            "error": null
        }
    ]
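For submissions with many genomes, it can help to tally the statuses and pull out the failures. The sketch below assumes the parsed response is a dictionary with a `data` list shaped like the example above; `summarize_uploads` is a hypothetical helper, and it makes no assumption about which status values exist beyond counting whatever is returned.

```python
from collections import Counter


def summarize_uploads(response):
    """Return (status counts, {submitterSampleId: error}) for one response."""
    uploads = response.get("data") or []
    counts = Counter(item.get("status") for item in uploads)
    errors = {
        item.get("submitterSampleId"): item.get("error")
        for item in uploads
        if item.get("error")  # keep only uploads that reported an error
    }
    return dict(counts), errors


response = {
    "data": [
        {"submitterSampleId": "DRGN_45596", "status": "ERROR",
         "error": "400 BAD_REQUEST - [SubmitService::schema.violation] - "
                  "#/host/host_age_unit: years is not a valid enum value"},
        {"submitterSampleId": "DRGN_45601", "status": "COMPLETE", "error": None},
    ]
}
counts, errors = summarize_uploads(response)
for sample_id, message in errors.items():
    print(f"{sample_id}: {message}")
```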

Notes about API Submission

  • If your token has expired, your submission will fail.
  • Only one TSV can be uploaded per submission, together with one or more .fasta, .fa, or .gz files.
  • Please limit individual submissions to 5000 samples or fewer.
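The file and size limits above can be checked before a request is sent. This is a rough sketch, not an official validator: `check_file_set` is a hypothetical helper, it looks only at each file's final extension, and you supply the sample count yourself.

```python
from pathlib import PurePath

MAX_SAMPLES = 5000
SEQUENCE_SUFFIXES = {".fasta", ".fa", ".gz"}


def check_file_set(filenames, sample_count):
    """Return a list of problems; an empty list means the limits are met."""
    # Only the last extension is inspected, so "x.fasta.gz" counts as ".gz".
    suffixes = [PurePath(name).suffix for name in filenames]
    problems = []
    tsv_count = suffixes.count(".tsv")
    if tsv_count != 1:
        problems.append(f"expected exactly one TSV, found {tsv_count}")
    if not any(suffix in SEQUENCE_SUFFIXES for suffix in suffixes):
        problems.append("expected at least one .fasta, .fa, or .gz file")
    if sample_count > MAX_SAMPLES:
        problems.append(f"{sample_count} samples exceeds the limit of {MAX_SAMPLES}")
    return problems
```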