Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added descriptor_type_version to ToolVersion definition #219

Merged
merged 6 commits into from
Sep 14, 2022

Conversation

patmagee
Copy link
Contributor

What

I would like to propose adding a new field: descriptor_type_version to the ToolVersion object in order communicate the specific language level a particular descriptor_type is written in. The field is a simple map in the tool version, where the keys correspond to on of the descriptor_type's and the values are the actual underlying version.

Adding a new property does add a level of redundancy (ie reporting the descriptor_type twice). However, it does maintain backwards compatibility with the current spec version

{
  "descriptor_type": [
    "CWL",
    "WDL"
  ],
  "descriptor_type_version": {
    "CWL": "v1.0.2",
    "WDL": "1.0"
   }
}

Rationale

In WES there is a required field when submitting a new run: workflow_type_version. Workflow execution engines may handle run submissions differently depending on the reported language level. For example, an engine may only support version 1.0 of WDL or v1.0.2 of CWL. It may also use different workers depending on the language version specified by the user.

It would be my hope that we could relax the requirement in WES to require the worflow_type and workflow_type_version when a TRS uri is used. Unfortuantely, since TRS does not report the underlying version, we do not have the needed features to programatically look up this information without parsing the descriptor. Parsing the descriptor to get the version is not always possible as some languages do not report the version directly in their descriptor

Additionally, when building applications that interact with tools/tool versions, it would be helpful to have this information available to build version specific elements in UI's or other tooling

@@ -629,6 +629,19 @@ components:
description: The type (or types) of descriptors available.
items:
$ref: "#/components/schemas/DescriptorType"
descriptor_type_version:
type: object
Copy link
Member

@denis-yuen denis-yuen May 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be of type array? Or rather a map with potential array values?
My understanding is that CWL workflows can have a mixture of tools of different cwl versions.
common-workflow-language/cwltool#1321

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denis-yuen that makes sense. I will switch this over to a an array on Monday.

One thing I did consider, was changing the existing descriptor_type to contain this information since it is essentially being duplicated. But I figured that would be a harder change to get into the spec then a new field

@denis-yuen denis-yuen requested a review from coverbeck May 13, 2022 17:14
{
"WDL": "1.0",
"CWL": "v1.0.2",
"NFL": "22.04.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know Nextflow, but looking at a random NF-Core workflow registered on Dockstore, it specifies a minimum Nextflow version. Then I found the Nextflow docs on it.

I'm personally more used to the NPM semantic versioning,

In any case, it seems like this case should be covered one way or another.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. This seems a little different.
The version of the Nextflow jar that a workflow is compatible with is different from the language version.
Nextflow's equivalent of WDL draft-3, 1.0, etc. is DSL2 https://www.nextflow.io/docs/latest/dsl2.html

This does mean I would recommend adjusting the example though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, good catch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to hear something about the relationship between workflow languages/types, workflow language/type versions and required workflow engine versions from the developers of workflow languages for which only a single engine exists. Particularly for Snakemake, I don't think there is such a thing as a workflow language version (but I may be wrong). That would help us to better map workflows to WES instances and increase the chance that a workflow will actually be processed successfully.

I suppose the most interesting question would probably be:

  • Do Nextflow/Snakemake workflows require specific workflow engine versions?
  • Do Nextflow/Snakemake workflows require specific workflow language versions? Is there a concept of language versions at all in Snakemake?
  • Are Nextflow and Snakemake workflow engines strictly backwards compatible in their ability to process older workflows / workflow languages?

Perhaps @johanneskoester and @pditommaso would like to comment?

Copy link
Collaborator

@uniqueg uniqueg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from input by workflow (language/engine) devs, there are a couple of minor wording issues. Otherwise it looks good to me. Very useful.

descriptor_type_version:
type: object
description: A map providing information about the language versions used in this tool. The keys should be the
same values in the `descriptor_type` table, and the value should be an array of the the actual language version for the given
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[...] actual language versions (plural)

@@ -713,6 +728,11 @@ components:
- WDL
- NFL
- GALAXY
DescriptorTypeVersion:
type: string
description: The language version this tool is written in for a given descriptor type. The versions should correspond
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using singular and plural interchangeably here. Would be good to clarify. I think it would also make sense to add a sentence explaining if/how/when multiple versions should be indicated (although that should probably go in the description of the object or array, not the atomic unit here in this schema.

Copy link
Member

@denis-yuen denis-yuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think this is fine from my end, but worth pinging notifications

@denis-yuen
Copy link
Member

Note, this handles one part of #210

@denis-yuen denis-yuen added this to the v2.0.1 milestone Jun 14, 2022
Copy link
Member

@denis-yuen denis-yuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feedback from cloud workstream meeting, mostly positive, can be merged with some wording changes

@denis-yuen
Copy link
Member

Note versioning convention, looks like we use the version number as-is without modification from the language itself

@denis-yuen
Copy link
Member

Discussion from GA4GH call: what would break if we only reported the version from the root descriptor?

@denis-yuen
Copy link
Member

note: there may be a PR in WES that deals with versioning ga4gh/workflow-execution-service-schemas#181

@vsmalladi
Copy link

I think as long as you use the version from the tool itself there should be no issues.

Not all workflows and tools will have backwards compatibility so there might be a minimum version if met would confer that a tool could work. But if we match exact I think this should be fine.

@denis-yuen denis-yuen changed the base branch from develop to release/v2.0.1 August 24, 2022 15:48
Copy link
Collaborator

@uniqueg uniqueg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good to me and addresses an important use case. I do feel though that here may be a good opportunity to also address workflow types for which versions are not available, e.g., Snakemake (where workflow engine versions are used instead). There is a related PR in WES (ga4gh/workflow-execution-service-schemas#182), and I would ideally like to see how negotiations to use a specific WES for a given workflow could/would play out for the different workflow types based on TRS URIs and the changes proposed here and in ga4gh/workflow-execution-service-schemas#182.

Of course, this could also be addressed in future PRs, but there may be the risk that implementing these use cases might then require rolling back some changes or introducing breaking changes. However, if others feel that this risk is small/acceptable, from my side, the PR can be merged as is.

@denis-yuen
Copy link
Member

I think the risk of rolling back changes is somewhat mitigated since this is marked as an optional field and not a required field. Pinging @patmagee for a final call, we should probably decide whether this is in or out for 2.0.1 in the next couple of days.

Copy link
Contributor

@coverbeck coverbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nit-picky suggestions.

It seems like a next step could be to surface this at the file level as well, although WES may not need it. And that gets trickier, because the file FileWrapper also describes non-workflow descriptors. But it feels incomplete, that the version says it has versions x and y, but there is no way to know which descriptors are which versions.

Just thinking out loud, and this shouldn't block the PR.

openapi/openapi.yaml Outdated Show resolved Hide resolved
openapi/openapi.yaml Show resolved Hide resolved
openapi/openapi.yaml Outdated Show resolved Hide resolved
@denis-yuen denis-yuen dismissed uniqueg’s stale review September 14, 2022 14:51

"risk is small/acceptable, from my side, the PR can be merged as is."

@denis-yuen denis-yuen merged commit fcf3c1f into ga4gh:release/v2.0.1 Sep 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants