No way of knowing a tool is valid without actually parsing or running it #224

patmagee opened this issue Aug 10, 2022 · 4 comments

@patmagee
Contributor

patmagee commented Aug 10, 2022

Problem

In the process of building a service that consumes WDL tools from a public TRS API, we realized that there was no mechanism in the TRS spec to detect whether a tool is "valid" according to the language specification that it is written in, or what execution engines it has previously been tested on. We discovered that nearly 40% of the tools in the TRS repo did not even parse with MiniWDL, making them unusable as a resource.

Because we were unable to rely on the validity of the tools being exposed by the TRS specification, we needed to implement an entirely separate service that exhaustively scrapes the TRS API, parsing and validating each tool and each tool version whenever a new tool is added or the meta-version is updated. This is a large amount of duplicated work which other members of the community will likely encounter when building similar services.
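
As a rough illustration of that duplicated work, here is a minimal sketch (not our actual service) that walks a TRS endpoint and checks whether each WDL descriptor at least parses with miniwdl. The base URL is a placeholder, and pagination and error handling are omitted:

import urllib.parse

import requests
import WDL  # miniwdl

TRS_BASE = "https://trs.example.org/ga4gh/trs/v2"  # placeholder TRS implementation

def wdl_descriptor_parses(tool_id: str, version_id: str) -> bool:
    """Fetch the primary WDL descriptor for a tool version and try to parse it."""
    url = (
        f"{TRS_BASE}/tools/{urllib.parse.quote(tool_id, safe='')}"
        f"/versions/{urllib.parse.quote(version_id, safe='')}/WDL/descriptor"
    )
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    source = resp.json()["content"]  # the TRS FileWrapper carries the raw descriptor text
    try:
        WDL.parse_document(source)  # syntax check only; WDL.load() would also typecheck
        return True
    except WDL.Error.SyntaxError:
        return False

# Walk every tool and version served by the registry.
for tool in requests.get(f"{TRS_BASE}/tools", timeout=30).json():
    for version in tool.get("versions", []):
        if "WDL" not in version.get("descriptor_type", []):
            continue  # only check versions that advertise a WDL descriptor
        ok = wdl_descriptor_parses(tool["id"], version["id"])
        print(tool["id"], version["id"], "parses" if ok else "does not parse")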

There were two classes of errors we discovered (both are valuable to report): semantic errors and structural / execution-based errors. Semantic errors occur when the descriptor simply does not follow the defined language rules. This is hard to categorise because, with WDL, different engines have varying levels of strictness when it comes to what is "semantically correct". We chose to be spec compliant here and say that tools that do not parse with miniwdl are semantically "incorrect".
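
To make the "degrees of strictness" point concrete, a hedged illustration using only miniwdl: a descriptor may be syntactically valid yet still fail import resolution or type checking, and other engines draw these lines in different places (the file path below is a placeholder):

import WDL  # miniwdl

path = "workflow.wdl"  # placeholder: a descriptor previously fetched from TRS

try:
    with open(path) as f:
        WDL.parse_document(f.read())  # weakest bar: syntactically valid WDL
    WDL.load(path)  # stricter bar: imports resolve and the document typechecks
    print("valid under both checks")
except WDL.Error.SyntaxError as err:
    print("does not even parse:", err)
except (WDL.Error.ValidationError, WDL.Error.MultipleValidationErrors) as err:
    print("parses, but fails stricter checks:", err)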

The second class of errors (the structural / execution errors) occurred because some workflow engines interpret properties/config differently than others, or because the workflow itself (although semantically correct) simply failed to run reproducibly for one reason or another.

Both of these classes of errors can largely be mitigated by including some sort of "validity" property in the Tool version object in the TRS API.

Solutions

I think the simplest approach would be to have a valid field in the tool version, with the simplest definition of this field being: "The workflow is semantically correct". We can probably say that if any workflow engine is able to properly parse the workflow then it is "semantically correct", but of course there are likely degrees of strictness here, and landing on a common definition may be hard in practice (really just for WDL).

The above fix would solve one class of errors, but it does not solve errors generated from different engines operating differently or the workflow just failing. I am not too sure how to represent this, but I think it would be very helpful to know what engines (and maybe even engine versions) a workflow has been successfully run against. Maybe we can add a more complex field like the one below? Also, I totally forgot that a single tool may be written in multiple descriptor languages, so we would somehow need to account for that as well.

{
  "validations": {
    "parses": true,
    "engines": [
      {
        "name": "cromwell",
        "version": "79",
        "status": "pass"
      },
      {
        "name": "miniwdl",
        "version": "1.5.2",
        "status": "fail"
      }
    ]
  }
}
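
To illustrate how a consumer might act on this (entirely hypothetical) validations block, the sketch below prefers tool versions that parse and reports the first engine with a passing validation; the field names simply follow the example above:

from typing import Optional

def pick_engine(tool_version: dict) -> Optional[dict]:
    """Return a passing engine entry, or None if the version is unusable."""
    validations = tool_version.get("validations")
    if not validations or not validations.get("parses"):
        return None  # semantically invalid, or no validation data at all
    passing = [e for e in validations.get("engines", []) if e.get("status") == "pass"]
    return passing[0] if passing else None

version = {
    "validations": {
        "parses": True,
        "engines": [
            {"name": "cromwell", "version": "79", "status": "pass"},
            {"name": "miniwdl", "version": "1.5.2", "status": "fail"},
        ],
    }
}
print(pick_engine(version))  # -> {'name': 'cromwell', 'version': '79', 'status': 'pass'}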

A natural problem with the above suggestions is answering "Where do these validations come from?". One approach is relying on the TRS implementation to test or parse every single workflow it serves. This places a fairly high burden on the TRS implementation, and it is unlikely that it will have full coverage of all workflow engines / languages. Another option would be allowing users to self-attest or provide some sort of "proof" of the validity of the workflows. To be honest, I am not sure what the right approach is here.


@denis-yuen
Member

denis-yuen commented Aug 10, 2022

This places a fairly high burden on the TRS implementation, and it is unlikely that it will have full coverage of all workflow engines / languages. Another option would be allowing users to self-attest or provide some sort of "proof" of the validity of the workflows. To be honest, I am not sure what the right approach is here.

FWIW, I'm interested in both actually.

How useful is a best effort though?
i.e. if a workflow platform can skip/ignore invalid workflows from some TRS implementations, but has to validate workflows itself from TRS implementations that have not implemented validation, is that a win?

The second option touches a bit on #223, except for validation rather than usage. But one could consider validation almost a degenerate case of running a workflow.

@uniqueg
Collaborator

uniqueg commented Aug 10, 2022

I would love to see that functionality somewhere, but I strongly think that workflow validation would be overloading TRS. Every implementer would have to implement runtimes (WES comes to mind here) for running all of the workflows that they serve - and that's not just for workflow types (CWL, WDL, etc.) but engines, too. And as you have already suggested, @patmagee, it's really not necessarily clear what valid / invalid means. So - very very hard not only to implement, but also to standardize at the moment.

This rather strikes me as a good FASP project - perhaps a service that GA4GH could offer connecting TRS and WES (or TES, for containers) for this purpose. One could then maybe implement #223 in a way that not only the usage results are transmitted, but also the authority, together with more details like the type of test (validation, provided test runs, works with WES, etc.). This would allow TRS implementations to display something like badges for "validated by GA4GH" (and/or some other authority) next to more summarized usage stats of the general population.

It's a super important discussion that reminds me of another similar problem I thought about (will open an issue).

@patmagee
Contributor Author

@uniqueg I think a service provided by the GA4GH would be interesting. This could maybe be an incarnation of the hypothetical test bed? Although there certainly would be challenges to hosting such a platform with regards to cost.

I wonder if we should take a narrow scope here and say that a tool is considered valid if and only if it is semantically correct? That would be easily applicable for WDL, CWL, Nextflow, and Snakemake without placing too large a burden on the TRS implementation.

For the other class of validations, I wonder if we can get creative. Why does the TRS implementer, or even the author, need to provide this information? If we view TRS as a community-led initiative, then why not leverage the community? If we had a way for any user (ideally a signed-in user with some sort of identity) to attest that they ran a specific version of a workflow on a specific engine and version, that would get us much farther ahead. This could even be done by platforms reading from external TRS repos. This also opens the possibility of reporting back things like status, runtime, etc.
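
Purely as a sketch of that idea (the endpoint and every field name below are invented for discussion, not part of TRS), a community attestation could be a small authenticated submission like this:

import requests

attestation = {
    "tool_id": "#workflow/github.com/example/variant-calling",  # hypothetical tool
    "version_id": "1.2.0",
    "descriptor_type": "WDL",
    "engine": {"name": "cromwell", "version": "79"},
    "status": "success",
    "runtime_seconds": 5400,
    "platform": "example-analysis-platform",
}

# A signed-in user (e.g. with an OIDC bearer token) submits the report;
# the receiving service aggregates these into per-engine validation stats.
requests.post(
    "https://trs.example.org/ga4gh/trs/v2/attestations",  # invented endpoint
    json=attestation,
    headers={"Authorization": "Bearer <token>"},
    timeout=30,
)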

@uniqueg
Collaborator

uniqueg commented Aug 11, 2022

@patmagee Yeah, that second part was what @denis-yuen and I were referring to, and I think there's a proposal/issue going in that direction in #223. I totally agree that crowdsourcing that information is the way to go.

As for syntax validation, I personally still think it's out of scope for TRS and would be better served by a centralized validator service, possibly a collection of dedicated WES instances? I don't really see why every service would need to implement that. Although of course the valid field could be made optional...

But let's see what other TRS implementers think.

@stain @jmfernandez
