Note: CoA uses TES as a submodule. Please consider creating this issue in the TES repository if you identify the root cause is there: https://github.com/microsoft/ga4gh-tes
Problem:
The new(ish) deployment model of CoA on Kubernetes is really great when you already have a presence on AKS, or are planning on using AKS for other reasons. Unfortunately, if you do not and deploying CoA is your first use case for AKS, it becomes a very expensive option for hosting a Workflow Execution server. Unless you are running large batches of workflows, the cost of the infrastructure drives up the cost per workflow substantially.
Using numbers pulled from running instances of both the new and the old deployment models, you can see the cost impact:
Single VM approach: ~$275 USD / month
AKS approach: ~$620 USD / month, made up of:
Non-AKS resources: ~$185 USD / month
AKS resources: ~$425 USD / month
This represents a greater than 2x increase in cost for hosting the same services. Of course, I agree that the new approach is much more flexible, easier to maintain and troubleshoot, and provides a host of other benefits. But in some scenarios it ends up being just a numbers game, and roughly $7,500 / year is a lot for just hosting the infrastructure needed to run workflows on Azure.
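To make the arithmetic behind these figures explicit, here is a small sketch that derives the ratio, the yearly totals, and the fixed infrastructure cost per workflow. The monthly figures are the approximate ones quoted above; the function names and the workflow counts of 10 and 100 per month are illustrative assumptions, and compute costs for the workflows themselves are not included.

```python
# Approximate monthly infrastructure costs quoted in this issue (USD).
SINGLE_VM_MONTHLY = 275
AKS_MONTHLY = 620


def cost_summary(old_monthly, new_monthly):
    """Compare two monthly infrastructure costs."""
    return {
        "ratio": new_monthly / old_monthly,
        "extra_per_year": (new_monthly - old_monthly) * 12,
        "new_per_year": new_monthly * 12,
    }


def infra_cost_per_workflow(monthly_cost, workflows_per_month):
    """Fixed infrastructure cost amortized over the workflows run in one month."""
    return monthly_cost / workflows_per_month


summary = cost_summary(SINGLE_VM_MONTHLY, AKS_MONTHLY)
print(f"AKS vs single VM: {summary['ratio']:.2f}x")
print(f"AKS per year: ${summary['new_per_year']}")
print(f"Extra per year: ${summary['extra_per_year']}")

# Illustrative workload sizes (assumptions, not measurements).
for n in (10, 100):
    vm = infra_cost_per_workflow(SINGLE_VM_MONTHLY, n)
    aks = infra_cost_per_workflow(AKS_MONTHLY, n)
    print(f"{n} workflows/month: single VM ${vm:.2f}, AKS ${aks:.2f} per workflow")
```

The smaller the monthly workload, the more the fixed infrastructure cost dominates the cost per workflow, which is the point of this issue.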
Solution:
There are a number of ways to solve this, which require varying degrees of engineering and first-class support from Azure:
Provide a single VM option once again and allow the user to tune the VM size / DB size. This is probably the simplest short-term solution. Some workloads are small, and there is little justification for having such a large infrastructure for running < 10 (or even 100) workflows a month.
Use a serverless execution engine, i.e. Cromwell with the run command, or miniwdl deployed in a similar way to nextflow, and then deploy TES on a small VM.
Make TES a first-class Microsoft API and deploy a small VM to house Cromwell.
Make TES a first-class Microsoft API and use a serverless execution engine.
@patmagee we just had a good team discussion on this. The quickest, most impactful solution might be:
Modify the Trigger Engine to stop AKS if:
There are no "new" or "inprogress" workflows AND no workflows have completed within the past hour (configurable). This would ensure that AKS is shut down only when no workflows are queued or running and none have completed within the last hour.
Create an Azure Function that uses a blob trigger so it's executed when a new blob is created in the workflows container.
It should check if AKS is stopped, and if so, it should start it.
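The two decisions described above could be sketched roughly as follows. This is only an illustration of the logic, not CoA's Trigger Engine code: the function names, the `ACTIVE_STATES` set, and the `"Stopped"` power-state string are assumptions, and the actual stop and start calls (for example via `az aks stop` / `az aks start` or the Azure management API) are left out.

```python
from datetime import datetime, timedelta, timezone

# Workflow states that should keep the cluster running (names taken from the proposal).
ACTIVE_STATES = {"new", "inprogress"}


def should_stop_cluster(workflow_states, last_completed_at, now,
                        idle_window=timedelta(hours=1)):
    """Return True when AKS can be stopped.

    workflow_states: iterable of state strings for all known workflows.
    last_completed_at: datetime of the most recent completion, or None.
    idle_window: configurable quiet period required before stopping.
    """
    if any(state in ACTIVE_STATES for state in workflow_states):
        return False
    if last_completed_at is not None and now - last_completed_at < idle_window:
        return False
    return True


def should_start_cluster(cluster_power_state):
    """Decision for the blob-triggered function: start AKS only if it is stopped.

    The "Stopped" value is an assumption about how the power state is reported.
    """
    return cluster_power_state.lower() == "stopped"


now = datetime.now(timezone.utc)
print(should_stop_cluster(["succeeded"], now - timedelta(hours=2), now))
print(should_start_cluster("Stopped"))
```

Keeping both checks as pure functions makes the idle window easy to configure and the behavior easy to test without touching a real cluster.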
This should, in theory, reduce the cost of AKS significantly. The only downside is that a cold start will likely take a few additional minutes, which seems like a perfectly fine tradeoff.
To go even further, we could also move the Postgres database to optionally be deployed as a container in AKS instead of the managed Azure Postgres Flexible server.