Automatically generate a DAG that can execute an arbitrary CWL workflow #92

Open
LucaCinquini opened this issue May 16, 2024 · 3 comments

@LucaCinquini
Collaborator

Once a user has created a CWL workflow and made it available at some URL (for example, an Application Package available from Dockstore), we could trigger the OGC register() method to automatically generate a DAG that is very similar to the current generic cwl_dag.py but includes some customizations: the DAG name, the DAG id, and the specific parameters needed by the CWL workflow.

Start simple by generating the "Echo" DAG, which is able to execute this CWL workflow:

https://raw.githubusercontent.com/unity-sds/unity-sps-workflows/main/demos/echo_message.cwl

It should be very similar to this DAG: https://github.com/unity-sds/unity-sps/blob/develop/airflow/dags/cwl_dag.py but customized for the Echo use case.

We can explore either inheriting from a base CWL DAG, or generating the Echo DAG from scratch from a Template.

@GodwinShen

@jpl-btlunsfo ping for status.

@LucaCinquini moved this from Todo to In Progress in Unity Project Board on Jun 3, 2024
@LucaCinquini
Collaborator Author

@jpl-btlunsfo: I looked at the article you mentioned, which uses Python dataclasses to automatically generate DAGs: https://medium.com/cts-technologies/designing-repeatable-dags-in-airflow-part-1-db3a72a2307d

Although it will work, it seems unnecessarily complicated to me, and one disadvantage is that the DAGs are stored in the module's globals() scope rather than written as files to the DAGs folder, which reduces visibility.
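To make the visibility concern concrete, here is a minimal sketch of the globals() pattern. It is not taken from the article: `WorkflowSpec`, `StandInDag`, `build_dag`, and `register_dags` are hypothetical names, and `StandInDag` stands in for a real Airflow DAG object so the snippet runs without Airflow installed.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowSpec:
    """Hypothetical description of one CWL workflow to expose as a DAG."""
    dag_id: str
    cwl_workflow: str


@dataclass
class StandInDag:
    """Stand-in for a real Airflow DAG object (assumption, for illustration only)."""
    dag_id: str
    params: dict = field(default_factory=dict)


def build_dag(spec: WorkflowSpec) -> StandInDag:
    """Build one DAG-like object from a spec."""
    return StandInDag(dag_id=spec.dag_id, params={"cwl_workflow": spec.cwl_workflow})


def register_dags(specs, namespace: dict) -> None:
    """Bind each generated DAG to a name in the given namespace."""
    for spec in specs:
        namespace[spec.dag_id] = build_dag(spec)


SPECS = [
    WorkflowSpec(
        dag_id="echo_message",
        cwl_workflow="https://raw.githubusercontent.com/unity-sds/unity-sps-workflows/main/demos/echo_message.cwl",
    ),
]

# Airflow picks up DAG objects bound to module-level names, so the
# generated DAGs are injected into globals() instead of living in files.
register_dags(SPECS, globals())
```

With this pattern every generated DAG exists only in memory inside one generator module, so there is no per-workflow file in the DAGs folder to open, diff, or debug.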

I suggest using a simpler approach, like the one outlined in this article:
https://www.astronomer.io/docs/learn/dynamically-generating-dags?tab=taskflow#example-use-a-create_dag-function

In particular, the second option: "Multiple Files Method". In summary, this would be implemented as follows:

o Create a file "include/dag_template.py" that defines a DAG in terms of a few placeholder variables (the input parameters).
o Implement the OGC register() method so that, based on the input request, it writes a new file to the DAGs folder in which the dag_template.py variables are replaced with specific values (for example, the DAG name and the CWL file).

The above approach seems much simpler and easier to debug to me.
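As a rough sketch of the two steps above (not the actual implementation): the template text, the placeholder tokens `DAG_ID_PLACEHOLDER` and `CWL_WORKFLOW_PLACEHOLDER`, and the function name `generate_dag_file` are all assumptions made for illustration. In practice the template would live in include/dag_template.py and hold the real DAG definition derived from cwl_dag.py.

```python
from pathlib import Path

# Hypothetical stand-in for the contents of include/dag_template.py.
# The placeholder tokens are assumed names, not an existing convention.
DAG_TEMPLATE = '''\
# Generated from dag_template.py; do not edit by hand.
DAG_ID = "DAG_ID_PLACEHOLDER"
CWL_WORKFLOW = "CWL_WORKFLOW_PLACEHOLDER"

# The DAG definition (based on cwl_dag.py) would follow here,
# using DAG_ID and CWL_WORKFLOW.
'''


def generate_dag_file(dag_id: str, cwl_workflow: str, dags_folder: Path) -> Path:
    """Fill in the template and write <dag_id>.py into the DAGs folder."""
    source = (
        DAG_TEMPLATE
        .replace("DAG_ID_PLACEHOLDER", dag_id)
        .replace("CWL_WORKFLOW_PLACEHOLDER", cwl_workflow)
    )
    # Fail early if the substituted values produced invalid Python.
    compile(source, f"{dag_id}.py", "exec")

    dags_folder.mkdir(parents=True, exist_ok=True)
    target = dags_folder / f"{dag_id}.py"
    target.write_text(source)
    return target
```

The register() handler could then call something like `generate_dag_file("echo_message", "https://raw.githubusercontent.com/unity-sds/unity-sps-workflows/main/demos/echo_message.cwl", Path("dags"))`, leaving a plain, readable echo_message.py in the DAGs folder.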

@mike-gangl

As long as I can execute the CWL with an "arbitrary" JSON object, or a link to a JSON/YAML file, like the cwl_dag we have, I'd be very happy in the near term.

Status: In Progress

4 participants