Hello,
It would be useful to have finer control over Nextflow's behavior when it comes to resuming a run. Specifically, I would like to be able to choose which steps should be recreated and which should be left as is. This is motivated by the size of the data that some of the pipelines I work on process (e.g. 160 TB). The current Nextflow strategy of redoing everything if anything changes, be it the code or the input data, is safe and guarantees the integrity of the results. However, the user is often better positioned to decide what really needs to be rerun, and how that compares with the cost of rerunning everything.
I would find it useful to have something like:
# when resuming the operation, don't redo any files
nextflow run -resume -no-redo
# when resuming the operation, redo only files generated by step3 and subsequent
nextflow run -resume -redo step3
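To make the intended meaning of "step3 and subsequent" concrete, here is a rough Python sketch of the selection logic I have in mind. It is not Nextflow code and says nothing about how Nextflow works internally; the `steps_to_redo` function, the `dag` dictionary, and the step names are all made up for illustration. The idea is simply that the named step and everything downstream of it are redone, while everything else is taken from the cache.

```python
from collections import deque


def steps_to_redo(dag, start):
    """Return `start` plus every step reachable downstream of it.

    `dag` maps each step name to the list of steps that consume its output.
    """
    to_redo = {start}
    queue = deque([start])
    while queue:
        step = queue.popleft()
        for child in dag.get(step, ()):
            if child not in to_redo:
                to_redo.add(child)
                queue.append(child)
    return to_redo


# Hypothetical pipeline: step1 -> step2 -> step3 -> step4,
# with step2 also feeding a separate "report" step.
dag = {
    "step1": ["step2"],
    "step2": ["step3", "report"],
    "step3": ["step4"],
    "step4": [],
    "report": [],
}

redo = steps_to_redo(dag, "step3")   # steps that would be rerun
cached = set(dag) - redo             # steps that would be left as is
print(sorted(redo))
print(sorted(cached))
```

With this reading, `-redo step3` would rerun `step3` and `step4` only, and `-no-redo` would correspond to an empty set of steps to redo.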
It would also be helpful to be able to specify in the workflow which steps are temporary, so that their output is removed from disk after a specified process finishes. I've seen some solutions for this (e.g. https://github.com/wtsi-hgi/nextflow_ci/blob/template/pipelines/main.nf#L54-L78), but they seem to work around Nextflow limitations rather than being an integrated solution. Ideally one could express this very simply, something like:
# clean all directories generated by channel1 after channel2 is completed
channel1.clean_on_completion(channel2)
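As a rough illustration of the behavior I am asking for, here is a small Python sketch. Again, this is not Nextflow code: the `WorkDirRegistry` class and its methods are hypothetical names invented for this example, and the temporary directory stands in for a task work directory. The point is only that directories produced by one step are tracked and deleted once a named downstream step reports completion.

```python
import shutil
import tempfile
from pathlib import Path


class WorkDirRegistry:
    """Tracks directories per producer and removes them when a consumer finishes."""

    def __init__(self):
        self._dirs = {}     # producer name -> list of directories it created
        self._cleanup = {}  # consumer name -> producers to clean when it completes

    def register(self, producer, path):
        """Record a directory created by `producer`."""
        self._dirs.setdefault(producer, []).append(Path(path))

    def clean_on_completion(self, producer, consumer):
        """Request that `producer`'s directories be removed once `consumer` completes."""
        self._cleanup.setdefault(consumer, []).append(producer)

    def mark_completed(self, consumer):
        """Delete the directories scheduled for cleanup and return their paths."""
        removed = []
        for producer in self._cleanup.pop(consumer, []):
            for path in self._dirs.pop(producer, []):
                shutil.rmtree(path, ignore_errors=True)
                removed.append(path)
        return removed


# Stand-in for an intermediate directory written by "channel1".
registry = WorkDirRegistry()
tmp = Path(tempfile.mkdtemp(prefix="channel1_"))
(tmp / "intermediate.txt").write_text("temporary data")

registry.register("channel1", tmp)
registry.clean_on_completion("channel1", "channel2")

# Nothing is deleted until "channel2" is reported as finished.
removed = registry.mark_completed("channel2")
```

Deleting intermediates like this would of course mean those steps can no longer be resumed from cache, which is the trade-off I would like to be able to make explicitly.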