Add new troubleshooting page on why a process is retrying #2732

Open: wants to merge 3 commits into base: main
@@ -17,6 +17,7 @@ parentWeight: 20
1. [Unable to acquire lock error](/docs/usage/troubleshooting/aquire_lock_error)
1. [Docker permission errors](/docs/usage/troubleshooting/docker_permissions)
1. [IPv6 network errors](/docs/usage/troubleshooting/ipv6)
1. [Processes are retrying](/docs/usage/troubleshooting/retries)

## How to use these pages

15 changes: 15 additions & 0 deletions sites/docs/src/content/docs/usage/troubleshooting/retries.md
@@ -0,0 +1,15 @@
## Processes are retrying

### Why did processes report an error but then retry?

One of the nice things about Nextflow is that it can retry a process if it fails with certain [exit statuses](https://en.wikipedia.org/wiki/Exit_status), also known as exit codes.
Some of these errors have common causes, which means we can suggest solutions for them.

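A common issue is a tool requiring more memory than the pipeline's default resource settings initially make available to it.
Such out-of-memory (OOM) errors are often identified by exit code `104`, or by an exit code between `130` and `145`.
Exit codes above 128 usually mean the process was killed by a signal (exit code = 128 + signal number): for example, `137` corresponds to signal 9 (`SIGKILL`), which the Linux out-of-memory killer sends, and `143` corresponds to signal 15 (`SIGTERM`), which job schedulers commonly send when a job exceeds its time limit.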

For this reason, all nf-core pipelines [by default retry](https://github.com/nf-core/tools/blob/930ece572bf23b68c7a7c5259e918a878ba6499e/nf_core/pipeline-template/conf/base.config#L18) a process that fails with one of those exit codes, requesting more resources (memory, CPUs, and time) for the resubmitted job.

Any other exit code is not retried, and the pipeline stops with an error.
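As a rough sketch, this default behaviour can be expressed in a pipeline's `conf/base.config` with Nextflow's `errorStrategy` and `maxRetries` directives, plus resource requests that scale with `task.attempt`. The specific values below (one retry, `1` CPU, `6.GB`, `4.h`) are illustrative assumptions; the real template also caps these requests at configured maximums, which is omitted here. Check the linked `base.config` for the exact settings your pipeline's template version uses.

```groovy
process {
    // Retry on exit code 104 or any exit code from 130 to 145.
    // For any other exit code, use 'finish': no retry, and the pipeline
    // shuts down once already-submitted jobs have completed.
    errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
    maxRetries    = 1

    // task.attempt is 1 on the first try and 2 on the first retry,
    // so the resubmitted job requests proportionally more resources.
    cpus   = { 1    * task.attempt }
    memory = { 6.GB * task.attempt }
    time   = { 4.h  * task.attempt }
}
```

With these example settings, a process that fails with exit code `137` on its first attempt would be resubmitted once with 2 CPUs, `12.GB` of memory and `8.h`; if that second attempt also fails, it is not retried again and the pipeline fails.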

However, some pipelines may extend this list or define different retry conditions, based on the behaviour of the specific tools used in the pipeline.
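A minimal sketch of such a per-process change, written as a custom configuration file that uses Nextflow's `withName` process selector, might look like the following. The process name `EXAMPLE_TOOL`, the extra exit code `1` and the retry count are hypothetical placeholders, not settings taken from any real pipeline.

```groovy
process {
    // Hypothetical process name: replace with the process you want to change.
    withName: 'EXAMPLE_TOOL' {
        // Retry on the default exit codes plus exit code 1 (hypothetical example).
        errorStrategy = { task.exitStatus in ((130..145) + 104 + 1) ? 'retry' : 'finish' }
        // Allow up to two retries for this process only.
        maxRetries    = 2
    }
}
```

A file like this can be passed to a run with Nextflow's `-c` option, for example `nextflow run <pipeline> -c custom.config`.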