Improve install, usage, and configuration guide #2677

Open: wants to merge 30 commits into base `main`

Commits:
9aea2d3
Updated install guide
christopher-hakkaart Aug 5, 2024
f90edbc
Update installation guide page
christopher-hakkaart Aug 5, 2024
7f3adaf
Update installation guide page
christopher-hakkaart Aug 5, 2024
a56cfcf
Update install directory
christopher-hakkaart Aug 5, 2024
cf6a7e8
First reshuffle usage guide
christopher-hakkaart Aug 9, 2024
f6794d8
Fix snake_case
christopher-hakkaart Aug 9, 2024
352db78
Remove note box that feels intrusive
christopher-hakkaart Aug 9, 2024
17a9f80
Remove note box that feels intrusive
christopher-hakkaart Aug 9, 2024
5b7416a
Remove note box that feels intrusive
christopher-hakkaart Aug 9, 2024
6f00844
Apply suggestions
christopher-hakkaart Aug 12, 2024
fe1f555
Merge branch 'main' into install-guide
christopher-hakkaart Aug 12, 2024
8eb3ac3
Prettier
christopher-hakkaart Aug 12, 2024
2e2d7bc
Prettier x2
christopher-hakkaart Aug 12, 2024
5891a9a
Fix header
christopher-hakkaart Aug 13, 2024
d80eec7
Small improvemnts
christopher-hakkaart Aug 13, 2024
5c85cc9
Fix code blocks
christopher-hakkaart Aug 13, 2024
b3aace6
Fix rendering issue
jfy133 Sep 3, 2024
6316c31
Update sites/docs/src/content/docs/usage/quick_start/introduction.md
christopher-hakkaart Sep 4, 2024
1d53601
Apply suggestions from code review
christopher-hakkaart Sep 4, 2024
5f6e5b8
Apply suggestions from code review
christopher-hakkaart Sep 5, 2024
6aeb60d
Small edits
christopher-hakkaart Sep 5, 2024
8a62037
Merge branch 'install-guide' of https://github.com/nf-core/website in…
christopher-hakkaart Sep 5, 2024
405a139
Apply suggested changes from comments
christopher-hakkaart Sep 5, 2024
6dca224
Prettier
christopher-hakkaart Sep 5, 2024
45e621f
Merge branch 'main' into install-guide
christopher-hakkaart Sep 5, 2024
967fe5d
Rename usage guides and typos
christopher-hakkaart Sep 9, 2024
210d30c
Fix index
christopher-hakkaart Sep 9, 2024
d1ce24c
Fix index again
christopher-hakkaart Sep 9, 2024
bcfe614
Blend in #2537
christopher-hakkaart Sep 11, 2024
1d374e3
Merge branch 'main' into install-guide
christopher-hakkaart Sep 12, 2024
@@ -1,12 +1,11 @@
---
title: nf-core Terminology
subtitle: Specification of the terms used in the nf-core community
shortTitle: nf-core terminology
title: Terminology
subtitle: nf-core terminology
---

The features offered by Nextflow DSL2 can be used in various ways depending on the granularity with which you would like to write pipelines. Please see the listing below for the hierarchy and associated terminology we have decided to use when referring to DSL2 components.
## Introduction

## Terminology
The features offered by Nextflow [DSL2](#domain-specific-language-dsl) can be used in various ways depending on the granularity with which you would like to write pipelines. Please see the listing below for the hierarchy and associated terminology nf-core uses when referring to DSL2 components.

### Domain-Specific Language (DSL)

174 changes: 174 additions & 0 deletions sites/docs/src/content/docs/guides/configuration/introduction.md
@@ -0,0 +1,174 @@
---
title: Configuration
subtitle: How to configure nf-core pipelines
shortTitle: Configuration options
weight: 1
parentWeight: 20
---

## Configure nf-core pipelines

Each nf-core pipeline comes with a set of “sensible defaults” for a "typical" analysis of a full-size dataset.
While the defaults are a great place to start, you will almost certainly want to modify them to fit your own data and system requirements, for example, by modifying a tool flag or the compute resources allocated to a process.

When a pipeline is launched, Nextflow will look for config files in several locations.
As each source can contain conflicting settings, the sources are ranked to decide which settings to apply.

nf-core pipelines may use any of these configuration sources, which are listed below in order of priority:

1. Parameters specified on the command line (`--parameter`)
2. Parameters that are provided using the `-params-file` option
3. Config files that are provided using the `-c` option
4. The config file named `nextflow.config` in the current directory
5. The config file named `nextflow.config` in the pipeline project directory
6. The config file `$HOME/.nextflow/config`
7. Values defined within the pipeline script itself (e.g., `main.nf`)

While some of these files are already included in the nf-core pipeline repository (e.g., the `nextflow.config` file in the nf-core pipeline repository), some are automatically identified on your local system (e.g., the `nextflow.config` in the launch directory), and others are only included if they are specified using run options (e.g., `-params-file`, and `-c`).
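To illustrate this precedence, consider the hypothetical sketch below: a parameter set in the launch directory's `nextflow.config` (source 4) is overridden by the same parameter given on the command line (source 1).

```groovy
// nextflow.config in the launch directory (source 4 above)
params.outdir = 'results'

// Launching with `--outdir my_results` (source 1) overrides this value,
// because command-line parameters have the highest priority.
```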

:::warning
You should not clone and manually edit an nf-core pipeline. Manually edited nf-core pipelines cannot be updated to more recent versions of the pipeline without overwriting your changes. You also risk moving away from the canonical pipeline and losing reproducibility.
:::

### Parameters

Parameters are pipeline-specific settings that can be used to customize the execution of a pipeline.

At the highest level, parameters can be customized on the command line. Any parameter can be configured by prefixing the parameter name with a double dash (`--`):

```bash
--<parameter>
```

Depending on the parameter type, you may be required to add additional information after your parameter flag.
For example, you would add a string after the parameter flag for the `nf-core/rnaseq` `--input` and `--outdir` parameters.

```bash
nextflow run nf-core/rnaseq --input <path/to/input> --outdir <path/to/results>
```

Every nf-core pipeline has a full list of parameters on the nf-core website, with a description and type shown for each. Some parameters also have additional help text explaining how they should be used. See, for example, the [parameters page of the nf-core/rnaseq pipeline](https://nf-co.re/rnaseq/3.14.0/parameters/).

### Default configuration files

All parameters have a default configuration that is defined using the `nextflow.config` file in the root of the pipeline directory. Many parameters are set to `null` or `false` by default and are only activated by a profile or config file.
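As a sketch of how such defaults typically appear (the parameter names here are illustrative, not taken from any specific pipeline), the `nextflow.config` in the pipeline root might contain:

```groovy
params {
    input   = null  // no default; must be supplied by the user
    outdir  = null  // no default; must be supplied by the user
    skip_qc = false // inactive unless switched on by a flag, profile, or config
}
```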

nf-core pipelines also include additional config files from the `conf/` folder of a pipeline repository. Each additional `.config` file contains categorized configuration information for your pipeline execution, some of which can be optionally included as profiles:

- `base.config`
- Included by the pipeline by default
- Generous resource allocations using labels
- Does not specify any method for software dependencies and expects software to be available (or specified elsewhere)
- `igenomes.config`
- Included by the pipeline by default
- Default configuration to access reference files stored on AWS iGenomes
- `modules.config`
- Included by the pipeline by default
- Module-specific configuration options (both mandatory and optional)
- `test.config`
- Only included if specified as a profile
- A configuration profile to test the pipeline with a small test dataset
- `test_full.config`
- Only included if specified as a profile
- A configuration profile to test the pipeline with a full-size test dataset

:::note
Some configuration files contain the definition of profiles that can be flexibly applied. For example, the `docker`, `singularity`, and `conda` profiles are defined in the `nextflow.config` file in the pipeline project directory. You should not need to manually edit any of these configuration files.
:::

Profiles are sets of configuration options that can be flexibly applied to a pipeline.
They are also commonly defined in the `nextflow.config` file in the root of the pipeline directory.

Profiles that come with nf-core pipelines can be broadly categorized into two groups:

- Software management profiles
- Profiles for the management of software dependencies using container or environment management tools, for example, `docker`, `singularity`, and `conda`.
- Test profiles
- Profiles to execute the pipeline with a standardized set of test data and parameters, for example, `test` and `test_full`.

nf-core pipelines are required to define software containers and environments that can be activated using profiles. Although it is possible to run the pipelines with software installed by other methods (e.g., environment modules or manual installation), using container technology is more shareable, convenient, and reproducible.
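As an illustration, software management profiles in `nextflow.config` are usually little more than scoped settings switches. The sketch below is simplified and is not the exact definition used by any particular pipeline:

```groovy
profiles {
    docker {
        docker.enabled = true
    }
    singularity {
        singularity.enabled    = true
        singularity.autoMounts = true
    }
}
```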

### Shared configuration files

nf-core pipelines can also load custom institutional profiles that have been submitted to the [nf-core config repository](https://github.com/nf-core/configs). At run time, pipelines fetch these configuration profiles from the repository and make them available.

For shared resources such as an HPC cluster, you may consider developing a shared institutional profile.

Follow [this tutorial](https://nf-co.re/docs/usage/tutorials/step_by_step_institutional_profile) to set up your own institutional profile.
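A minimal institutional profile might look like the following sketch; the scheduler, queue name, and resource limits are hypothetical placeholders:

```groovy
params {
    config_profile_description = 'Hypothetical institutional profile'
    max_cpus                   = 32
    max_memory                 = 256.GB
}

process {
    executor = 'slurm'    // assumption: the cluster runs Slurm
    queue    = 'standard' // hypothetical queue name
}

singularity {
    enabled    = true
    autoMounts = true
}
```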

### Custom parameter and configuration files

Nextflow will look for files that are external to the pipeline project directory. These files include:

- The config file `$HOME/.nextflow/config`
- A config file named `nextflow.config` in your current directory
- Custom configuration files specified using the command line
- A parameter file that is provided using the `-params-file` option
- Config files that are provided using the `-c` option

**Parameter file format**

Parameter files are `.json` files that can contain an unlimited number of parameters:

```json title="nf-params.json"
{
"<parameter1_name>": 1,
"<parameter2_name>": "<string>",
"<parameter3_name>": true
}
```

You can override default parameters by creating a `.json` file and passing it as a command-line argument using the `-params-file` option.

```bash
nextflow run nf-core/rnaseq -profile docker --input <path/to/input> --outdir <results> -params-file <path/to/nf-params.json>
```

**Configuration file format**

Configuration files are `.config` files that can contain various pipeline properties and can be passed to Nextflow using the `-c` option in your execution command:

```bash
nextflow run nf-core/rnaseq -profile docker --input <path/to/input> --outdir <results> -c <path/to/custom.config>
```

Custom configuration files use the same format as the configuration files included in the pipeline directory.

Configuration properties are organized into [scopes](https://www.nextflow.io/docs/latest/config.html#config-scopes) by dot prefixing the property names with a scope identifier or grouping the properties in the same scope using the curly brackets notation. For example:

```groovy
alpha.x = 1
alpha.y = 'string value'
```

Is equivalent to:

```groovy
alpha {
    x = 1
    y = 'string value'
}
```

[Scopes](https://www.nextflow.io/docs/latest/config.html#config-scopes) allow you to quickly configure settings required to deploy a pipeline on different infrastructure using different software management.

A common scenario is for users to write a custom configuration file specific to running a pipeline on their infrastructure.

:::warning
Do not use `-c <file>` to specify parameters as this will result in errors. Custom config files specified with `-c` must only be used for tuning process resource specifications, other infrastructural tweaks (such as output directories), or module arguments (`args`).
:::
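For example, a custom config file passed with `-c` might tune the resources of a single process. The process name below is illustrative; use the names reported by your pipeline:

```groovy
process {
    withName: 'STAR_ALIGN' {
        cpus   = 12
        memory = 72.GB
        time   = 24.h
    }
}
```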

Multiple scopes can be included in the same `.config` file using a mix of dot prefixes and curly brackets.

```groovy
executor.name = "sge"

singularity {
    enabled = true
    autoMounts = true
}
```

See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html#config-scopes) for a full list of scopes.
125 changes: 125 additions & 0 deletions sites/docs/src/content/docs/guides/configuration/modify_tools.md
@@ -0,0 +1,125 @@
---
title: Modifying pipelines
subtitle: Configure tool containers and arguments
shortTitle: Modifying pipelines
weight: 3
---

## Modifying tools

Each tool in an nf-core pipeline comes preconfigured with a set of arguments suitable for a typical user.
These arguments are a great place to start and have been tested as part of the development process.
You can normally change the default settings using parameters with the double-dash notation, e.g., `--input`.
However, you may want to modify tool behavior beyond what the pipeline parameters expose.

It is **very unlikely** that you will need to edit the pipeline code to configure a tool.

### Tool arguments

You may wish to understand which tool arguments a pipeline uses, update them, or add arguments that the pipeline does not currently expose.

You can sometimes find out which arguments a pipeline passes to a tool by checking the longer 'help' description of its parameters, e.g., by pressing the 'help' button next to [this parameter](https://nf-co.re/funcscan/1.0.1/parameters#annotation_bakta_mincontig) in [nf-core/funcscan](https://nf-co.re/funcscan).

There are two main places where a tool's arguments can be specified:

- The process `script` block
- The `conf/modules.config` file

Most arguments (both mandatory and optional) are defined in the `conf/modules.config` file under the `ext.args` entry. Arguments defined in `conf/modules.config` can be flexibly modified using custom configuration files.

Arguments specified in `ext.args` are then inserted into the module itself via the `$args` variable in the module's bash code.

For example, the `-n` parameter could be added to the `BOWTIE_BUILD` process:

```groovy
process {
    withName: BOWTIE_BUILD {
        ext.args = "-n 0.1"
    }
}
```
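Inside the module, the `ext.args` value is then picked up roughly as in this simplified sketch of the nf-core module pattern (inputs, outputs, and container definitions are abbreviated):

```groovy
process BOWTIE_BUILD {
    // ... inputs, outputs, and container definitions omitted ...

    script:
    def args = task.ext.args ?: '' // ext.args from conf/modules.config, if set
    """
    bowtie-build $args $fasta bowtie
    """
}
```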

Updated tools may come with major changes and may break a pipeline and/or create missing values in MultiQC version tables.

:::warning
Such changes come with no warranty or support from the pipeline developers!
:::

### Changing tool versions

You can tell the pipeline to use a different container image or Conda environment within a config file using the `process` scope.

To do so, identify the `process` name and override the Nextflow `container` or `conda` definition using the `withName` process selector.

For example, the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline uses a tool called Pangolin that updates an internal database of COVID-19 lineages quite frequently.

To update the container specification, follow these steps:

1. Check the default version used by the pipeline in the module file for the tool under the `modules/nf-core/` directory of the pipeline, e.g., for [Pangolin](https://github.com/nf-core/viralrecon/blob/a85d5969f9025409e3618d6c280ef15ce417df65/modules/nf-core/software/pangolin/main.nf#L14-L19)
2. Find the latest version of the Biocontainer available on [quay.io](https://quay.io/repository/biocontainers/pangolin?tag=latest&tab=tags) for Docker or [Galaxy Project](https://depot.galaxyproject.org/singularity/) for Singularity
   - Note that the container version tag is identical for both container systems, but must include the 'build' ID (e.g., `--pyhdfd78af_1`)
3. Create the custom config accordingly:

- For Docker:

```groovy
process {
    withName: PANGOLIN {
        container = 'quay.io/biocontainers/pangolin:3.1.17--pyhdfd78af_1'
    }
}
```

- For Singularity:

```groovy
process {
    withName: PANGOLIN {
        container = 'https://depot.galaxyproject.org/singularity/pangolin:3.1.17--pyhdfd78af_1'
    }
}
```

- For Conda:

```groovy
process {
    withName: PANGOLIN {
        conda = 'bioconda::pangolin=3.1.17'
    }
}
```

:::warning
Updated tools may come with major changes and may break a pipeline and/or create missing values in MultiQC version tables. Such changes come with no warranty or support from the pipeline developers.
:::

### Docker registries

nf-core pipelines use `quay.io` as the default registry for Docker and Podman images.
When a Docker container is specified, the image will be pulled from `quay.io` unless a full URI is given. For example, if the process container is:

```bash
biocontainers/fastqc:0.11.7--4
```

The image will be pulled from quay.io by default, resulting in a full URI of:

```bash
quay.io/biocontainers/fastqc:0.11.7--4
```

If `docker.registry` is specified, it will be used instead. For example, if the config value `docker.registry = 'public.ecr.aws'` is set, the image will be pulled from:

```bash
public.ecr.aws/biocontainers/fastqc:0.11.7--4
```

However, the `docker.registry` setting will be ignored if you specify a full URI:

```bash
docker.io/biocontainers/fastqc:v0.11.9_cv8
```
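The default registry itself can be changed with the `docker.registry` setting in a config file, for example:

```groovy
docker {
    enabled  = true
    registry = 'public.ecr.aws' // images without a full URI are pulled from here
}
```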

:::warning
Updated registries may come with unexpected changes and come with no warranty or support from the pipeline developers.
:::
@@ -0,0 +1,72 @@
---
title: Running offline
subtitle: Run nf-core pipelines offline
shortTitle: Running offline
weight: 4
---

## Running offline

When Nextflow is connected to the internet, it will fetch nearly everything it needs to run a pipeline. Nextflow can also run analyses on an offline system with no internet connection; however, a few extra steps are required to get everything you need locally.

To run a pipeline offline you will need three things:

- [Nextflow](#nextflow)
- [Pipeline assets](#pipeline-assets)
- [Reference genomes](#reference-genomes) _(if required)_

These will first need to be fetched on a system that _does_ have an internet connection and transferred to your offline system.

### Nextflow

You need to have Nextflow installed on your offline system.
You can do this by installing Nextflow on a machine that _does_ have an internet connection and transferring it to the offline system:

1. [Install Nextflow locally](/docs/usage/quick_start/installation.md)
:::warning
Do _not_ use the `-all` package, as this does not allow the use of custom plugins.
:::
2. Run a Nextflow pipeline locally so that Nextflow fetches the required plugins.
- It does not need to run to completion.
3. Copy the Nextflow executable and your `$HOME/.nextflow` folder to your offline environment
4. Specify the name and version of each plugin that you downloaded in a local Nextflow configuration file
- This will prevent Nextflow from trying to download newer versions of plugins.
5. Set `export NXF_OFFLINE='true'` in your terminal
- To set this permanently, add this command to your shell configuration file (e.g., `~/.bashrc` or `~/.zshrc`)
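Step 4 above can be done with a `plugins` block in a local Nextflow config file. The plugin name and version here are examples; pin the exact versions that were actually fetched into `$HOME/.nextflow/plugins`:

```groovy
plugins {
    id 'nf-amazon@2.1.4' // example only: pin the version you downloaded
}
```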

### Pipeline assets

To run a pipeline offline, you next need the pipeline code, the software dependencies, and the shared nf-core/configs profiles.
We have created a helper tool as part of the _nf-core_ package to automate this for you.

On a computer with an internet connection, run `nf-core download <pipeline>` to download the pipeline and config profiles.
Add the argument `--container singularity` to also fetch the Singularity container(s). Note that only Singularity is supported.

The pipeline and requirements will be downloaded, configured with their relative paths, and packaged into a `.tar.gz` file by default.
This can then be transferred to your offline system and unpacked.
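For example (the pipeline name and revision are illustrative, and exact flag names may vary between nf-core/tools versions):

```bash
nf-core download rnaseq --revision 3.14.0 --container singularity
```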

Inside, you will see directories called `workflow` (the pipeline files), `config` (a copy of [nf-core/configs](https://github.com/nf-core/configs)), and (if you used `--container singularity`) a directory called `singularity`.
The pipeline code is adjusted by the download tool to expect these relative paths, so as long as you keep them together it should work as is.

### Shared storage

If you are downloading _directly_ to the offline storage (e.g., a head node with internet access whilst compute nodes are offline), you can use the `--singularity-cache-only` option for `nf-core download` and set the `$NXF_SINGULARITY_CACHEDIR` environment variable.
This downloads the Singularity images to the `$NXF_SINGULARITY_CACHEDIR` folder and does not copy them into the target downloaded pipeline folder.
This reduces total disk space usage and is faster.

See the [documentation for `nf-core download`](/docs/nf-core-tools/pipelines/download) for more information.

### Reference genomes

Some pipelines require reference genomes and have built-in integration with AWS iGenomes.
If you wish to use these references, you must also download and transfer them to your offline system.

Follow the [reference genomes documentation](/docs/usage/reference_genomes/reference_genomes.md) to configure the base path for the references.
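For example, a small config file can point the `igenomes_base` parameter (common in nf-core pipelines, but check your pipeline's documentation for the exact name) at your local copy:

```groovy
params {
    igenomes_base = '/path/to/local/igenomes' // hypothetical local path
}
```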

### Bytesize talk

Here is a recent bytesize talk explaining the necessary steps to run pipelines offline.

<!-- markdownlint-disable -->
<iframe width="560" height="315" src="https://www.youtube.com/embed/N1rRr4J0Lps" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<!-- markdownlint-restore -->