From f7222f63594bb78a617d3d84359bad0797e5703f Mon Sep 17 00:00:00 2001
From: Sofia Stamouli
Date: Thu, 26 Sep 2024 10:21:59 +0200
Subject: [PATCH 01/12] Add db_type column to documentation

---
 docs/usage/tutorials.md | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/docs/usage/tutorials.md b/docs/usage/tutorials.md
index d7a489d7..8356ec23 100644
--- a/docs/usage/tutorials.md
+++ b/docs/usage/tutorials.md
@@ -90,23 +90,26 @@ ERX5474932,ERR5766176_B,ILLUMINA,ERX5474932_ERR5766176_B_1.fastq.gz,ERX5474932_E
 Here we have specified two libraries of the same sample, that they were sequencing on Illumina platforms, and the paths to the FASTQ files.
 If you had placed your FASTQ files elsewhere, you would give the full path (i.e., with relevant directories) to the `fastq_1`, `fastq_2` and `fasta` columns.

+
+
 #### Database sheet

 For the database(s), you also supply these via a `.csv` file.
-This 4 column table contains the tool the database has been built for, a database name, the parameters you wish reads to be queried against the given database with, and a path to a `.tar.gz` archive file or a directory containing the database files.
+This 5 column table contains the tool the database has been built for, a database name, the parameters you wish reads to be queried against the given database with, a column to distinguish between short- and long-read databases and a path to a `.tar.gz` archive file or a directory containing the database files.

 Open a text editor, and create a file called `database.csv`.
 Copy and paste the following csv file into the file and save it.

 ```csv title="database.csv"
-tool,db_name,db_params,db_path
-kraken2,db1,--quick,testdb-kraken2.tar.gz
-centrifuge,db2,,test-db-centrifuge.tar.gz
-centrifuge,db2_trimmed,--trim5 2 --trim3 2,test-db-centrifuge.tar.gz
-kaiju,db3,,kaiju/
+tool,db_name,db_params,db_type,db_path
+kraken2,db1,--quick,short,testdb-kraken2.tar.gz
+centrifuge,db2,,short,test-db-centrifuge.tar.gz
+centrifuge,db2_trimmed,--trim5 2 --trim3 2,long,test-db-centrifuge.tar.gz
+kaiju,db3,,short;long,kaiju/
 ```

 You can see here we have specified the Centrifuge database twice, to allow comparison of different settings.
+We have also specified different profiling parameters depending on whether a database is for short-read or long-read use.
 Note that the each database of the same tool has a unique name.
 Furthermore, while the Kraken2 and Centrifuge databases have been supplied as `.tar.gz` archives, the Kaiju database has been supplied as a directory.

From 2b8acee830191ddc2b804a08b6dfb695eef492ec Mon Sep 17 00:00:00 2001
From: Sofia Stamouli <91951607+sofstam@users.noreply.github.com>
Date: Thu, 26 Sep 2024 10:34:40 +0200
Subject: [PATCH 02/12] Update docs/usage/tutorials.md

Co-authored-by: James A. Fellows Yates
---
 docs/usage/tutorials.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/usage/tutorials.md b/docs/usage/tutorials.md
index 8356ec23..d4683fb6 100644
--- a/docs/usage/tutorials.md
+++ b/docs/usage/tutorials.md
@@ -95,7 +95,7 @@ If you had placed your FASTQ files elsewhere, you would give the full path (i.e.
 #### Database sheet

 For the database(s), you also supply these via a `.csv` file.
-This 5 column table contains the tool the database has been built for, a database name, the parameters you wish reads to be queried against the given database with, a column to distinguish between short- and long-read databases and a path to a `.tar.gz` archive file or a directory containing the database files.
+This 4 (or 5) column table contains the tool the database has been built for, a database name, the parameters you wish reads to be queried against the given database with, an optional column to distinguish between short- and long-read databases, and a path to a `.tar.gz` archive file or a directory containing the database files.

 Open a text editor, and create a file called `database.csv`.
 Copy and paste the following csv file into the file and save it.

From d2f109aaa7f8515600df8db3e7eaec98c40eddd1 Mon Sep 17 00:00:00 2001
From: Sofia Stamouli <91951607+sofstam@users.noreply.github.com>
Date: Thu, 26 Sep 2024 10:38:31 +0200
Subject: [PATCH 03/12] Update docs/usage/tutorials.md

Co-authored-by: James A. Fellows Yates
---
 docs/usage/tutorials.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/usage/tutorials.md b/docs/usage/tutorials.md
index d4683fb6..df2fc091 100644
--- a/docs/usage/tutorials.md
+++ b/docs/usage/tutorials.md
@@ -110,6 +110,7 @@ kaiju,db3,,short;long,kaiju/

 You can see here we have specified the Centrifuge database twice, to allow comparison of different settings.
 We have also specified different profiling parameters depending on whether a database is for short-read or long-read use.
+If we don't specify this, the pipeline will assume all databases (and their settings specifed in `db_params`!) will be applicable for both short and long read data.
 Note that the each database of the same tool has a unique name.
 Furthermore, while the Kraken2 and Centrifuge databases have been supplied as `.tar.gz` archives, the Kaiju database has been supplied as a directory.

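The fallback added in PATCH 03 (a database with no `db_type` is treated as applicable to both short- and long-read data) can be sketched in Python. This is only an illustration of the documented behaviour, not the pipeline's own code: `read_db_types` and `DEFAULT_DB_TYPE` are hypothetical names introduced here, and the sheet is the tutorial's `database.csv` from PATCH 01.

```python
import csv
import io

# Hypothetical constant and helper, for illustration only;
# neither name comes from nf-core/taxprofiler itself.
DEFAULT_DB_TYPE = "short;long"


def read_db_types(sheet_text):
    """Map (tool, db_name) to the set of read types the database applies to."""
    resolved = {}
    for row in csv.DictReader(io.StringIO(sheet_text)):
        # A missing db_type column gives None and an empty cell gives "";
        # both fall back to the documented default of short and long.
        raw = row.get("db_type") or DEFAULT_DB_TYPE
        resolved[(row["tool"], row["db_name"])] = set(raw.split(";"))
    return resolved


# The tutorial's database sheet, as added in PATCH 01.
sheet = """tool,db_name,db_params,db_type,db_path
kraken2,db1,--quick,short,testdb-kraken2.tar.gz
centrifuge,db2,,short,test-db-centrifuge.tar.gz
centrifuge,db2_trimmed,--trim5 2 --trim3 2,long,test-db-centrifuge.tar.gz
kaiju,db3,,short;long,kaiju/
"""

db_types = read_db_types(sheet)
for key, types in db_types.items():
    print(key, sorted(types))
```

A four-column sheet without `db_type`, or a row with an empty `db_type` cell, resolves to both read types under this sketch, which matches the wording of the added documentation line.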
From c5a92e195446e28071ba53a6f05758917f437581 Mon Sep 17 00:00:00 2001
From: Sofia Stamouli
Date: Thu, 26 Sep 2024 10:40:07 +0200
Subject: [PATCH 04/12] Prettier

---
 docs/usage/tutorials.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/docs/usage/tutorials.md b/docs/usage/tutorials.md
index df2fc091..b0cfd975 100644
--- a/docs/usage/tutorials.md
+++ b/docs/usage/tutorials.md
@@ -90,8 +90,6 @@ ERX5474932,ERR5766176_B,ILLUMINA,ERX5474932_ERR5766176_B_1.fastq.gz,ERX5474932_E
 Here we have specified two libraries of the same sample, that they were sequencing on Illumina platforms, and the paths to the FASTQ files.
 If you had placed your FASTQ files elsewhere, you would give the full path (i.e., with relevant directories) to the `fastq_1`, `fastq_2` and `fasta` columns.

-
-
 #### Database sheet

 For the database(s), you also supply these via a `.csv` file.

From 719ac9ecb14f0876e8e6605e4616d056779afa33 Mon Sep 17 00:00:00 2001
From: Sofia Stamouli
Date: Thu, 26 Sep 2024 11:08:32 +0200
Subject: [PATCH 05/12] Review suggestions

---
 docs/usage.md | 24 +++++++++++++-----------
 docs/usage/tutorials.md | 2 +-
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/docs/usage.md b/docs/usage.md
index 68a98788..c7a63a3d 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -127,17 +127,17 @@ An example database sheet can look as follows, where 7 tools are being used, and
 `kraken2` will be run twice even though only having a single 'dedicated' database because specifying `bracken` implies first running `kraken2` on the `bracken` database, as required by `bracken`.

 ```csv
-tool,db_name,db_params,db_path
-malt,malt85,-id 85,///malt/testdb-malt/
-malt,malt95,-id 90,///malt/testdb-malt.tar.gz
-bracken,db1,;-r 150,///bracken/testdb-bracken.tar.gz
-kraken2,db2,--quick,///kraken2/testdb-kraken2.tar.gz
-krakenuniq,db3,,///krakenuniq/testdb-krakenuniq.tar.gz
-centrifuge,db1,,///centrifuge/minigut_cf.tar.gz
-metaphlan,db1,,///metaphlan/metaphlan_database/
-motus,db_mOTU,,///motus/motus_database/
-ganon,db1,,///ganon/test-db-ganon.tar.gz
-kmcp,db1,;-I 20,///kmcp/test-db-kmcp.tar.gz
+tool,db_name,db_params,db_type,db_path
+malt,malt85,-id 85,short,///malt/testdb-malt/
+malt,malt95,-id 90,short,///malt/testdb-malt.tar.gz
+bracken,db1,;-r 150,short,///bracken/testdb-bracken.tar.gz
+kraken2,db2,--quick,short,///kraken2/testdb-kraken2.tar.gz
+krakenuniq,db3,,short;long,///krakenuniq/testdb-krakenuniq.tar.gz
+centrifuge,db1,,short,///centrifuge/minigut_cf.tar.gz
+metaphlan,db1,,short,///metaphlan/metaphlan_database/
+motus,db_mOTU,,long,///motus/motus_database/
+ganon,db1,,short,///ganon/test-db-ganon.tar.gz
+kmcp,db1,;-I 20,short,///kmcp/test-db-kmcp.tar.gz
 ```

 :::warning
@@ -157,8 +157,10 @@ Column specifications are as follows:
 | `tool` | Taxonomic profiling tool (supported by nf-core/taxprofiler) that the database has been indexed for [required]. Please note that `bracken` also implies running `kraken2` on the same database. |
 | `db_name` | A unique name per tool for the particular database [required]. Please note that names need to be unique across both `kraken2` and `bracken` as well, even if re-using the same database. |
 | `db_params` | Any parameters of the given taxonomic classifier/profiler that you wish to specify that the taxonomic classifier/profiling tool should use when profiling against this specific database. Can be empty to use taxonomic classifier/profiler defaults. Must not be surrounded by quotes [required]. We generally do not recommend specifying parameters here that turn on/off saving of output files or specifying particular file extensions - this should be already addressed via pipeline parameters. For Bracken databases, must at a minimum contain a `;` separating Kraken2 from Bracken parameters. |
+| `db_type` | A column to distinguish between short- and long-read databases. If the column is empty, the pipeline will assume all databases (and their settings specified in `db_params`!) will be applicable for both short and long read data [optional].
 | `db_path` | Path to the database. Can either be a path to a directory containing the database index files or a `.tar.gz` file which contains the compressed database directory with the same name as the tar archive, minus `.tar.gz` [required]. |

+
 :::tip
 You can also specify the same database directory/file twice (ensuring unique `db_name`s) and specify different parameters for each database to compare the effect of different parameters during classification/profiling.
 :::
diff --git a/docs/usage/tutorials.md b/docs/usage/tutorials.md
index b0cfd975..aa48a200 100644
--- a/docs/usage/tutorials.md
+++ b/docs/usage/tutorials.md
@@ -108,7 +108,7 @@ kaiju,db3,,short;long,kaiju/

 You can see here we have specified the Centrifuge database twice, to allow comparison of different settings.
 We have also specified different profiling parameters depending on whether a database is for short-read or long-read use.
-If we don't specify this, the pipeline will assume all databases (and their settings specifed in `db_params`!) will be applicable for both short and long read data.
+If we don't specify this, the pipeline will assume all databases (and their settings specified in `db_params`!) will be applicable for both short and long read data.
 Note that the each database of the same tool has a unique name.
 Furthermore, while the Kraken2 and Centrifuge databases have been supplied as `.tar.gz` archives, the Kaiju database has been supplied as a directory.

From 7719f2a11806964299ba93bc8f4400d4111edb57 Mon Sep 17 00:00:00 2001
From: Sofia Stamouli
Date: Thu, 26 Sep 2024 11:11:36 +0200
Subject: [PATCH 06/12] Prettier

---
 docs/usage.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/usage.md b/docs/usage.md
index c7a63a3d..77fc65b3 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -157,10 +157,9 @@ Column specifications are as follows:
 | `tool` | Taxonomic profiling tool (supported by nf-core/taxprofiler) that the database has been indexed for [required]. Please note that `bracken` also implies running `kraken2` on the same database. |
 | `db_name` | A unique name per tool for the particular database [required]. Please note that names need to be unique across both `kraken2` and `bracken` as well, even if re-using the same database. |
 | `db_params` | Any parameters of the given taxonomic classifier/profiler that you wish to specify that the taxonomic classifier/profiling tool should use when profiling against this specific database. Can be empty to use taxonomic classifier/profiler defaults. Must not be surrounded by quotes [required]. We generally do not recommend specifying parameters here that turn on/off saving of output files or specifying particular file extensions - this should be already addressed via pipeline parameters. For Bracken databases, must at a minimum contain a `;` separating Kraken2 from Bracken parameters. |
-| `db_type` | A column to distinguish between short- and long-read databases. If the column is empty, the pipeline will assume all databases (and their settings specified in `db_params`!) will be applicable for both short and long read data [optional].
+| `db_type` | A column to distinguish between short- and long-read databases. If the column is empty, the pipeline will assume all databases (and their settings specified in `db_params`!) will be applicable for both short and long read data [optional]. |
 | `db_path` | Path to the database. Can either be a path to a directory containing the database index files or a `.tar.gz` file which contains the compressed database directory with the same name as the tar archive, minus `.tar.gz` [required]. |

-
 :::tip
 You can also specify the same database directory/file twice (ensuring unique `db_name`s) and specify different parameters for each database to compare the effect of different parameters during classification/profiling.
 :::

From 270c159e2a4b82cee000ffd7fb2d0d8e8fa01cf5 Mon Sep 17 00:00:00 2001
From: Sofia Stamouli
Date: Thu, 26 Sep 2024 11:38:51 +0200
Subject: [PATCH 07/12] James suggestions

---
 docs/usage.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/usage.md b/docs/usage.md
index 77fc65b3..a7923957 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -116,7 +116,7 @@ Databases can be supplied either in the form of a compressed `.tar.gz` archive o
 nf-core/taxprofiler does not provide any databases by default, nor does it currently generate them for you. This must be performed manually by the user. See bottom of this section for more information of the expected database files, or the [building databases](usage/tutorials#retrieving-databases-or-building-custom-databases) tutorial.
 :::

-The pipeline takes the paths and specific classification/profiling parameters of the tool of these databases as input via a four column comma-separated sheet.
+The pipeline takes the paths and specific classification/profiling parameters of the tool of these databases as input via a four (or five) column comma-separated sheet. The optional `db_type` column allows to use specific database/parameters against specific data types. By specifying if a database is for short-or long-reads (or even both), the samples sequenced with Illumina are combined with the short-read databases abd the samples sequenced with Nanopore are combined with long-read databases.

 :::warning
 To allow user freedom, nf-core/taxprofiler does not check for mandatory or the validity of non-file database parameters for correct execution of the tool - excluding options offered via pipeline level parameters! Please validate your database parameters (cross-referencing [parameters](https://nf-co.re/taxprofiler/parameters), and the given tool documentation) before submitting the database sheet! For example, if you don't use the default read length - Bracken will require `-r ` in the `db_params` column.
@@ -166,6 +166,8 @@ You can also specify the same database directory/file twice (ensuring unique `db

 nf-core/taxprofiler will automatically decompress and extract any compressed archives for you.

+The optional `db_type` column enables the use of specific databases or parameters for different data types. By specifying if a database is for short-reads, long-reads, or both, Illumina samples are combined with short-read databases, while Nanopore samples are combined with long-read databases.
+
 :::tip
 Click the links in the list below for short quick-reference tutorials how to generate download 'pre-made' and/or custom databases for each tool.
 :::

From c974db64e1102f12480a4343b156f7535814da24 Mon Sep 17 00:00:00 2001
From: Sofia Stamouli <91951607+sofstam@users.noreply.github.com>
Date: Thu, 26 Sep 2024 12:52:36 +0200
Subject: [PATCH 08/12] Update docs/usage.md

Co-authored-by: James A. Fellows Yates
---
 docs/usage.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/usage.md b/docs/usage.md
index a7923957..e5449cce 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -116,7 +116,9 @@ Databases can be supplied either in the form of a compressed `.tar.gz` archive o
 nf-core/taxprofiler does not provide any databases by default, nor does it currently generate them for you. This must be performed manually by the user. See bottom of this section for more information of the expected database files, or the [building databases](usage/tutorials#retrieving-databases-or-building-custom-databases) tutorial.
 :::

-The pipeline takes the paths and specific classification/profiling parameters of the tool of these databases as input via a four (or five) column comma-separated sheet. The optional `db_type` column allows to use specific database/parameters against specific data types. By specifying if a database is for short-or long-reads (or even both), the samples sequenced with Illumina are combined with the short-read databases abd the samples sequenced with Nanopore are combined with long-read databases.
+The pipeline takes the paths and specific classification/profiling parameters of the tool of these databases as input via a four (or five) column comma-separated sheet.
+
+The optional `db_type` column allows to use specific database/parameters against specific data types. By specifying if a database is for short-or long-reads (or even both), the samples sequenced with Illumina are combined with the short-read databases and the samples sequenced with Nanopore are combined with long-read databases. If `db_type` is not provided, it is assumed the database and parameters are applicable for both short and long read data.

 :::warning
 To allow user freedom, nf-core/taxprofiler does not check for mandatory or the validity of non-file database parameters for correct execution of the tool - excluding options offered via pipeline level parameters! Please validate your database parameters (cross-referencing [parameters](https://nf-co.re/taxprofiler/parameters), and the given tool documentation) before submitting the database sheet! For example, if you don't use the default read length - Bracken will require `-r ` in the `db_params` column.

From 9402b5a2abed5abeee1511b7697e640a034b797e Mon Sep 17 00:00:00 2001
From: Sofia Stamouli
Date: Thu, 26 Sep 2024 12:59:04 +0200
Subject: [PATCH 09/12] Add example of empty db_type

---
 docs/usage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/usage.md b/docs/usage.md
index a7923957..ca06c2b8 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -129,7 +129,7 @@ An example database sheet can look as follows, where 7 tools are being used, and
 ```csv
 tool,db_name,db_params,db_type,db_path
 malt,malt85,-id 85,short,///malt/testdb-malt/
-malt,malt95,-id 90,short,///malt/testdb-malt.tar.gz
+malt,malt95,-id 90,,///malt/testdb-malt.tar.gz
 bracken,db1,;-r 150,short,///bracken/testdb-bracken.tar.gz
 kraken2,db2,--quick,short,///kraken2/testdb-kraken2.tar.gz
 krakenuniq,db3,,short;long,///krakenuniq/testdb-krakenuniq.tar.gz

From 9b64847bfa266eb59e6fedc968ef4a8697b5c1c3 Mon Sep 17 00:00:00 2001
From: Sofia Stamouli
Date: Thu, 26 Sep 2024 13:14:38 +0200
Subject: [PATCH 10/12] Prettier

---
 docs/usage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/usage.md b/docs/usage.md
index f5824424..7403e847 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -116,7 +116,7 @@ Databases can be supplied either in the form of a compressed `.tar.gz` archive o
 nf-core/taxprofiler does not provide any databases by default, nor does it currently generate them for you. This must be performed manually by the user. See bottom of this section for more information of the expected database files, or the [building databases](usage/tutorials#retrieving-databases-or-building-custom-databases) tutorial.
 :::

-The pipeline takes the paths and specific classification/profiling parameters of the tool of these databases as input via a four (or five) column comma-separated sheet.
+The pipeline takes the paths and specific classification/profiling parameters of the tool of these databases as input via a four (or five) column comma-separated sheet.

 The optional `db_type` column allows to use specific database/parameters against specific data types. By specifying if a database is for short-or long-reads (or even both), the samples sequenced with Illumina are combined with the short-read databases and the samples sequenced with Nanopore are combined with long-read databases. If `db_type` is not provided, it is assumed the database and parameters are applicable for both short and long read data.

From d00104afe45f228161f497f2a3db956e7939cb00 Mon Sep 17 00:00:00 2001
From: Sofia Stamouli <91951607+sofstam@users.noreply.github.com>
Date: Thu, 26 Sep 2024 13:34:22 +0200
Subject: [PATCH 11/12] Update docs/usage.md

Co-authored-by: James A. Fellows Yates
---
 docs/usage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/usage.md b/docs/usage.md
index 7403e847..9e750c4d 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -159,7 +159,7 @@ Column specifications are as follows:
 | `tool` | Taxonomic profiling tool (supported by nf-core/taxprofiler) that the database has been indexed for [required]. Please note that `bracken` also implies running `kraken2` on the same database. |
 | `db_name` | A unique name per tool for the particular database [required]. Please note that names need to be unique across both `kraken2` and `bracken` as well, even if re-using the same database. |
 | `db_params` | Any parameters of the given taxonomic classifier/profiler that you wish to specify that the taxonomic classifier/profiling tool should use when profiling against this specific database. Can be empty to use taxonomic classifier/profiler defaults. Must not be surrounded by quotes [required]. We generally do not recommend specifying parameters here that turn on/off saving of output files or specifying particular file extensions - this should be already addressed via pipeline parameters. For Bracken databases, must at a minimum contain a `;` separating Kraken2 from Bracken parameters. |
-| `db_type` | A column to distinguish between short- and long-read databases. If the column is empty, the pipeline will assume all databases (and their settings specified in `db_params`!) will be applicable for both short and long read data [optional]. |
+| `db_type` | An optional column to distinguish between short- and long-read databases. If the column is empty, the pipeline will assume all databases (and their settings specified in `db_params`!) will be applicable for both short and long read data. Possible values: `long`, `short`, or `short;long` |
 | `db_path` | Path to the database. Can either be a path to a directory containing the database index files or a `.tar.gz` file which contains the compressed database directory with the same name as the tar archive, minus `.tar.gz` [required]. |

 :::tip

From 80cdb0c54b392a4fd9f4801b8a91bd0e95afbdd6 Mon Sep 17 00:00:00 2001
From: Sofia Stamouli
Date: Thu, 26 Sep 2024 13:50:14 +0200
Subject: [PATCH 12/12] Review suggestions

---
 docs/usage.md | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/docs/usage.md b/docs/usage.md
index 9e750c4d..2cf7ef79 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -118,20 +118,36 @@ nf-core/taxprofiler does not provide any databases by default, nor does it curre

 The pipeline takes the paths and specific classification/profiling parameters of the tool of these databases as input via a four (or five) column comma-separated sheet.

-The optional `db_type` column allows to use specific database/parameters against specific data types. By specifying if a database is for short-or long-reads (or even both), the samples sequenced with Illumina are combined with the short-read databases and the samples sequenced with Nanopore are combined with long-read databases. If `db_type` is not provided, it is assumed the database and parameters are applicable for both short and long read data.
+The optional `db_type` column allows to use specific database/parameters against specific data types. By specifying if a database is for short-or long-reads (or even both), the samples sequenced with Illumina are combined with the short-read databases and the samples sequenced with Nanopore are combined with long-read databases. If `db_type` is not provided, it is assumed the database and parameters are applicable for both short- and long-read data.

 :::warning
 To allow user freedom, nf-core/taxprofiler does not check for mandatory or the validity of non-file database parameters for correct execution of the tool - excluding options offered via pipeline level parameters! Please validate your database parameters (cross-referencing [parameters](https://nf-co.re/taxprofiler/parameters), and the given tool documentation) before submitting the database sheet! For example, if you don't use the default read length - Bracken will require `-r ` in the `db_params` column.
 :::

-An example database sheet can look as follows, where 7 tools are being used, and `malt` and `kraken2` will be used against two databases each.
+An example database sheet can look as follows, where 7 tools are being used, and `malt` and `kraken2` will be used against two databases each. Since the `db_type` column is missing, it is therefore assumed that the database and parameters are suitable for both short- and long-read data.
+
+In the second example database sheet, the `db_type` column has been provided. The valid options are `short`, `long` and `short;long`.

 `kraken2` will be run twice even though only having a single 'dedicated' database because specifying `bracken` implies first running `kraken2` on the `bracken` database, as required by `bracken`.

+```csv
+tool,db_name,db_params,db_path
+malt,malt85,-id 85,///malt/testdb-malt/
+malt,malt95,-id 90,///malt/testdb-malt.tar.gz
+bracken,db1,;-r 150,///bracken/testdb-bracken.tar.gz
+kraken2,db2,--quick,///kraken2/testdb-kraken2.tar.gz
+krakenuniq,db3,,///krakenuniq/testdb-krakenuniq.tar.gz
+centrifuge,db1,,///centrifuge/minigut_cf.tar.gz
+metaphlan,db1,,///metaphlan/metaphlan_database/
+motus,db_mOTU,,///motus/motus_database/
+ganon,db1,,///ganon/test-db-ganon.tar.gz
+kmcp,db1,;-I 20,///kmcp/test-db-kmcp.tar.gz
+```
+
 ```csv
 tool,db_name,db_params,db_type,db_path
 malt,malt85,-id 85,short,///malt/testdb-malt/
-malt,malt95,-id 90,,///malt/testdb-malt.tar.gz
+malt,malt95,-id 90,short,///malt/testdb-malt.tar.gz
 bracken,db1,;-r 150,short,///bracken/testdb-bracken.tar.gz
 kraken2,db2,--quick,short,///kraken2/testdb-kraken2.tar.gz
 krakenuniq,db3,,short;long,///krakenuniq/testdb-krakenuniq.tar.gz
@@ -159,7 +175,7 @@ Column specifications are as follows:
 | `tool` | Taxonomic profiling tool (supported by nf-core/taxprofiler) that the database has been indexed for [required]. Please note that `bracken` also implies running `kraken2` on the same database. |
 | `db_name` | A unique name per tool for the particular database [required]. Please note that names need to be unique across both `kraken2` and `bracken` as well, even if re-using the same database. |
 | `db_params` | Any parameters of the given taxonomic classifier/profiler that you wish to specify that the taxonomic classifier/profiling tool should use when profiling against this specific database. Can be empty to use taxonomic classifier/profiler defaults. Must not be surrounded by quotes [required]. We generally do not recommend specifying parameters here that turn on/off saving of output files or specifying particular file extensions - this should be already addressed via pipeline parameters. For Bracken databases, must at a minimum contain a `;` separating Kraken2 from Bracken parameters. |
-| `db_type` | An optional column to distinguish between short- and long-read databases. If the column is empty, the pipeline will assume all databases (and their settings specified in `db_params`!) will be applicable for both short and long read data. Possible values: `long`, `short`, or `short;long` |
+| `db_type` | An optional column to distinguish between short- and long-read databases. If the column is empty, the pipeline will assume all databases (and their settings specified in `db_params`!) will be applicable for both short and long read data. Possible values: `long`, `short`, or `short;long`. If the `db_type` column is missing from the database.csv, it will take the default value short;long |
 | `db_path` | Path to the database. Can either be a path to a directory containing the database index files or a `.tar.gz` file which contains the compressed database directory with the same name as the tar archive, minus `.tar.gz` [required]. |

 :::tip