Update documentation to include db_type #535
Changes from 11 commits
f7222f6
2b8acee
d2f109a
c5a92e1
719ac9e
7719f2a
270c159
c974db6
9402b5a
1f6775c
9b64847
d00104a
80cdb0c
@@ -116,7 +116,9 @@ Databases can be supplied either in the form of a compressed `.tar.gz` archive o
 nf-core/taxprofiler does not provide any databases by default, nor does it currently generate them for you. This must be performed manually by the user. See bottom of this section for more information of the expected database files, or the [building databases](usage/tutorials#retrieving-databases-or-building-custom-databases) tutorial.
 :::
 
-The pipeline takes the paths and specific classification/profiling parameters of the tool of these databases as input via a four column comma-separated sheet.
+The pipeline takes the paths to these databases, and their tool-specific classification/profiling parameters, as input via a four (or five) column comma-separated sheet.
 
+The optional `db_type` column allows specific databases/parameters to be used against specific data types. By specifying whether a database is for short reads, long reads, or both, samples sequenced with Illumina are combined with the short-read databases and samples sequenced with Nanopore are combined with the long-read databases. If `db_type` is not provided, the database and its parameters are assumed to be applicable to both short- and long-read data.
+
 :::warning
 To allow user freedom, nf-core/taxprofiler does not check for mandatory or the validity of non-file database parameters for correct execution of the tool - excluding options offered via pipeline level parameters! Please validate your database parameters (cross-referencing [parameters](https://nf-co.re/taxprofiler/parameters), and the given tool documentation) before submitting the database sheet! For example, if you don't use the default read length - Bracken will require `-r <read_length>` in the `db_params` column.
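The following is a minimal Python sketch of the `db_type` routing described in the added paragraph. It is not the pipeline's Nextflow code. The platform labels `ILLUMINA` and `OXFORD_NANOPORE` and the helper names are hypothetical. Following the proposed text, a missing or empty `db_type` counts as both read types, and `short;long` is split on `;`.

```python
from typing import Dict, List, Optional, Set

# Hypothetical platform labels; nf-core/taxprofiler's own values may differ.
PLATFORM_TO_READ_TYPE = {
    "ILLUMINA": "short",
    "OXFORD_NANOPORE": "long",
}


def parse_db_type(value: Optional[str]) -> Set[str]:
    """Split a db_type cell such as 'short;long' into a set of read types.

    A missing or empty cell counts as both read types, following the
    proposed documentation text.
    """
    if value is None or value.strip() == "":
        return {"short", "long"}
    return {part.strip() for part in value.split(";") if part.strip()}


def databases_for_platform(platform: str, databases: List[Dict[str, str]]) -> List[str]:
    """Return the db_name of every database a sample from `platform` is run against."""
    read_type = PLATFORM_TO_READ_TYPE[platform]
    return [db["db_name"] for db in databases if read_type in parse_db_type(db.get("db_type"))]


# Rows mirroring part of the example sheet (malt95 has an empty db_type).
databases = [
    {"tool": "malt", "db_name": "malt85", "db_type": "short"},
    {"tool": "malt", "db_name": "malt95", "db_type": ""},
    {"tool": "krakenuniq", "db_name": "db3", "db_type": "short;long"},
    {"tool": "motus", "db_name": "db_mOTU", "db_type": "long"},
]

print(databases_for_platform("ILLUMINA", databases))         # ['malt85', 'malt95', 'db3']
print(databases_for_platform("OXFORD_NANOPORE", databases))  # ['malt95', 'db3', 'db_mOTU']
```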
@@ -127,17 +129,17 @@ An example database sheet can look as follows, where 7 tools are being used, and
 `kraken2` will be run twice even though only having a single 'dedicated' database because specifying `bracken` implies first running `kraken2` on the `bracken` database, as required by `bracken`.
 
 ```csv
-tool,db_name,db_params,db_path
-malt,malt85,-id 85,/<path>/<to>/malt/testdb-malt/
-malt,malt95,-id 90,/<path>/<to>/malt/testdb-malt.tar.gz
-bracken,db1,;-r 150,/<path>/<to>/bracken/testdb-bracken.tar.gz
-kraken2,db2,--quick,/<path>/<to>/kraken2/testdb-kraken2.tar.gz
-krakenuniq,db3,,/<path>/<to>/krakenuniq/testdb-krakenuniq.tar.gz
-centrifuge,db1,,/<path>/<to>/centrifuge/minigut_cf.tar.gz
-metaphlan,db1,,/<path>/<to>/metaphlan/metaphlan_database/
-motus,db_mOTU,,/<path>/<to>/motus/motus_database/
-ganon,db1,,/<path>/<to>/ganon/test-db-ganon.tar.gz
-kmcp,db1,;-I 20,/<path>/<to>/kmcp/test-db-kmcp.tar.gz
+tool,db_name,db_params,db_type,db_path
+malt,malt85,-id 85,short,/<path>/<to>/malt/testdb-malt/
+malt,malt95,-id 90,,/<path>/<to>/malt/testdb-malt.tar.gz
Review comment: I think if the column is in, it has to be filled (@LilyAnderssonLee, do you remember?). If you want both, you need `short;long` as before. See my comment below about what I had actually meant.
Review comment: Yes, if the
Review comment: I have updated the PR based on your comments.
+bracken,db1,;-r 150,short,/<path>/<to>/bracken/testdb-bracken.tar.gz
+kraken2,db2,--quick,short,/<path>/<to>/kraken2/testdb-kraken2.tar.gz
+krakenuniq,db3,,short;long,/<path>/<to>/krakenuniq/testdb-krakenuniq.tar.gz
+centrifuge,db1,,short,/<path>/<to>/centrifuge/minigut_cf.tar.gz
+metaphlan,db1,,short,/<path>/<to>/metaphlan/metaphlan_database/
+motus,db_mOTU,,long,/<path>/<to>/motus/motus_database/
+ganon,db1,,short,/<path>/<to>/ganon/test-db-ganon.tar.gz
+kmcp,db1,;-I 20,short,/<path>/<to>/kmcp/test-db-kmcp.tar.gz
 ```
Review comment: Add a second csv example block but without the `db_type` column (essentially the one from before you edited). Sorry, this is what I meant before about having an example without this.
Review comment: I updated the PR with two example blocks.
 
 :::warning
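The sheet may arrive with either header, with or without `db_type`. The sketch below is a hypothetical Python reader, not part of nf-core/taxprofiler. It assumes the valid values are `short`, `long`, or `short;long`, as used in the example sheet. Following the proposed documentation, it fills an absent or empty `db_type` with both read types. The review thread above suggests the pipeline may instead require a filled cell whenever the column is present, so treat that default as an assumption.

```python
import csv
import io
from typing import Dict, List

REQUIRED_COLUMNS = ["tool", "db_name", "db_params", "db_path"]
# Assumption: valid read types, taken from the values used in the example sheet.
VALID_READ_TYPES = {"short", "long"}


def read_database_sheet(text: str) -> List[Dict[str, object]]:
    """Parse a four- or five-column database sheet into row dictionaries."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    rows = []
    for row in reader:
        raw = (row.get("db_type") or "").strip()
        # Assumed default: an absent or empty db_type means both read types.
        types = sorted(t.strip() for t in raw.split(";")) if raw else ["long", "short"]
        unknown = set(types) - VALID_READ_TYPES
        if unknown:
            raise ValueError(f"{row['db_name']}: invalid db_type value(s) {sorted(unknown)}")
        rows.append({**row, "db_type": types})
    return rows


four_columns = """tool,db_name,db_params,db_path
bracken,db1,;-r 150,/<path>/<to>/bracken/testdb-bracken.tar.gz
"""

five_columns = """tool,db_name,db_params,db_type,db_path
malt,malt85,-id 85,short,/<path>/<to>/malt/testdb-malt/
krakenuniq,db3,,short;long,/<path>/<to>/krakenuniq/testdb-krakenuniq.tar.gz
"""

for sheet in (four_columns, five_columns):
    for row in read_database_sheet(sheet):
        print(row["tool"], row["db_name"], row["db_type"])
```

The Bracken `;` separator does not clash with CSV parsing because it is not a comma. It stays inside `db_params` untouched.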
@@ -157,6 +159,7 @@ Column specifications are as follows:
 | `tool` | Taxonomic profiling tool (supported by nf-core/taxprofiler) that the database has been indexed for [required]. Please note that `bracken` also implies running `kraken2` on the same database. |
 | `db_name` | A unique name per tool for the particular database [required]. Please note that names need to be unique across both `kraken2` and `bracken` as well, even if re-using the same database. |
 | `db_params` | Any parameters of the given taxonomic classifier/profiler that you wish to specify that the taxonomic classifier/profiling tool should use when profiling against this specific database. Can be empty to use taxonomic classifier/profiler defaults. Must not be surrounded by quotes [required]. We generally do not recommend specifying parameters here that turn on/off saving of output files or specifying particular file extensions - this should be already addressed via pipeline parameters. For Bracken databases, must at a minimum contain a `;` separating Kraken2 from Bracken parameters. |
+| `db_type` | Distinguishes between short- and long-read databases. If the column is empty, the pipeline assumes that all databases (and the settings specified in `db_params`!) are applicable to both short- and long-read data [optional]. |
Review comment: @LilyAnderssonLee what are the valid values here?
Review comment: You are right. And the default is
sofstam marked this conversation as resolved.
 | `db_path` | Path to the database. Can either be a path to a directory containing the database index files or a `.tar.gz` file which contains the compressed database directory with the same name as the tar archive, minus `.tar.gz` [required]. |
 
 :::tip
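The `db_name` and `db_params` rules in the column table can be checked before submitting a sheet. Below is a hypothetical pre-flight helper in Python. `check_sheet_rules` is not a pipeline function, and the pipeline itself does not run these checks for you. The helper treats `kraken2` and `bracken` as sharing one name namespace and requires a `;` in Bracken `db_params`. It also flags parameters that start with a quote character.

```python
from typing import Dict, List


def check_sheet_rules(rows: List[Dict[str, str]]) -> List[str]:
    """Return problems found in parsed database-sheet rows (empty list if none)."""
    problems = []
    seen = set()
    for row in rows:
        tool, name, params = row["tool"], row["db_name"], row["db_params"]
        # bracken implies running kraken2 on the same database, so the two
        # tools share a single namespace for db_name uniqueness.
        namespace = "kraken2" if tool in ("kraken2", "bracken") else tool
        if (namespace, name) in seen:
            problems.append(f"duplicate db_name '{name}' for {tool}")
        seen.add((namespace, name))
        # Bracken db_params must contain ';' separating Kraken2 from Bracken options.
        if tool == "bracken" and ";" not in params:
            problems.append(f"bracken database '{name}' needs ';' in db_params")
        # db_params must not be surrounded by quotes; flag a leading quote.
        if params[:1] in ('"', "'"):
            problems.append(f"db_params of '{name}' must not be quoted")
    return problems


good = [
    {"tool": "bracken", "db_name": "db1", "db_params": ";-r 150"},
    {"tool": "kraken2", "db_name": "db2", "db_params": "--quick"},
    {"tool": "centrifuge", "db_name": "db1", "db_params": ""},
]
bad = [
    {"tool": "bracken", "db_name": "db1", "db_params": "-r 150"},
    {"tool": "kraken2", "db_name": "db1", "db_params": '"--quick"'},
]

print(check_sheet_rules(good))  # []
print(check_sheet_rules(bad))
```

Reusing `db1` for `centrifuge` is fine because uniqueness is only required per tool, with `kraken2` and `bracken` counted together.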
@@ -165,6 +168,8 @@ You can also specify the same database directory/file twice (ensuring unique `db
 
 nf-core/taxprofiler will automatically decompress and extract any compressed archives for you.
 
+The optional `db_type` column enables the use of specific databases or parameters for different data types. By specifying whether a database is for short reads, long reads, or both, Illumina samples are combined with short-read databases, while Nanopore samples are combined with long-read databases.
+
 :::tip
 Click the links in the list below for short quick-reference tutorials how to generate download 'pre-made' and/or custom databases for each tool.
 :::
Review comment: Maybe give a second example without the `db_type` column.
Review comment: Added this for malt.
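The `db_path` column spec describes a naming convention for archives. A `.tar.gz` archive must contain a directory with the archive's file name minus `.tar.gz`. That can be checked locally with Python's standard `tarfile` module. This is a sketch under that assumption: the helper names and the placeholder file are hypothetical, and the pipeline's own extraction logic may behave differently.

```python
import os
import tarfile
import tempfile


def expected_directory(archive_path: str) -> str:
    """Directory the archive should contain: its file name minus '.tar.gz'."""
    name = os.path.basename(archive_path)
    if not name.endswith(".tar.gz"):
        raise ValueError(f"not a .tar.gz archive: {archive_path}")
    return name[: -len(".tar.gz")]


def archive_has_expected_directory(archive_path: str) -> bool:
    """True if every archive member sits under the expected top-level directory."""
    top = expected_directory(archive_path)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Drop a leading "./" that some tar invocations add to member names.
        names = [n[2:] if n.startswith("./") else n for n in tar.getnames()]
    names = [n for n in names if n not in ("", ".")]
    return bool(names) and all(n == top or n.startswith(top + "/") for n in names)


# Usage: build a tiny archive in a temporary directory and check it.
with tempfile.TemporaryDirectory() as tmp:
    db_dir = os.path.join(tmp, "testdb-kraken2")
    os.makedirs(db_dir)
    with open(os.path.join(db_dir, "placeholder.k2d"), "wb") as fh:
        fh.write(b"not a real index")  # hypothetical stand-in file
    archive = os.path.join(tmp, "testdb-kraken2.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(db_dir, arcname="testdb-kraken2")
    ok = archive_has_expected_directory(archive)

print(ok)  # True
```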