WORKFLOW: Metagenomics

Rauf Salamzade edited this page Dec 22, 2020 · 1 revision

The Metagenomics workflow processes and quality-controls (QC) metagenomic or metatranscriptomic datasets. For large datasets, it offers options to extend the runtime limit and to adjust the threads and memory given to individual steps.

| Parameter Identifier | Parameter Value Type | Default | Parameter Description |
| --- | --- | --- | --- |
| `run_subsample` | String/Boolean | False | Specify whether to perform subsampling of reads (e.g. to quickly get a glimpse of the data). |
| `read_subsampling` | Integer | 10000 | The number of reads to subsample. |
| `run_centrifuge` | String/Boolean | True | Whether to run Centrifuge. |
| `centrifuge_index` | String | | Path to the Centrifuge database index. |
| `centrifuge_threads` | Integer | 1 | The number of cores/threads to provide for Centrifuge. |
| `centrifuge_memory` | Integer | 32 | The memory (in Gb per core/thread) to provide for Centrifuge. |
| `trimgalore_options` | String | | Options for TrimGalore for adapter trimming of FASTQs. |
| `run_kneaddata` | String/Boolean | True | Run kneaddata QC, including Trimmomatic quality trimming. |
| `kneaddata_options` | String | `--reference-db /path/to/Homo_sapiens` | Options to run kneaddata with. Currently used to specify path to reference database to be used for running contamination removal of host DNA. |
| `kneaddata_threads` | Integer | 1 | The number of cores/threads to provide for kneaddata. |
| `kneaddata_memory` | Integer | 32 | The memory (in Gb per core/thread) to provide for kneaddata. |
| `run_metaphlan2` | String/Boolean | True | Whether to run MetaPhlAn2 profiling. |
| `metaphlan2_threads` | Integer | 1 | The number of cores/threads to provide for MetaPhlAn2. |
| `run_sortmerna` | String/Boolean | False | Whether to run SortMeRNA for rRNA depletion (for meta-transcriptomic datasets). |
| `sortmerna_db` | String | | Path to directory with SortMeRNA ribosomal RNA database files. |
| `sortmerna_threads` | Integer | 4 | The number of cores/threads to provide for SortMeRNA. |
| `sortmerna_memory` | Integer | 16 | The memory (in Gb per core/thread) to provide for SortMeRNA. |
| `run_straingst` | String/Boolean | False | Whether to run StrainGST analysis to find strains from a specific species. |
| `straingst_db` | String | | Path to StrainGST k-mer pan-genome database (`*.hdf5` file). |
| `run_shortbred` | String/Boolean | False | Whether to run ShortBRED to find AMR genes in data. |
| `amr_shortbred_markers` | String | | Path to ShortBRED AMR markers database. |
| `set_timelimit_sevendays` | String/Boolean | False | Whether to change timelimit to 7 full days for intensive steps. Meant for large datasets. |
| `run_cleanup` | String/Boolean | False | Delete intermediate FASTQ files: True/False |
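
The defaults listed above can be combined with user overrides before a run. The following is a minimal Python sketch of that idea, not part of the workflow itself: the helper names (`as_bool`, `resolve_parameters`, `total_memory_gb`) are hypothetical, and the rule that an enabled step needs its database path is an assumption inferred from the parameter descriptions. Because memory is specified per core/thread, the sketch also shows how the total memory request for a step could be estimated.

```python
# Defaults copied from the parameter table; None marks a parameter for
# which the table lists no default value.
DEFAULTS = {
    "run_subsample": False,
    "read_subsampling": 10000,
    "run_centrifuge": True,
    "centrifuge_index": None,
    "centrifuge_threads": 1,
    "centrifuge_memory": 32,
    "trimgalore_options": None,
    "run_kneaddata": True,
    "kneaddata_options": "--reference-db /path/to/Homo_sapiens",
    "kneaddata_threads": 1,
    "kneaddata_memory": 32,
    "run_metaphlan2": True,
    "metaphlan2_threads": 1,
    "run_sortmerna": False,
    "sortmerna_db": None,
    "sortmerna_threads": 4,
    "sortmerna_memory": 16,
    "run_straingst": False,
    "straingst_db": None,
    "run_shortbred": False,
    "amr_shortbred_markers": None,
    "set_timelimit_sevendays": False,
    "run_cleanup": False,
}

# Assumption (not stated by the workflow): an enabled step needs its path.
REQUIRED_WHEN_ENABLED = {
    "run_centrifuge": "centrifuge_index",
    "run_sortmerna": "sortmerna_db",
    "run_straingst": "straingst_db",
    "run_shortbred": "amr_shortbred_markers",
}


def as_bool(value):
    """Accept a real boolean or the strings 'True'/'False' (String/Boolean type)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    raise ValueError(f"not a boolean value: {value!r}")


def resolve_parameters(overrides):
    """Merge user overrides into the defaults and run basic sanity checks."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    # Normalise every switch-like parameter to a real boolean.
    for name in params:
        if name.startswith("run_") or name == "set_timelimit_sevendays":
            params[name] = as_bool(params[name])
    missing = [path for flag, path in REQUIRED_WHEN_ENABLED.items()
               if params[flag] and not params[path]]
    if missing:
        raise ValueError(f"enabled step(s) lack a path: {missing}")
    return params


def total_memory_gb(params, tool):
    """Memory is given per core/thread, so the total is threads x memory."""
    return params[f"{tool}_threads"] * params[f"{tool}_memory"]


if __name__ == "__main__":
    params = resolve_parameters({
        "centrifuge_index": "/path/to/centrifuge_index",  # placeholder path
        "run_sortmerna": "True",
        "sortmerna_db": "/path/to/sortmerna_db",          # placeholder path
    })
    print(total_memory_gb(params, "sortmerna"))  # 4 threads x 16 Gb = 64
```

With the default values, SortMeRNA would therefore request 64 Gb in total, while Centrifuge and kneaddata would each request 32 Gb, under the per-thread interpretation stated in the parameter descriptions.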