Merge pull request #519 from nf-core/bouncy-basenji
Bouncy basenji pre-release PR
LilyAnderssonLee authored Sep 11, 2024
2 parents 5e0d556 + cc34d41 commit b63da73
Showing 329 changed files with 15,411 additions and 1,131 deletions.
6 changes: 4 additions & 2 deletions .github/workflows/ci.yml
@@ -65,8 +65,10 @@ jobs:
if [[ "${{ matrix.tags }}" == "test_motus" ]]; then
wget https://raw.githubusercontent.com/motu-tool/mOTUs/master/motus/downloadDB.py
python downloadDB.py --no-download-progress
echo 'tool,db_name,db_params,db_path' > 'database_motus.csv'
echo "motus,db_mOTU,,db_mOTU" >> 'database_motus.csv'
echo 'tool,db_name,db_params,db_type,db_path' > 'database_motus.csv'
echo "motus,db1_mOTU,,short,db_mOTU" >> 'database_motus.csv'
echo "motus,db2_mOTU,,long,db_mOTU" >> 'database_motus.csv'
echo "motus,db3_mOTU,,short;long,db_mOTU" >> 'database_motus.csv'
nextflow run ${GITHUB_WORKSPACE} -profile docker,${{ matrix.tags }} --databases ./database_motus.csv --outdir ./results_${{ matrix.tags }};
else
nextflow run ${GITHUB_WORKSPACE} -profile docker,${{ matrix.tags }} --outdir ./results_${{ matrix.tags }};
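
The CI step above builds the mOTUs database sheet by hand with `echo`. As a minimal sketch of the same sheet layout, assuming only the five column names and the three rows shown in the hunk, the following Python writes an equivalent CSV. The helper name `write_database_sheet` is hypothetical and not part of the pipeline.

```python
import csv
import io

# Column order taken from the new header line in the CI step.
COLUMNS = ["tool", "db_name", "db_params", "db_type", "db_path"]

# The three mOTUs rows written by the CI step, one per db_type value.
ROWS = [
    ("motus", "db1_mOTU", "", "short", "db_mOTU"),
    ("motus", "db2_mOTU", "", "long", "db_mOTU"),
    ("motus", "db3_mOTU", "", "short;long", "db_mOTU"),
]


def write_database_sheet(handle, rows=ROWS):
    """Write a database sheet with a db_type column (hypothetical helper)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)


# Write to an in-memory buffer so the sketch has no side effects on disk.
buffer = io.StringIO()
write_database_sheet(buffer)
sheet_text = buffer.getvalue()
print(sheet_text, end="")
```

Because `short;long` uses a semicolon rather than a comma, the value needs no CSV quoting.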
28 changes: 27 additions & 1 deletion CHANGELOG.md
@@ -3,16 +3,42 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## dev - [unreleased]
## v1.2dev - Bouncy Basenji [unreleased]

### `Added`

- [#417](https://github.com/nf-core/taxprofiler/pull/417) - Added reference-free metagenome estimation with Nonpareil (added by @jfy133)
- [#466](https://github.com/nf-core/taxprofiler/pull/466) - Input database sheets now require a `db_type` column to distinguish between short- and long-read databases (added by @LilyAnderssonLee)
- [#505](https://github.com/nf-core/taxprofiler/pull/505) - Add small files to the file `tower.yml` (added by @LilyAnderssonLee)
- [#508](https://github.com/nf-core/taxprofiler/pull/508) - Add `nanoq` as a filtering tool for nanopore reads (added by @LilyAnderssonLee)
- [#511](https://github.com/nf-core/taxprofiler/pull/511) - Add `porechop_abi` as an alternative adapter removal tool for long-read nanopore data (added by @LilyAnderssonLee)
- [#512](https://github.com/nf-core/taxprofiler/pull/512) - Update all tools to the latest version and include nf-test (updated by @LilyAnderssonLee & @jfy133)

### `Fixed`

- [#518](https://github.com/nf-core/taxprofiler/pull/518) - Fixed a bug where Oxford Nanopore FASTA input files would not be processed (❤️ to @ikarls for reporting, fixed by @jfy133)

### `Dependencies`

| Tool | Previous version | New version |
| ------------- | ---------------- | ----------- |
| bbmap | 39.01 | 39.06 |
| bowtie2 | 2.4.4 | 2.5.2 |
| bracken | 2.7 | 2.9 |
| cat/fastq     | 8.30             |             |
| diamond | 2.0.15 | 2.1.8 |
| ganon | 1.5.1 | 2.0.0 |
| kraken2 | 2.1.2 | 2.1.3 |
| krona | 2.8 | 2.8.1 |
| megan | 6.24.20 | 6.25.9 |
| metaphlan | 4.0.6 | 4.1.1 |
| minimap2 | 2.24 | 2.28 |
| motus/profile | 3.0.3 | 3.1.0 |
| multiqc | 1.21 | 1.24.1 |
| nanoq | | 0.10.0 |
| samtools | 1.17 | 1.20 |
| untar | 4.7 | 4.8 |

### `Deprecated`

## v1.1.8 - Augmented Akita Patch [2024-06-20]
12 changes: 12 additions & 0 deletions CITATIONS.md
@@ -30,14 +30,26 @@

> Schubert, M., Lindgreen, S., & Orlando, L. (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 9, 88. https://doi.org/10.1186/s13104-016-1900-2
- [Nonpareil](https://doi.org/10.1128/mSystems.00039-18)

> Rodriguez-R, L. M., Gunturu, S., Tiedje, J. M., Cole, J. R., & Konstantinidis, K. T. (2018). Nonpareil 3: Fast Estimation of Metagenomic Coverage and Sequence Diversity. mSystems, 3(3). https://doi.org/10.1128/mSystems.00039-18

- [Porechop](https://github.com/rrwick/Porechop)

> Wick, R. R., Judd, L. M., Gorrie, C. L., & Holt, K. E. (2017). Completing bacterial genome assemblies with multiplex MinION sequencing. Microbial Genomics, 3(10), e000132. https://doi.org/10.1099/mgen.0.000132
- [Porechop_ABI](https://github.com/bonsai-team/Porechop_ABI)

> Bonenfant, Q., Noé, L., & Touzet, H. (2023). Porechop_ABI: discovering unknown adapters in Oxford Nanopore Technology sequencing reads for downstream trimming. Bioinformatics Advances, 3(1):vbac085. https://doi.org/10.1093/bioadv/vbac085
- [Filtlong](https://github.com/rrwick/Filtlong)

> Wick R (2021) Filtlong, URL: https://github.com/rrwick/Filtlong
- [nanoq](https://github.com/esteinig/nanoq)

> Steinig, E., & Coin, L. (2022). Nanoq: ultra-fast quality control for nanopore reads. Journal of Open Source Software, 7(69). https://doi.org/10.21105/joss.02991
- [BBTools](http://sourceforge.net/projects/bbmap/)

> Bushnell B. (2022) BBMap, URL: http://sourceforge.net/projects/bbmap/
10 changes: 7 additions & 3 deletions README.md
@@ -23,17 +23,21 @@

**nf-core/taxprofiler** is a bioinformatics best-practice analysis pipeline for taxonomic classification and profiling of shotgun short- and long-read metagenomic data. It allows for in-parallel taxonomic identification of reads or taxonomic abundance estimation with multiple classification and profiling tools against multiple databases, and produces standardised output tables for facilitating results comparison between different tools and databases.

The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/taxprofiler/results).

## Pipeline summary

![](docs/images/taxprofiler_tube.png)

1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) or [`falco`](https://github.com/smithlabcode/falco) as an alternative option)
2. Performs optional read pre-processing
- Adapter clipping and merging (short-read: [fastp](https://github.com/OpenGene/fastp), [AdapterRemoval2](https://github.com/MikkelSchubert/adapterremoval); long-read: [porechop](https://github.com/rrwick/Porechop))
- Low complexity and quality filtering (short-read: [bbduk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/), [PRINSEQ++](https://github.com/Adrian-Cantu/PRINSEQ-plus-plus); long-read: [Filtlong](https://github.com/rrwick/Filtlong))
- Adapter clipping and merging (short-read: [fastp](https://github.com/OpenGene/fastp), [AdapterRemoval2](https://github.com/MikkelSchubert/adapterremoval); long-read: [porechop](https://github.com/rrwick/Porechop), [Porechop_ABI](https://github.com/bonsai-team/Porechop_ABI))
- Low complexity and quality filtering (short-read: [bbduk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/), [PRINSEQ++](https://github.com/Adrian-Cantu/PRINSEQ-plus-plus); long-read: [Filtlong](https://github.com/rrwick/Filtlong), [Nanoq](https://github.com/esteinig/nanoq))
- Host-read removal (short-read: [BowTie2](http://bowtie-bio.sourceforge.net/bowtie2/); long-read: [Minimap2](https://github.com/lh3/minimap2))
- Run merging
3. Supports statistics for host-read removal ([Samtools](http://www.htslib.org/))
3. Supports statistics for metagenome coverage estimation ([Nonpareil](https://nonpareil.readthedocs.io/en/latest/)) and for host-read removal ([Samtools](http://www.htslib.org/))
4. Performs taxonomic classification and/or profiling using one or more of:
- [Kraken2](https://ccb.jhu.edu/software/kraken2/)
- [MetaPhlAn](https://huttenhower.sph.harvard.edu/metaphlan/)
148 changes: 119 additions & 29 deletions assets/multiqc_config.yml
@@ -11,6 +11,48 @@ report_section_order:
order: -1001
"nf-core-taxprofiler-summary":
order: -1002
general_stats:
order: 1000
fastqc:
order: 900
fastqc-1:
order: 800
fastp:
order: 700
adapterRemoval:
order: 600
nonpareil:
order: 500
porechop:
order: 400
porechop_abi:
order: 450
bbduk:
order: 300
prinseqplusplus:
order: 200
filtlong:
order: 100
nanoq:
order: 95
bowtie2:
order: 90
samtools:
order: 80
kraken:
order: 70
bracken:
order: 60
centrifuge:
order: 50
malt:
order: 40
diamond:
order: 30
kaiju:
order: 20
motus:
order: 10

export_plots: true
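
To see what the new `report_section_order` values imply for the report layout, here is a minimal Python sketch. It assumes that a larger `order` value places a section nearer the top of the MultiQC report, which is consistent with the negative values given to the trailing sections but is not stated in this diff. Only a subset of the entries is included, and the function name `sections_top_to_bottom` is hypothetical.

```python
# Subset of the report_section_order values from the hunk above.
SECTION_ORDER = {
    "nf-core-taxprofiler-summary": -1002,
    "motus": 10,
    "nanoq": 95,
    "filtlong": 100,
    "porechop": 400,
    "porechop_abi": 450,
    "nonpareil": 500,
    "fastqc": 900,
    "general_stats": 1000,
}


def sections_top_to_bottom(order_map):
    """Return section names sorted from the top of the report to the bottom.

    Assumption: a larger `order` value means the section is placed earlier.
    """
    return sorted(order_map, key=order_map.get, reverse=True)


layout = sections_top_to_bottom(SECTION_ORDER)
for name in layout:
    print(f"{SECTION_ORDER[name]:>6}  {name}")
```

Under that assumption, `porechop_abi` (450) would be listed before `porechop` (400), and `nanoq` (95) would sit directly after `filtlong` (100).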

@@ -22,11 +64,13 @@ custom_logo_title: "nf-core/taxprofiler"
run_modules:
- fastqc
- adapterRemoval
- fastp
- fastp
- nonpareil
- bbduk
- prinseqplusplus
- porechop
- filtlong
- nanoq
- bowtie2
- minimap2
- samtools
@@ -44,6 +88,8 @@ sp:
fn_re: ".*(fastqc|falco)_data.txt$"
fastqc/zip:
fn: "*_fastqc.zip"
nonpareil:
fn: "nonpareil_all_samples.json"

top_modules:
- "fastqc":
@@ -60,13 +106,23 @@ top_modules:
path_filters_exclude:
- "*raw*"
extra: "If used in this run, Falco is a drop-in replacement for FastQC producing the same output, written by Guilherme de Sena Brandine and Andrew D. Smith."
- "fastp"
- "adapterRemoval"
- nonpareil
- "porechop":
name: "Porechop"
anchor: "porechop"
target: "Porechop"
path_filters:
- "*porechop.log"
extra: "ℹ️: if you get the error message 'Error - was not able to plot data.' this means that porechop did not detect any adapters and therefore no statistics generated."
- "bbduk"
- "prinseqplusplus"
- "filtlong"
- "porechop":
name: "Porechop_ABI"
anchor: "porechop_abi"
target: "Porechop_ABI"
doi: "10.1093/bioadv/vbac085"
info: "find and remove adapters from Oxford Nanopore reads."
path_filters:
- "*porechop_abi.log"
extra: "ℹ️: if you get the error message 'Error - was not able to plot data.' this means that porechop_abi did not detect any adapters and therefore no statistics generated."
- "bowtie2":
name: "bowtie2"
- "samtools":
@@ -95,12 +151,11 @@ top_modules:
- "*.centrifuge.txt"
- "malt":
name: "MALT"
- "diamond"
- "kaiju":
name: "Kaiju"
- "motus"

#It is not possible to set placement for custom kraken and centrifuge columns.
# It is not possible to set placement for custom kraken
# and centrifuge columns.

table_columns_placement:
FastQC / Falco (pre-Trimming):
@@ -130,16 +185,32 @@ table_columns_placement:
percent_aligned: 370
percent_collapsed: 380
percent_discarded: 390
nonpareil:
nonpareil_R: 400
nonpareil_LR: 410
nonpareil_kappa: 420
nonpareil_C: 430
nonpareil_diversity: 440
Porechop:
Input Reads: 400
Start Trimmed: 410
Start Trimmed Percent: 420
End Trimmed: 430
End Trimmed Percent: 440
Middle Split: 450
Middle Split Percent: 460
Input Reads: 500
Start Trimmed: 510
Start Trimmed Percent: 520
End Trimmed: 530
End Trimmed Percent: 540
Middle Split: 550
Middle Split Percent: 560
Porechop_ABI:
Input Reads: 500
Start Trimmed: 510
Start Trimmed Percent: 520
End Trimmed: 530
End Trimmed Percent: 540
Middle Split: 550
Middle Split Percent: 560
Filtlong:
Target bases: 500
Target bases: 600
nanoq:
Read N50: 700
BBDuk:
Input reads: 800
Total Removed bases percent: 810
@@ -203,6 +274,24 @@ table_columns_visible:
percent_duplicates: False
percent_gc: False
percent_fails: False
Adapter Removal:
aligned_total: True
percent_aligned: True
percent_collapsed: True
percent_discarded: False
fastp:
pct_adapter: True
pct_surviving: True
pct_duplication: False
after_filtering_gc_content: False
after_filtering_q30_rate: False
after_filtering_q30_bases: False
nonpareil:
nonpareil_R: false
nonpareil_LR: false
nonpareil_kappa: true
nonpareil_C: true
nonpareil_diversity: true
porechop:
Input reads: False
Start Trimmed:
@@ -211,20 +300,18 @@ table_columns_visible:
End Trimmed Percent: True
Middle Split: False
Middle Split Percent: True
fastp:
pct_adapter: True
pct_surviving: True
pct_duplication: False
after_filtering_gc_content: False
after_filtering_q30_rate: False
after_filtering_q30_bases: False
porechop_abi:
Input reads: False
Start Trimmed:
Start Trimmed Percent: True
End Trimmed: False
End Trimmed Percent: True
Middle Split: False
Middle Split Percent: True
Filtlong:
Target bases: True
Adapter Removal:
aligned_total: True
percent_aligned: True
percent_collapsed: True
percent_discarded: False
nanoq:
ReadN50: True
BBDuk:
Input reads: False
Total Removed bases Percent: False
@@ -276,6 +363,9 @@ extra_fn_clean_exts:
- ".bbduk"
- ".unmapped"
- "_filtered"
- "porechop"
- "porechop_abi"
- "_processed"
- type: remove
pattern: "_falco"

6 changes: 6 additions & 0 deletions assets/schema_database.json
@@ -39,6 +39,12 @@
"errorMessage": "Invalid database db_params entry. No quotes allowed.",
"meta": ["db_params"]
},
"db_type": {
"type": "string",
"enum": ["short", "long", "short;long"],
"default": "short;long",
"meta": ["db_type"]
},
"db_path": {
"type": "string",
"exists": true,
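
The new `db_type` property accepts exactly three strings and defaults to `short;long`. The following Python sketch mirrors that rule outside the schema validator. It assumes an empty or missing cell should fall back to the schema default; the helper name `parse_db_type` is hypothetical, and the pipeline itself relies on its JSON schema validation rather than on code like this.

```python
# Values copied from the "enum" and "default" entries in the schema hunk.
ALLOWED_DB_TYPES = ("short", "long", "short;long")
DEFAULT_DB_TYPE = "short;long"


def parse_db_type(value=None):
    """Return the set of read types a database row applies to (hypothetical helper)."""
    # Assumption: an empty or missing cell falls back to the schema default.
    if value is None or value == "":
        value = DEFAULT_DB_TYPE
    # Only the exact enum strings are accepted, so "long;short" is rejected.
    if value not in ALLOWED_DB_TYPES:
        raise ValueError(
            f"Invalid db_type {value!r}; expected one of {ALLOWED_DB_TYPES}"
        )
    return set(value.split(";"))


print(sorted(parse_db_type("short")))
print(sorted(parse_db_type("")))
```

Returning a set makes it straightforward to test whether a database applies to a given read type, for example `"long" in parse_db_type(row_value)`.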
3 changes: 3 additions & 0 deletions assets/schema_input.json
@@ -38,18 +38,21 @@
"type": "string",
"format": "file-path",
"pattern": "^\\S+\\.f(ast)?q\\.gz$",
"unique": true,
"errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
},
"fastq_2": {
"type": "string",
"format": "file-path",
"pattern": "^\\S+\\.f(ast)?q\\.gz$",
"unique": true,
"errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'. If not applicable, leave it empty."
},
"fasta": {
"type": "string",
"format": "file-path",
"pattern": "^\\S+\\.(f(ast)?q|fa(sta)?)\\.gz$",
"unique": true,
"errorMessage": "FastA file must be provided, cannot contain spaces and must have extension '.fa.gz' or '.fasta.gz'. If not applicable, leave it empty."
}
},
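
The `pattern` entries above can be checked quickly with Python's `re` module. This sketch uses the two patterns from the hunk with the JSON escaping removed (`\\S` becomes `\S`); the file names are made-up examples. The `duplicates` helper is a hypothetical illustration of what a `"unique": true` constraint rules out, not the validator's own implementation.

```python
import re
from collections import Counter

# Patterns from schema_input.json, with the JSON string escaping removed.
FASTQ_PATTERN = re.compile(r"^\S+\.f(ast)?q\.gz$")
FASTA_PATTERN = re.compile(r"^\S+\.(f(ast)?q|fa(sta)?)\.gz$")


def duplicates(values):
    """Return the non-empty values that occur more than once (hypothetical helper)."""
    counts = Counter(v for v in values if v)
    return sorted(v for v, n in counts.items() if n > 1)


# Made-up file names used only to exercise the patterns.
candidates = ["sample.fa.gz", "sample.fasta.gz", "sample.fastq.gz", "my sample.fa.gz", "sample.fa"]
for name in candidates:
    print(f"{name!r}: fasta={bool(FASTA_PATTERN.match(name))} fastq={bool(FASTQ_PATTERN.match(name))}")

print(duplicates(["a.fq.gz", "b.fq.gz", "a.fq.gz", ""]))
```

The `fasta` pattern also accepts `.fq.gz` and `.fastq.gz` names, while the `fastq_1` and `fastq_2` patterns accept only FASTQ extensions.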