Commit

Allow samples to be merged (#159)
* allow samples to be merged

* remove alignment workflow

* update usage

* Update CHANGELOG.md

* Update CHANGELOG.md

* prettier

* remove unused module

* Update test since split_fastq changed

* avoid bam converted files name collision

* Update conf/modules/general.config

Co-authored-by: Anders Jemt <[email protected]>

* Apply review suggestions

* use 4-vCPU runners

---------

Co-authored-by: Anders Jemt <[email protected]>
fellen31 and jemten authored May 29, 2024
1 parent 5f05740 commit afe5ce8
Showing 48 changed files with 1,661 additions and 434 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -25,7 +25,7 @@ jobs:
matrix:
parameters:
- ""
- "--input https://raw.githubusercontent.com/genomic-medicine-sweden/test-datasets/nallo/testdata/samplesheet_multisample_bam.csv --split_fastq 250 --parallel_snv 1 --phaser hiphase_sv"
- "--input https://raw.githubusercontent.com/genomic-medicine-sweden/test-datasets/nallo/testdata/samplesheet_multisample_bam.csv --split_fastq 2 --parallel_snv 1 --phaser hiphase_sv"
NXF_VER:
- "23.04.0"
- "latest-everything"
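The CI change from `--split_fastq 250` to `--split_fastq 2` follows from the new meaning of the parameter. The removed `FASTP` config passed `--split_by_lines ${params.split_fastq * 4}` (N reads per chunk, four FASTQ lines per read), while the new one passes fastp's `--split`, which fixes the number of output files (2 to 999). The Python sketch below is only a conceptual model of that difference on an in-memory list of reads; the function names are hypothetical and the even distribution of reads is an assumption, not fastp's actual chunking.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_by_reads(reads: Sequence[T], reads_per_chunk: int) -> List[List[T]]:
    """Old meaning: a fixed number of reads per chunk; the chunk count grows with input size."""
    return [list(reads[i:i + reads_per_chunk]) for i in range(0, len(reads), reads_per_chunk)]


def split_into_files(reads: Sequence[T], n_files: int) -> List[List[T]]:
    """New meaning: a fixed number of chunks (2-999); the chunk size grows with input size."""
    if not 2 <= n_files <= 999:
        raise ValueError("split_fastq must be in the range 2-999")
    base, extra = divmod(len(reads), n_files)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n_files):
        # Spread the remainder over the first `extra` chunks
        size = base + (1 if i < extra else 0)
        chunks.append(list(reads[start:start + size]))
        start += size
    return chunks


reads = [f"read{i}" for i in range(10)]
print([len(c) for c in split_by_reads(reads, 4)])    # [4, 4, 2]
print([len(c) for c in split_into_files(reads, 3)])  # [4, 3, 3]
```

With the old semantics a value of 2 would have meant two reads per job; with the new semantics it means two jobs, which is why the test value shrank.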
13 changes: 10 additions & 3 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- [#148](https://github.com/genomic-medicine-sweden/nallo/pull/148) - Automatically infer sex if unknown
- [#148](https://github.com/genomic-medicine-sweden/nallo/pull/148) - Added read group tag to aligned BAM
- [#159](https://github.com/genomic-medicine-sweden/nallo/pull/159) - Allow files from the same sample to be merged

### `Changed`

@@ -17,16 +18,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#151](https://github.com/genomic-medicine-sweden/nallo/pull/151) - Cleaned up TRGT output directory
- [#152](https://github.com/genomic-medicine-sweden/nallo/pull/152) - Use prefix in modkit module. Bgzip, index and split outputs into phased/unphased directories
- [#153](https://github.com/genomic-medicine-sweden/nallo/pull/153) - Changed cramino module to use prefix, renamed and moved all cramino outputs into `qc_aligned_reads/cramino/`
- [#159](https://github.com/genomic-medicine-sweden/nallo/pull/159) - Clarify the trio-binning genome assembly workflow
- [#159](https://github.com/genomic-medicine-sweden/nallo/pull/159) - `split_fastq` now splits on files instead of lines
- [#159](https://github.com/genomic-medicine-sweden/nallo/pull/159) - Use groupKey to remove bottleneck, where previously all samples had to wait before progressing after alignment

### `Fixed`

- [#156](https://github.com/genomic-medicine-sweden/nallo/pull/156) - Fixed program versions missing in output and MultiQC report

### Parameters

| Old parameter | New parameter |
| ------------- | ------------------ |
| | `--somalier_sites` |
| Old parameter | New parameter |
| --------------- | ------------------ |
| | `--somalier_sites` |
| `--split_fastq` | `--split_fastq` \* |

`split_fastq` now splits the input files into _n_ files (range 2-999)

> [!NOTE]
> Parameter has been updated if both old and new parameter information is present.
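The `groupKey` changelog entry describes how per-file alignments are regrouped by sample. The workflow code is not shown on this page, so the Python sketch below is a conceptual model under assumptions: each aligned chunk carries its sample ID and the sample's expected file count (the `n_files` value that `general.config` reads as `meta.n_files`), and a sample's chunks are released as soon as all of them have arrived. That is the behaviour Nextflow's `groupTuple` gets when its key is built with `groupKey(key, size)`. The old bottleneck is modelled as a grouping that emits only after every input has been consumed. Function names and sample IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# One aligned chunk: (sample_id, n_files for that sample, chunk file name)
Chunk = Tuple[str, int, str]


def group_when_complete(chunks: Iterable[Chunk]) -> Iterator[Tuple[str, List[str]]]:
    """Release a sample as soon as all of its n_files chunks have arrived."""
    pending: Dict[str, List[str]] = defaultdict(list)
    for sample_id, n_files, name in chunks:
        pending[sample_id].append(name)
        if len(pending[sample_id]) == n_files:
            # Group is complete: no need to wait for other samples
            yield sample_id, pending.pop(sample_id)


def group_after_all(chunks: Iterable[Chunk]) -> Iterator[Tuple[str, List[str]]]:
    """Old behaviour: nothing is released until every chunk has been seen."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample_id, _n_files, name in chunks:
        groups[sample_id].append(name)
    yield from groups.items()


chunks = [("sample_A", 2, "a_1.bam"), ("sample_B", 1, "b_1.bam"), ("sample_A", 2, "a_2.bam")]
print(list(group_when_complete(chunks)))
# [('sample_B', ['b_1.bam']), ('sample_A', ['a_1.bam', 'a_2.bam'])]
```

In this model `sample_B` is released after only two chunks have been read, while `group_after_all` has to consume all three before releasing anything.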
72 changes: 0 additions & 72 deletions conf/modules/align_reads.config
@@ -18,76 +18,4 @@ process {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

withName: '.*:ALIGN_READS:.*' {
publishDir = [
enabled: false,
]
}

withName: '.*:ALIGN_READS:FASTP' {
ext.args = "--disable_adapter_trimming --disable_quality_filtering --split_by_lines ${params.split_fastq * 4}"
}

withName: '.*:ALIGN_READS:MINIMAP2_ALIGN_UNSPLIT' {
if(params.preset == 'revio' | params.preset == 'pacbio') {
ext.args = { [
"-y",
"-x map-hifi",
"--secondary=no",
"-Y",
"-R @RG\\\\tID:${meta.id}\\\\tSM:${meta.id}"
].join(' ') }
} else if(params.preset == 'ONT_R10') {
ext.args = { [
"-y",
"-x map-ont",
"--secondary=no",
"-Y",
"-R @RG\\\\tID:${meta.id}\\\\tSM:${meta.id}"
].join(' ') }
}

publishDir = [
path: { "${params.outdir}/aligned_reads/minimap2/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: '.*:ALIGN_READS:SAMTOOLS_CAT_SORT_INDEX' {
// Will go to same directory regardless if split or not
publishDir = [
path: { "${params.outdir}/aligned_reads/minimap2/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: '.*:ALIGN_READS:MINIMAP2_ALIGN_SPLIT' {
if(params.preset == 'revio' | params.preset == 'pacbio') {
ext.args = { [
"-y",
"-x map-hifi",
"--secondary=no",
"-Y",
"-R @RG\\\\tID:${meta.id}\\\\tSM:${meta.id}"
].join(' ') }
} else if(params.preset == 'ONT_R10') {
ext.args = { [
"-y",
"-x map-ont",
"--secondary=no",
"-Y",
"-R @RG\\\\tID:${meta.id}\\\\tSM:${meta.id}"
].join(' ') }
}
}

withName: '.*:ALIGN_READS:SAMTOOLS_INDEX_MINIMAP2_ALIGN' {
publishDir = [
path: { "${params.outdir}/aligned_reads/minimap2/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
}
1 change: 1 addition & 0 deletions conf/modules/bam_to_fastq.config
@@ -31,5 +31,6 @@ process {
ext.args = '-x SA' // samtools reset
ext.args2 = '-T \\*' // samtools fastq

ext.prefix = { "${input}" }
}
}
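The `bam_to_fastq.config` change (commit note: "avoid bam converted files name collision") names each converted FASTQ after its input BAM via `ext.prefix = { "${input}" }`. Without an `ext.prefix`, nf-core-style modules typically fall back to `meta.id` (as the local `FQCRS` module does), and once a sample may have several input files that sample-based name would be reused. A minimal Python sketch of the collision, assuming a hypothetical `.fastq.gz` suffix and made-up file names:

```python
from collections import Counter
from typing import List, Tuple


def fastq_names(inputs: List[Tuple[str, str]], prefix_from_input: bool) -> List[str]:
    """Name each converted FASTQ from (sample_id, bam_name) pairs."""
    # Hypothetical suffix; the real module's output naming may differ
    return [f"{bam if prefix_from_input else sample_id}.fastq.gz" for sample_id, bam in inputs]


def collisions(names: List[str]) -> List[str]:
    """Names that would be written more than once."""
    return [name for name, count in Counter(names).items() if count > 1]


inputs = [("sample_A", "run1.bam"), ("sample_A", "run2.bam")]
print(collisions(fastq_names(inputs, prefix_from_input=False)))  # ['sample_A.fastq.gz']
print(collisions(fastq_names(inputs, prefix_from_input=True)))   # []
```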
58 changes: 58 additions & 0 deletions conf/modules/general.config
@@ -26,6 +26,9 @@ process {

withName: '.*:NALLO:FASTQC' {
ext.args = '--quiet'

ext.pref = { "${reads}" }

publishDir = [
path: { "${params.outdir}/qc_raw_reads/fastqc/${meta.id}" },
mode: params.publish_dir_mode,
@@ -34,6 +37,9 @@ process {
}

withName: '.*:NALLO:FQCRS' {

ext.pref = { "${reads}" }

publishDir = [
path: { "${params.outdir}/qc_raw_reads/fqcrs/${meta.id}" },
mode: params.publish_dir_mode,
@@ -53,6 +59,58 @@ process {
]
}

withName: '.*:NALLO:FASTP' {

ext.prefix = { "${reads.simpleName}" }

ext.args = { [
'--disable_adapter_trimming',
'--disable_quality_filtering',
"--split ${params.split_fastq}"
].join(' ').trim() }

publishDir = [
enabled: false
]
}

withName: '.*:NALLO:MINIMAP2_ALIGN' {

ext.prefix = { "${reads}" }
ext.args2 = '--write-index'

ext.args = { [
"-y",
params.preset.equals('ONT_R10') ? "-x map-ont" : "-x map-hifi",
"--secondary=no",
"-Y",
"-R @RG\\\\tID:${meta.id}\\\\tSM:${meta.id}"
].join(' ') }

publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/aligned_reads/minimap2/" },
// only a single BAM file per sample
saveAs: {
if (meta.n_files == 1) {
"${meta.id}/${it}"
} else { null }
}
]
}

withName: '.*:NALLO:SAMTOOLS_MERGE' {

ext.args = '--write-index'

publishDir = [
path: { "${params.outdir}/aligned_reads/minimap2/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]

}

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Summary
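The new `general.config` collapses the two removed minimap2 blocks into one `ext.args` closure and routes published BAMs by file count: `MINIMAP2_ALIGN` output is saved only when `meta.n_files == 1`, otherwise the merged BAM from `SAMTOOLS_MERGE` is published. The Python sketch below restates those two closures as plain functions for readability; it is not Nextflow code, paths are relative to `--outdir`, and the function names are hypothetical. Note that the ternary maps every preset other than `ONT_R10` to `map-hifi`, whereas the removed config set no arguments for presets other than `revio`, `pacbio` and `ONT_R10`.

```python
from typing import Optional


def minimap2_args(preset: str, sample_id: str) -> str:
    """Mirror of the MINIMAP2_ALIGN ext.args closure."""
    return " ".join([
        "-y",
        # Only ONT_R10 switches to the ONT preset; everything else uses map-hifi
        "-x map-ont" if preset == "ONT_R10" else "-x map-hifi",
        "--secondary=no",
        "-Y",
        # Read-group line with literal backslash-t separators;
        # the extra escaping in the config is for Groovy and the shell
        f"-R @RG\\tID:{sample_id}\\tSM:{sample_id}",
    ])


def publish_path(process: str, sample_id: str, n_files: int, filename: str) -> Optional[str]:
    """Return the path (relative to --outdir) a file is published to, or None."""
    target = f"aligned_reads/minimap2/{sample_id}/{filename}"
    if process == "MINIMAP2_ALIGN":
        # Only single-file samples keep the unmerged BAM
        # (this saveAs closure does not filter versions.yml)
        return target if n_files == 1 else None
    if process == "SAMTOOLS_MERGE":
        # versions.yml is never published by this process
        return None if filename == "versions.yml" else target
    return None


print(minimap2_args("revio", "sample_A"))
print(publish_path("MINIMAP2_ALIGN", "sample_A", 2, "sample_A.bam"))  # None
```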
18 changes: 18 additions & 0 deletions conf/modules/genome_assembly.config
@@ -24,6 +24,24 @@ process {
]
}

withName: '.*:ASSEMBLY:YAK_PATERNAL' {

ext.prefix = { "${meta.paternal_id}_yak" }

publishDir = [
enabled: false,
]
}

withName: '.*:ASSEMBLY:YAK_MATERNAL' {

ext.prefix = { "${meta.maternal_id}_yak" }

publishDir = [
enabled: false,
]
}

withName: '.*:ASSEMBLY:GFASTATS.*' {
ext.args = '--discover-paths'
publishDir = [
2 changes: 1 addition & 1 deletion conf/test.config
@@ -15,7 +15,7 @@ params {
config_profile_description = 'Minimal test dataset to check pipeline function'

// Limit resources so that this can run on GitHub Actions
max_cpus = 2
max_cpus = 4
max_memory = '6.GB'
max_time = '6.h'

4 changes: 2 additions & 2 deletions docs/usage.md
@@ -150,7 +150,7 @@ HG01125,/path/to/HG01125.g.vcf.gz

- By default SNV-calling is split into 13 parallel processes, limit this by setting `--parallel_snv` to a different number.

- By default the pipeline does not perform parallel alignment, but this can be set by setting `--split_fastq` to split alignment into N reads per process.
- By default the pipeline does not perform parallel alignment, but this can be set by setting `--split_fastq` to split the input and alignment into N files/processes.

All parameters are listed below:

@@ -245,7 +245,7 @@ Less common options for the pipeline, typically set in a config file.
| `variant_caller` | Choose variant caller | `string` | deepvariant | | |
| `phaser` | Choose phasing software | `string` | whatshap | | |
| `hifiasm_mode` | Run hifiasm in hifi-only or hifi-trio mode | `string` | hifi-only | | |
| `split_fastq` | Split Alignment into n reads per job | `integer` | 0 | | |
| `split_fastq` | Split alignment into n jobs | `integer` | 0 | | |
| `parallel_snv` | Split SNV calling into n chunks | `integer` | 13 | | |

## Extra file inputs
19 changes: 15 additions & 4 deletions modules.json
@@ -42,6 +42,11 @@
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
},
"cat/fastq": {
"branch": "master",
"git_sha": "4fc983ad0b30e6e32696fa7d980c76c7bfe1c03e",
"installed_by": ["modules"]
},
"deepvariant": {
"branch": "master",
"git_sha": "199ba086a259e1933d6e0ab7596e4a977bbd483a",
@@ -81,17 +86,18 @@
},
"hifiasm": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
"git_sha": "aecb06fcdb995ff3e3df7c7a1fd119367d6d1996",
"installed_by": ["modules"],
"patch": "modules/nf-core/hifiasm/hifiasm.diff"
},
"minimap2/align": {
"branch": "master",
"git_sha": "1a5a9e7b4009dcf34e6867dd1a5a1d9a718b027b",
"git_sha": "72e277acfd9e61a9f1368eafb4a9e83f5bcaa9f5",
"installed_by": ["modules"]
},
"minimap2/index": {
"branch": "master",
"git_sha": "73e062cf5e0ef346f65d73009b1b656299359fc5",
"git_sha": "72e277acfd9e61a9f1368eafb4a9e83f5bcaa9f5",
"installed_by": ["modules"]
},
"mosdepth": {
@@ -119,6 +125,11 @@
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
},
"samtools/merge": {
"branch": "master",
"git_sha": "f4596fe0bdc096cf53ec4497e83defdb3a94ff62",
"installed_by": ["modules"]
},
"samtools/sort": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
6 changes: 3 additions & 3 deletions modules/local/fqcrs.nf
@@ -12,10 +12,10 @@ process FQCRS {
}

input:
tuple val(meta), path(fastq)
tuple val(meta), path(reads)

output:
tuple val(meta), path("${fastq}.tsv.zst"), emit: fqc
tuple val(meta), path("${prefix}.tsv.zst"), emit: fqc
path "versions.yml" , emit: versions

when:
@@ -26,7 +26,7 @@ process FQCRS {
prefix = task.ext.prefix ?: "${meta.id}"

"""
zcat ${fastq} | fqcrs | zstd -c > ${fastq}.tsv.zst
zcat ${reads} | fqcrs | zstd -c > ${prefix}.tsv.zst
cat <<-END_VERSIONS > versions.yml
"${task.process}":
2 changes: 1 addition & 1 deletion modules/local/yak.nf
@@ -27,7 +27,7 @@ process YAK {
-b37 \\
$args \\
-t $task.cpus \\
-o ${fasta.baseName}.yak \\
-o ${prefix}.yak \\
${fasta}
cat <<-END_VERSIONS > versions.yml
7 changes: 7 additions & 0 deletions modules/nf-core/cat/fastq/environment.yml
