Annotate repeats per family
fellen31 committed Nov 5, 2024
1 parent c47f69a commit 1607f42
Showing 14 changed files with 146 additions and 124 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -82,6 +82,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#479](https://github.com/genomic-medicine-sweden/nallo/pull/479) - Replaced bgzip tabix with bcftools sort in rank variants to fix [#457](https://github.com/genomic-medicine-sweden/nallo/issues/457)
- [#480](https://github.com/genomic-medicine-sweden/nallo/pull/480) - Updated ranking of SVs to work with multiple families per project
- [#484](https://github.com/genomic-medicine-sweden/nallo/pull/484) - Updated metro map and added SVG version
+- [#485](https://github.com/genomic-medicine-sweden/nallo/pull/485) - Updated repeat expansion annotation to annotate per family instead of per sample
- [#487](https://github.com/genomic-medicine-sweden/nallo/pull/487) - Changed CI tests to only run tests where changes have been made

### `Removed`
4 changes: 2 additions & 2 deletions conf/modules/annotate_repeat_expansions.config
@@ -23,13 +23,13 @@ process {
}

withName: '.*:ANNOTATE_REPEAT_EXPANSIONS:COMPRESS_STRANGER' {
-ext.prefix = { "${meta.id}_repeat_expansion_stranger" }
+ext.prefix = { "${meta.id}_repeats_annotated" }
ext.args = [
'--output-type z',
'--write-index=tbi'
].join(' ')
publishDir = [
-path: { "${params.outdir}/repeat_annotation/stranger/${meta.id}" },
+path: { "${params.outdir}/repeats/family/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
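As a rough illustration of where the annotated family VCF ends up after this change, the sketch below combines the new `ext.prefix` and `publishDir.path` values. It assumes that `--output-type z` produces `<prefix>.vcf.gz` and that `--write-index=tbi` adds `<prefix>.vcf.gz.tbi`; the function name and the `results`/`FAM1` values are hypothetical.

```python
from pathlib import PurePosixPath


def stranger_outputs(outdir: str, family_id: str) -> list:
    """Hypothetical model of the files COMPRESS_STRANGER publishes for one family.

    Mirrors ext.prefix = "${meta.id}_repeats_annotated" and
    publishDir path "${params.outdir}/repeats/family/${meta.id}".
    Assumes '--output-type z' names the file '<prefix>.vcf.gz' and
    '--write-index=tbi' adds '<prefix>.vcf.gz.tbi'.
    """
    prefix = f"{family_id}_repeats_annotated"
    publish_dir = PurePosixPath(outdir) / "repeats" / "family" / family_id
    vcf = publish_dir / f"{prefix}.vcf.gz"
    return [str(vcf), f"{vcf}.tbi"]


print(stranger_outputs("results", "FAM1"))
```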
11 changes: 6 additions & 5 deletions conf/modules/call_repeat_expansions.config
@@ -31,15 +31,15 @@ process {
withName: '.*:CALL_REPEAT_EXPANSIONS:SAMTOOLS_SORT_TRGT' {
ext.prefix = { "${meta.id}_spanning_sorted" }
publishDir = [
-path: { "${params.outdir}/repeat_calling/trgt/single_sample/${meta.id}" },
+path: { "${params.outdir}/repeats/sample/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: '.*:CALL_REPEAT_EXPANSIONS:SAMTOOLS_INDEX_TRGT' {
publishDir = [
-path: { "${params.outdir}/repeat_calling/trgt/single_sample/${meta.id}" },
+path: { "${params.outdir}/repeats/sample/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
@@ -52,22 +52,23 @@ process {
'--write-index=tbi'
].join(' ')
publishDir = [
-path: { "${params.outdir}/repeat_calling/trgt/single_sample/${meta.id}" },
+path: { "${params.outdir}/repeats/sample/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: '.*:CALL_REPEAT_EXPANSIONS:BCFTOOLS_MERGE' {
+ext.prefix = { "${meta.id}_repeats" }
ext.args = [
'--output-type z',
'--write-index=tbi',
'--force-single'
].join(' ')
publishDir = [
-path: { "${params.outdir}/repeat_calling/trgt/multi_sample/${meta.id}" },
+path: { "${params.outdir}/repeats/family/${meta.id}" },
mode: params.publish_dir_mode,
-saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+saveAs: { filename -> filename.equals('versions.yml') || !params.skip_repeat_annotation ? null : filename }
]
}
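The new `saveAs` closure for BCFTOOLS_MERGE publishes the merged, unannotated family VCF only when repeat annotation is skipped. In Nextflow, a `saveAs` closure that returns `null` stops that file from being published. The Python model below reproduces this logic, with `None` standing in for `null`. The function name and the `FAM1_*` file names are hypothetical.

```python
from typing import Optional


def merge_save_as(filename: str, skip_repeat_annotation: bool) -> Optional[str]:
    """Python model of the BCFTOOLS_MERGE saveAs closure.

    Groovy: filename.equals('versions.yml') || !params.skip_repeat_annotation ? null : filename
    The '||' is evaluated before the ternary, so either condition suppresses publishing.
    """
    if filename == "versions.yml" or not skip_repeat_annotation:
        return None  # not published
    return filename  # published under its own name


files = ["FAM1_repeats.vcf.gz", "FAM1_repeats.vcf.gz.tbi", "versions.yml"]
for skip in (False, True):
    published = [f for f in files if merge_save_as(f, skip) is not None]
    print(f"skip_repeat_annotation={skip}: {published}")
```

When annotation runs, the family-level output comes from COMPRESS_STRANGER instead, so publishing both would duplicate the family VCF.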

38 changes: 21 additions & 17 deletions docs/output.md
@@ -157,23 +157,27 @@ If the pipeline is run with phasing, the aligned reads will be happlotagged usin

### Repeats

-[TRGT](https://github.com/PacificBiosciences/trgt) is used to call repeats:
-
-| Path | Description |
-| --------------------------------------------------------- | ----------------------------------------- |
-| `repeat_calling/trgt/multi_sample/{project}/*.vcf.gz` | Merged VCF file for all samples |
-| `repeat_calling/trgt/multi_sample/{project}/*.vcf.gz.tbi` | Index of the VCF file |
-| `repeat_calling/trgt/single_sample/{sample}/*.vcf.gz` | VCF file with called repeats for a sample |
-| `repeat_calling/trgt/single_sample/{sample}/*.vcf.gz.tbi` | Index of the VCF file |
-| `repeat_calling/trgt/single_sample/{sample}/*.bam` | BAM file with sorted spanning reads |
-| `repeat_calling/trgt/single_sample/{sample}/*.bai` | Index of the BAM file |
-
-[Stranger](https://github.com/Clinical-Genomics/stranger) is used to annotate them:
-
-| Path | Description |
-| -------------------------------------------------- | ------------------------------- |
-| `repeat_annotation/stranger/{sample}/*.vcf.gz` | Annotated VCF file |
-| `repeat_annotation/stranger/{sample}/*.vcf.gz.tbi` | Index of the annotated VCF file |
+[TRGT](https://github.com/PacificBiosciences/trgt) is used to call repeats.
+
+!!!note
+
+Merged variants per family are only output without annotation if `--skip_repeat_annotation` is true. Variants per sample are always output without annotation.
+
+| Path | Description |
+| ------------------------------------------------------- | ----------------------------------------- |
+| `repeats/{family}/{family}_repeat_expansions.vcf.gz` | Merged VCF file per family |
+| `repeats/{family}/{family}_repeat_expansions.vcf.gz.tbi` | Index of the VCF file |
+| `repeats/sample/{sample}/*.vcf.gz` | VCF file with called repeats for a sample |
+| `repeats/sample/{sample}/*.vcf.gz.tbi` | Index of the VCF file |
+| `repeats/sample/{sample}/*.bam` | BAM file with sorted spanning reads |
+| `repeats/sample/{sample}/*.bai` | Index of the BAM file |
+
+[Stranger](https://github.com/Clinical-Genomics/stranger) is used to annotate repeats.
+
+| Path | Description |
+| --------------------------------------------------------------------------- | ------------------------------------- |
+| `repeat_expansions/{family}/{family}_repeat_expansions_annotated.vcf.gz` | Merged, annotated VCF file per family |
+| `repeat_expansions/{family}/{family}_repeat_expansions_annotated.vcf.gz.tbi` | Index of the VCF file |

### SNVs

3 changes: 2 additions & 1 deletion modules.json
@@ -246,7 +246,8 @@
"stranger": {
"branch": "master",
"git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48",
-"installed_by": ["modules"]
+"installed_by": ["modules"],
+"patch": "modules/nf-core/stranger/stranger.diff"
},
"svdb/merge": {
"branch": "master",
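The new `"patch"` key records that the locally modified `stranger` module has a diff stored alongside it. The sketch below lists every module in `modules.json` that carries such a key. It assumes the nf-core layout `repos -> <repo URL> -> "modules" -> <org> -> <module>`, and the repository URL used as a key in the example is an assumption. The example data is trimmed to the two entries visible in this diff, and the `git_sha` of `svdb/merge` is omitted because the diff does not show it. The function name is hypothetical.

```python
import json


def patched_modules(modules_json: dict) -> dict:
    """Return {module_name: patch_path} for every module entry with a 'patch' key.

    Assumed layout: repos -> <repo url> -> "modules" -> <org> -> <module> -> {...}
    """
    patches = {}
    for repo in modules_json.get("repos", {}).values():
        for org_modules in repo.get("modules", {}).values():
            for name, entry in org_modules.items():
                if "patch" in entry:
                    patches[name] = entry["patch"]
    return patches


example = json.loads("""
{
  "repos": {
    "https://github.com/nf-core/modules.git": {
      "modules": {
        "nf-core": {
          "stranger": {
            "branch": "master",
            "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48",
            "installed_by": ["modules"],
            "patch": "modules/nf-core/stranger/stranger.diff"
          },
          "svdb/merge": {
            "branch": "master",
            "installed_by": ["modules"]
          }
        }
      }
    }
  }
}
""")
print(patched_modules(example))
```

On a real checkout, load the file with `json.load(open("modules.json"))` instead of the inline string.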
2 changes: 1 addition & 1 deletion modules/nf-core/stranger/environment.yml

4 changes: 2 additions & 2 deletions modules/nf-core/stranger/main.nf

35 changes: 35 additions & 0 deletions modules/nf-core/stranger/stranger.diff

4 changes: 2 additions & 2 deletions subworkflows/local/call_repeat_expansions/main.nf
@@ -52,7 +52,7 @@ workflow CALL_REPEAT_EXPANSIONS {

BCFTOOLS_SORT_TRGT.out.vcf
.join( BCFTOOLS_SORT_TRGT.out.tbi )
-.map { meta, bcf, csi -> [ [ id : meta.project ], bcf, csi ] }
+.map { meta, bcf, csi -> [ [ id : meta.family_id ], bcf, csi ] }
.groupTuple()
.set{ ch_bcftools_merge_in }

@@ -66,7 +66,7 @@ workflow CALL_REPEAT_EXPANSIONS {

emit:
sample_vcf = BCFTOOLS_SORT_TRGT.out.vcf // channel: [ val(meta), path(vcf) ]
-project_vcf = BCFTOOLS_MERGE.out.vcf // channel: [ val(meta), path(vcf) ]
+family_vcf = BCFTOOLS_MERGE.out.vcf // channel: [ val(meta), path(vcf) ]
sample_bam = SAMTOOLS_SORT_TRGT.out.bam // channel: [ val(meta), path(bam) ]
sample_bai = SAMTOOLS_INDEX_TRGT.out.bai // channel: [ val(meta), path(bai) ]
versions = ch_versions // channel: [ versions.yml ]
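The key change in this subworkflow is the grouping key that feeds BCFTOOLS_MERGE. The `.map` replaces each sample's meta map with `[ id : meta.family_id ]`, and `.groupTuple()` then collects every tuple that shares that first element. The Python model below reproduces this per-family grouping. The function name and sample data are hypothetical. In Nextflow the order of files within a group follows arrival order and is not guaranteed to be deterministic, whereas this model simply keeps input order.

```python
from collections import defaultdict


def group_by_family(samples):
    """Model of .map { meta, vcf, idx -> [ [ id : meta.family_id ], vcf, idx ] }.groupTuple().

    Each input item is (meta, vcf, index). Items that share meta["family_id"]
    become one ({"id": family_id}, [vcfs], [indices]) tuple.
    """
    grouped = defaultdict(lambda: ([], []))
    for meta, vcf, index in samples:
        vcfs, indices = grouped[meta["family_id"]]
        vcfs.append(vcf)
        indices.append(index)
    return [({"id": fam}, vcfs, idx) for fam, (vcfs, idx) in grouped.items()]


samples = [
    ({"id": "child", "family_id": "FAM1"}, "child.vcf.gz", "child.vcf.gz.tbi"),
    ({"id": "mother", "family_id": "FAM1"}, "mother.vcf.gz", "mother.vcf.gz.tbi"),
    ({"id": "other", "family_id": "FAM2"}, "other.vcf.gz", "other.vcf.gz.tbi"),
]
for group in group_by_family(samples):
    print(group)
```

Grouping by `meta.project` instead would have merged every family in a run into one VCF. Keying on `family_id` gives one merged VCF per family.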
6 changes: 3 additions & 3 deletions subworkflows/local/call_repeat_expansions/tests/main.nf.test
@@ -37,7 +37,7 @@ nextflow_workflow {
workflow {
"""
input[0] = Channel.of([
-[ id:'test', single_end:false, project: 'project', sex: 1 ], // meta map
+[ id:'test', single_end:false, family_id: 'family', sex: 1 ], // meta map
file(params.pipelines_testdata_base_path + 'testdata/HG002_PacBio_Revio.bam', checkIfExists: true),
file(params.pipelines_testdata_base_path + 'testdata/HG002_PacBio_Revio.bam.bai', checkIfExists: true)
])
@@ -57,7 +57,7 @@ nextflow_workflow {
{ assert workflow.out.sample_bai.get(0).get(1).endsWith(".bai") },
{ assert snapshot(
path(workflow.out.sample_vcf.get(0).get(1)).vcf.variantsMD5,
-path(workflow.out.project_vcf.get(0).get(1)).vcf.variantsMD5,
+path(workflow.out.family_vcf.get(0).get(1)).vcf.variantsMD5,
bam(workflow.out.sample_bam.get(0).get(1), stringency: 'silent').getReadsMD5(),
workflow.out.versions,
).match() }
@@ -74,7 +74,7 @@ nextflow_workflow {
workflow {
"""
input[0] = Channel.of([
-[ id:'test', single_end:false, project: 'project', sex: 1 ], // meta map
+[ id:'test', single_end:false, family_id: 'family', sex: 1 ], // meta map
file(params.pipelines_testdata_base_path + 'testdata/HG002_PacBio_Revio.bam', checkIfExists: true),
file(params.pipelines_testdata_base_path + 'testdata/HG002_PacBio_Revio.bam.bai', checkIfExists: true)
])
24 changes: 12 additions & 12 deletions subworkflows/local/call_repeat_expansions/tests/main.nf.test.snap
@@ -26,7 +26,7 @@
{
"id": "test",
"single_end": false,
-"project": "project",
+"family_id": "family",
"sex": 1
},
"test.vcf.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
@@ -35,17 +35,17 @@
"1": [
[
{
-"id": "project"
+"id": "family"
},
-"project.vcf.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+"family.vcf.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
]
],
"2": [
[
{
"id": "test",
"single_end": false,
-"project": "project",
+"family_id": "family",
"sex": 1
},
"test.bam:md5,d41d8cd98f00b204e9800998ecf8427e"
@@ -56,7 +56,7 @@
{
"id": "test",
"single_end": false,
-"project": "project",
+"family_id": "family",
"sex": 1
},
"test.bam.bai:md5,d41d8cd98f00b204e9800998ecf8427e"
@@ -69,20 +69,20 @@
"versions.yml:md5,8a4b29c3089d4b00cfe6c5c39b88d1ab",
"versions.yml:md5,b9424dde80b33e84164cc956a14aa459"
],
-"project_vcf": [
+"family_vcf": [
[
{
-"id": "project"
+"id": "family"
},
-"project.vcf.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
+"family.vcf.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
]
],
"sample_bai": [
[
{
"id": "test",
"single_end": false,
-"project": "project",
+"family_id": "family",
"sex": 1
},
"test.bam.bai:md5,d41d8cd98f00b204e9800998ecf8427e"
@@ -93,7 +93,7 @@
{
"id": "test",
"single_end": false,
-"project": "project",
+"family_id": "family",
"sex": 1
},
"test.bam:md5,d41d8cd98f00b204e9800998ecf8427e"
@@ -104,7 +104,7 @@
{
"id": "test",
"single_end": false,
-"project": "project",
+"family_id": "family",
"sex": 1
},
"test.vcf.gz:md5,68b329da9893e34099c7d8ad5cb9c940"
@@ -123,6 +123,6 @@
"nf-test": "0.9.0",
"nextflow": "24.04.4"
},
-"timestamp": "2024-10-30T11:00:04.845039812"
+"timestamp": "2024-11-04T16:43:06.104050126"
}
}