added indexcov : finding large INDEL using the BAI index #1613

Open
wants to merge 22 commits into
base: dev
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- [1613](https://github.com/nf-core/sarek/pull/1613) - add indexcov
- [1640](https://github.com/nf-core/sarek/pull/1620) - Add `lofreq` as a tumor-only variant caller
- [1642](https://github.com/nf-core/sarek/pull/1642) - Back to dev
- [1653](https://github.com/nf-core/sarek/pull/1653) - Updates `sarek_subway` files with `lofreq`
2 changes: 2 additions & 0 deletions README.md
@@ -54,6 +54,7 @@ Depending on the options and samples provided, the pipeline can currently perfor
- `freebayes`
- `GATK HaplotypeCaller`
- `Manta`
- `indexcov`
- `mpileup`
- `MSIsensor-pro`
- `Mutect2`
@@ -172,6 +173,7 @@ We thank the following people for their extensive assistance in the development
- [pallolason](https://github.com/pallolason)
- [Paul Cantalupo](https://github.com/pcantalupo)
- [Phil Ewels](https://github.com/ewels)
- [Pierre Lindenbaum](https://github.com/lindenb)
- [Sabrina Krakau](https://github.com/skrakau)
- [Sam Minot](https://github.com/sminot)
- [Sebastian-D](https://github.com/Sebastian-D)
21 changes: 21 additions & 0 deletions conf/modules/indexcov.config
@@ -0,0 +1,21 @@

// INDEXCOV

process {
if (params.tools && params.tools.split(',').contains('indexcov')) {

withName: 'SAMTOOLS_REINDEX_BAM' {
ext.args = { ' -F 3844 -q 30 ' } // MAPQ >= 30; exclude unmapped, secondary, QC-failed, duplicate and supplementary reads
}

withName: 'GOLEFT_INDEXCOV' {
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/indexcov/" }
]

}

}

}
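
The `-F 3844` mask passed to `SAMTOOLS_REINDEX_BAM` can be cross-checked by decomposing it into SAM FLAG bits. The following is a minimal bash sketch: the helper name `decode_flag` and the short labels are made up for illustration, while the bit values follow the SAM specification.

```shell
#!/usr/bin/env bash
# Decode a SAM FLAG mask into the named bits it contains.
# decode_flag and the label spellings are hypothetical, for illustration only;
# the numeric bit values are those defined in the SAM specification.
decode_flag() {
    local mask=$1
    local names=()
    local bits=(1 2 4 8 16 32 64 128 256 512 1024 2048)
    local labels=(paired proper_pair unmapped mate_unmapped reverse mate_reverse
                  read1 read2 secondary qc_fail duplicate supplementary)
    local i
    for i in "${!bits[@]}"; do
        # Keep the label when the corresponding bit is set in the mask.
        if (( mask & bits[i] )); then
            names+=("${labels[i]}")
        fi
    done
    local IFS=,
    echo "${names[*]}"
}

decode_flag 3844
# prints: unmapped,secondary,qc_fail,duplicate,supplementary
```

None of the excluded bits is the proper-pair bit (`0x2`), so this mask on its own does not require reads to be properly paired; `-q 30` separately keeps only reads with MAPQ of at least 30.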
25 changes: 25 additions & 0 deletions docs/output.md
@@ -45,6 +45,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Strelka](#strelka)
- [Lofreq](#lofreq)
- [Structural Variants](#structural-variants)
- [Indexcov](#indexcov)
- [Manta](#manta)
- [TIDDIT](#tiddit)
- [Sample heterogeneity, ploidy and CNVs](#sample-heterogeneity-ploidy-and-cnvs)
@@ -592,6 +593,30 @@ For further downstream analysis, take a look [here](https://github.com/Illumina/

### Structural Variants

#### indexcov

[indexcov](https://github.com/brentp/goleft/tree/master/indexcov) quickly estimates coverage from a whole-genome BAM or CRAM index.
A BAM index has 16KB resolution, and this is what is used as the coverage estimate.
The output is scaled to around 1, so a long stretch with values of 1.5 would be a heterozygous duplication. This is useful as a quick QC to get coverage values across the genome.

**Output directory: `{outdir}/variant_calling/indexcov/`**

In addition to the interactive HTML files, `indexcov` outputs a number of text files:

- `<sample>-indexcov.ped`: a .ped/.fam file with the inferred sex in the appropriate column if the sex chromosomes were found. It also contains the following columns:
  - `CNX` and `CNY`: floating-point estimates of copy number for the sex chromosomes.
  - `bins.out`: number of bins with a coverage value outside of (0.85, 1.15). High values can indicate high-bias samples.
  - `bins.lo`: number of bins with value < 0.15. High values indicate missing data.
  - `bins.hi`: number of bins with value > 1.15.
  - `bins.in`: number of bins with value inside of (0.85, 1.15).
  - `p.out`: `bins.out/bins.in`.
  - `PC1...PC5`: PCA projections calculated with depth of autosomes.

- `<sample>-indexcov.roc`: tab-delimited columns of chrom, scaled coverage cutoff, and `$n_samples` columns, each indicating the proportion of 16KB blocks at or above that scaled coverage value.
- `<sample>-indexcov.bed.gz`: a BED file with columns of chrom, start, end, and one column per sample whose values indicate the scaled coverage for that sample in that 16KB chunk.
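
Because the scaled coverage is expected to sit around 1, the BED output can be scanned for bins that depart from it. The sketch below is one possible way to do that in bash with `awk`. The cutoffs 0.7 and 1.3 and the function name `flag_bins` are assumptions chosen for illustration, not values defined by `indexcov`, and the three input rows are invented toy data.

```shell
#!/usr/bin/env bash
# Sketch: flag 16KB bins whose scaled coverage departs from the expected ~1.
# Assumptions (not defined by indexcov): the 0.7 / 1.3 cutoffs and the name
# flag_bins. Input on stdin: tab-separated chrom, start, end, then one
# scaled-coverage column per sample. Lines starting with '#' are skipped.
# Output columns: chrom, start, end, sample column number, value, LOW/HIGH.
flag_bins() {
    awk -F '\t' -v OFS='\t' -v lo=0.7 -v hi=1.3 '
        /^#/ { next }
        {
            for (i = 4; i <= NF; i++) {
                if ($i + 0 < lo)      print $1, $2, $3, i - 3, $i, "LOW"
                else if ($i + 0 > hi) print $1, $2, $3, i - 3, $i, "HIGH"
            }
        }'
}

# Toy input, invented for illustration only (one sample, three bins).
printf 'chr1\t0\t16384\t1.02\nchr1\t16384\t32768\t1.51\nchr1\t32768\t49152\t0.48\n' | flag_bins
```

On real output the function could be fed with `gzip -dc <sample>-indexcov.bed.gz | flag_bins`. A single flagged bin means little at this resolution; it is long runs of consecutive `HIGH` or `LOW` bins that suggest a large duplication or deletion.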

#### Manta

[Manta](https://github.com/Illumina/manta) calls structural variants (SVs) and indels from mapped paired-end sequencing reads.
50 changes: 26 additions & 24 deletions docs/usage.md

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions modules.json
@@ -309,6 +309,11 @@
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"goleft/indexcov": {
"branch": "master",
"git_sha": "a941aa24517960d7b9eeab4c3a5adfb3f70a5e4b",
"installed_by": ["modules"]
},
"lofreq/callparallel": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
7 changes: 7 additions & 0 deletions modules/local/samtools/reindex_bam/environment.yml
@@ -0,0 +1,7 @@
name: samtools_view
channels:
- conda-forge
- bioconda
dependencies:
- bioconda::samtools=1.20
- bioconda::htslib=1.20
57 changes: 57 additions & 0 deletions modules/local/samtools/reindex_bam/main.nf
@@ -0,0 +1,57 @@
/**
* The aim of this process is to re-index the BAM file without duplicate, supplementary, unmapped (etc.) reads, for goleft/indexcov.
* It creates a BAM containing only a header (so indexcov can get the sample name) and a BAM index from which low-quality reads, supplementary reads, etc. have been removed.
*/
process SAMTOOLS_REINDEX_BAM {
tag "$meta.id"
label 'process_low'

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/samtools:1.20--h50ea8bc_0' :
'biocontainers/samtools:1.20--h50ea8bc_0' }"

input:
tuple val(meta), path(input), path(input_index)
tuple val(meta2), path(fasta)
tuple val(meta3), path(fai)

output:
tuple val(meta), path("${meta.id}.reindex.bam"), path("${meta.id}.reindex.bam.bai"),emit: output
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def reference = fasta ? "--reference ${fasta}" : ""
"""
# write header only
samtools \\
view \\
--header-only \\
--threads ${task.cpus} \\
-O BAM \\
-o "${meta.id}.reindex.bam" \\
${reference} \\
${input}

# write BAM index only, remove unmapped, supplementary, etc...
samtools \\
view \\
--uncompressed \\
--write-index \\
--threads ${task.cpus} \\
-O BAM \\
-o "/dev/null##idx##${meta.id}.reindex.bam.bai" \\
${reference} \\
${args} \\
${input}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
}
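
The second `samtools view` call relies on the htslib `##idx##` output-name convention: the part before the delimiter is where the alignment data goes (here `/dev/null`, so it is discarded) and the part after it is where the index is written. The bash sketch below only illustrates how such a name splits. `split_idx_spec` is a hypothetical helper, not part of samtools or htslib, which do this parsing internally, and `sample1` is a made-up sample id.

```shell
#!/usr/bin/env bash
# Split an htslib-style "DATA##idx##INDEX" output name into its two parts.
# split_idx_spec is a hypothetical helper for illustration only.
split_idx_spec() {
    local spec=$1
    local delim='##idx##'
    if [[ $spec == *"$delim"* ]]; then
        # Text before the delimiter is the data path, text after it the index path.
        printf 'data=%s\nindex=%s\n' "${spec%%"$delim"*}" "${spec#*"$delim"}"
    else
        # No delimiter: only a data path was given.
        printf 'data=%s\nindex=\n' "$spec"
    fi
}

split_idx_spec '/dev/null##idx##sample1.reindex.bam.bai'
# prints:
# data=/dev/null
# index=sample1.reindex.bam.bai
```

This is why the process needs the separate header-only BAM from the first command: the filtered reads themselves are never stored, only the index built from them.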
7 changes: 7 additions & 0 deletions modules/nf-core/goleft/indexcov/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

57 changes: 57 additions & 0 deletions modules/nf-core/goleft/indexcov/main.nf

52 changes: 52 additions & 0 deletions modules/nf-core/goleft/indexcov/meta.yml

114 changes: 114 additions & 0 deletions modules/nf-core/goleft/indexcov/tests/main.nf.test
