Skip to content

Commit

Permalink
compare VCFs
Browse files Browse the repository at this point in the history
  • Loading branch information
tobiasrausch committed Mar 4, 2024
1 parent c4ba5e3 commit e252e9e
Show file tree
Hide file tree
Showing 3 changed files with 16 additions and 5 deletions.
13 changes: 12 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,8 @@ Sansa has several subcommands

`sansa markdup` to [mark duplicate SV sites](https://github.com/dellytools/sansa#mark-duplicates) in a multi-sample VCF file

`sansa compvcf` to [compare multi-sample VCF files](https://github.com/dellytools/sansa#compare-vcfs)

## SV annotation

Download an annotation database. Examples are [gnomAD-SV](https://gnomad.broadinstitute.org/) or [1000 Genomes phase 3](https://www.internationalgenome.org/phase-3-structural-variant-dataset) and then run the annotation.
Expand Down Expand Up @@ -118,10 +120,19 @@ Using [delly](https://github.com/dellytools/delly) and the `INFO/CT` values one

## Mark duplicates

For larger studies that employ single sample calling and then merge SVs across samples a common problem is to identify duplicate SV sites that occur due to SV breakpoint imprecisions. `sansa markdup` identifies duplicates sites based on genomic proximity, genotype concordance and SV allele similarity. By default, duplicate SVs need to have SV breakpoints within 50bp (`-b 50`), a reciprocal overlap of 80% (`-s 0.8`), an SV allele divergence (`-s 0.1`) and an minimum fraction of shared SV carriers of 25% (`-c 0.25`). The SV allele comparison requires [delly's](https://github.com/dellytools/delly) `INFO/CONSENSUS` field as the SV haplotype.
For larger studies that employ single sample calling and then merge SVs across samples a common problem is to identify duplicate SV sites that occur due to SV breakpoint imprecisions. `sansa markdup` identifies duplicates sites based on genomic proximity, genotype concordance and SV allele similarity. By default, duplicate SVs need to have SV breakpoints within 50bp (`-b 50`), a reciprocal overlap of 80% (`-s 0.8`), a maximum SV allele divergence of 10% (`-s 0.1`) and a minimum fraction of shared SV carriers of 25% (`-c 0.25`). The SV allele comparison requires [delly's](https://github.com/dellytools/delly) `INFO/CONSENSUS` field as the SV haplotype.

`sansa markdup -o rmdup.bcf pop.delly.bcf`

## Compare VCFs

Compare an input VCF/BCF file to a ground truth (base) VCF/BCF file.

`sansa compvcf -a base.bcf input.bcf`

To compare SV site lists that lack genotypes, you need to set the minimum allele count to zero (`-e 0`).

`sansa compvcf -a base.bcf -e 0 input.bcf`

## Citation

Expand Down
4 changes: 2 additions & 2 deletions src/compvcf.h
Original file line number Diff line number Diff line change
Expand Up @@ -547,7 +547,7 @@ namespace sansa
// Check command line arguments
if ((vm.count("help")) || (!vm.count("input-file")) || (!vm.count("base"))) {
std::cerr << std::endl;
std::cerr << "Usage: delly " << argv[0] << " [OPTIONS] -a <base.bcf> <input.bcf>" << std::endl;
std::cerr << "Usage: sansa " << argv[0] << " [OPTIONS] -a <base.bcf> <input.bcf>" << std::endl;
std::cerr << visible_options << "\n";
return 0;
}
Expand Down Expand Up @@ -638,7 +638,7 @@ namespace sansa
// Show cmd
boost::posix_time::ptime now = boost::posix_time::second_clock::local_time();
std::cerr << '[' << boost::posix_time::to_simple_string(now) << "] ";
std::cerr << "delly ";
std::cerr << "sansa ";
for(int i=0; i<argc; ++i) { std::cerr << argv[i] << ' '; }
std::cerr << std::endl;

Expand Down
4 changes: 2 additions & 2 deletions src/markdup.h
Original file line number Diff line number Diff line change
Expand Up @@ -479,7 +479,7 @@ namespace sansa
// Check command line arguments
if ((vm.count("help")) || (!vm.count("input-file"))) {
std::cerr << std::endl;
std::cerr << "Usage: delly " << argv[0] << " [OPTIONS] <input.bcf>" << std::endl;
std::cerr << "Usage: sansa " << argv[0] << " [OPTIONS] <input.bcf>" << std::endl;
std::cerr << visible_options << "\n";
return 0;
}
Expand Down Expand Up @@ -537,7 +537,7 @@ namespace sansa
// Show cmd
boost::posix_time::ptime now = boost::posix_time::second_clock::local_time();
std::cerr << '[' << boost::posix_time::to_simple_string(now) << "] ";
std::cerr << "delly ";
std::cerr << "sansa ";
for(int i=0; i<argc; ++i) { std::cerr << argv[i] << ' '; }
std::cerr << std::endl;

Expand Down

0 comments on commit e252e9e

Please sign in to comment.