CompEpigen/PipelineOlympics

Pipeline Olympics: continuous benchmarking of computational workflows for DNA methylation sequencing data

Analysis of bisulfite sequencing data relies on processing that generally includes four core steps: read preprocessing, alignment, post-alignment processing and calling of methylation states. An impressive number of tools for individual steps or their combinations, workflows integrating them, and turn-key solutions have been proposed. Despite this variety, only a few attempts have been made so far to systematically evaluate complete processing workflows in a standardized and unbiased analysis. Previous benchmarks focused on a single processing task, predominantly alignment software. No previous benchmark covered the complete data processing workflow and was based on a reasonable gold-standard data set.

To bridge this gap, we set out to perform a thorough benchmarking study of bisulfite sequencing workflows. At the core of our benchmark is a set of samples with highly accurate methylation calls (Bock, Halbritter et al. 2016), which we use as the gold standard. We evaluate the software in the context of the five most widely used sequencing protocols (two variants of standard whole genome bisulfite sequencing, tagmentation-based WGBS, PBAT and EMSeq) and propose protocol-specific workflow choices. To simplify the choice of workflows and enable continuity, we developed rich data presentation and benchmarking resources (see below). To our knowledge, this is the most comprehensive benchmarking study of processing workflows for DNA methylation sequencing data to date.
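To make the idea of scoring a workflow against gold-standard methylation calls concrete, here is a minimal sketch in Python. It is an illustration only: the metric (mean absolute deviation over shared sites), the function name `mean_absolute_deviation`, and the toy site identifiers and values are all assumptions made for this example and are not taken from the study, which may use different metrics and data formats.

```python
from statistics import mean


def mean_absolute_deviation(calls, gold):
    """Mean |workflow - gold| over sites present in both mappings.

    Both arguments map a site identifier to a methylation level in [0, 1].
    Sites reported by only one side are ignored (an assumption of this sketch).
    """
    shared = calls.keys() & gold.keys()
    if not shared:
        raise ValueError("no shared sites to compare")
    return mean(abs(calls[site] - gold[site]) for site in shared)


# Hypothetical toy data: site -> methylation level.
gold = {"chr1:100": 0.90, "chr1:250": 0.10, "chr2:40": 0.50}
workflow_calls = {"chr1:100": 0.80, "chr1:250": 0.20, "chr2:40": 0.50, "chr3:7": 1.00}

mad = mean_absolute_deviation(workflow_calls, gold)
print(f"mean absolute deviation over shared sites: {mad:.3f}")
```

A lower value would indicate closer agreement with the gold standard on the sites both sources report. A real comparison would likely also need to account for coverage and for sites missing from one side.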

Benchmarked workflows

The current study includes 10 previously published workflows:

  1. BAT
  2. Biscuit
  3. Bismark
  4. BSBolt
  5. bwa-meth with MethylDackel
  6. FAME
  7. gemBS
  8. GSNAP with BisSNP
  9. methylCtools
  10. methylpy

The benchmarking implementation and resources enable seamless extension of the current list of workflows to include new software.

Pipeline Olympics resources

  • Automated benchmarking portal based on workflUX

https://epigenomics.dkfz.de/PipelineOlympics/workflux

  • Interactive web-interface with benchmarking metrics

https://epigenomics.dkfz.de/PipelineOlympics/shiny