v0.7
Major update to the parallel implementation. The new parallelization allows a much more efficient interplay between reading input -> aligning -> writing output, which results in much better CPU usage as the number of threads increases. For example, I observed an almost 2x speedup (30-50% reduced runtime) across four larger datasets when using 16 cores (SIM and GIAB 150bp and 250bp reads, see README benchmarks).
For reference, the previous naive parallelization ran in sequential order:

1. Read a batch of reads with one thread
2. Align the batch in parallel with OpenMP
3. Write output with one thread

The new parallelization performs steps 1-3 across threads, with a mutex on the input and on the output. This type of parallelization is commonly used in other tools.
This release also includes:
- Automatic inference of read length, which removes the need to specify `-r` (as reported in #19)
- Some minor bugfixes. For example, this bug is fixed.
This release produces identical or near-identical alignments to the previous version v0.6.1 (same accuracy and SV calling stats across the tested datasets).