Download the test BAM file from https://www.internationalgenome.org/
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00100/alignment/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
Processing an 841 MB BAM file.
Check output:
./bin/sambamba-1.0.0 view HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam | md5sum
sambamba 1.0.0
3cc1fcc85b0e5ab516784ffbbc9c347c
Sorted
02b42b6a7e5f8654a1bbc3ee1cf15a8d HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.sorted.bam
ec5b9ba95b53f0d10c7949a933e65ac7 dedup.bam
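The checksum comparison above can also be scripted, for example to confirm that two builds emit byte-identical output. The following is a minimal sketch and not part of sambamba; the helper names `md5_of_stream` and `same_digest` are hypothetical.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def same_digest(stream_a, stream_b):
    """True when both binary streams hash to the same MD5 digest."""
    return md5_of_stream(stream_a) == md5_of_stream(stream_b)


# Stand-in data; in practice the streams would be the stdout pipes of
# two `sambamba view` processes or two opened output files.
old_output = io.BytesIO(b"read1\tchr20\t100\n")
new_output = io.BytesIO(b"read1\tchr20\t100\n")
print(same_digest(old_output, new_output))
```

Any binary file object works here, including the `stdout` pipe of a `subprocess.Popen` call, so the output never has to be written to disk first.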
time ./bin/sambamba-1.0.0 view HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam > /dev/null
sambamba 1.0.0 by Artem Tarasov and Pjotr Prins (C) 2012-2022
LDC 1.32.0 / DMD v2.102.2 / LLVM14.0.6 / bootstrap LDC - the LLVM D compiler (1.32.0)
real 0m1.435s user 0m14.648s sys 0m0.336s

time ./bin/sambamba-0.8.2-linux-amd64-static view HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam > /dev/null
sambamba 0.8.2 by Artem Tarasov and Pjotr Prins (C) 2012-2021
LDC 1.27.1 / DMD v2.097.2 / LLVM11.0.0 / bootstrap LDC - the LLVM D compiler (1.27.1)
real 0m1.393s user 0m14.114s sys 0m0.366s

time ./bin/sambamba-0.8.1-pre1 view HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam > /dev/null
sambamba 0.8.1-pre1 by Artem Tarasov and Pjotr Prins (C) 2012-2021
LDC 1.26.0 / DMD v2.096.1 / LLVM9.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)
real 0m1.398s user 0m16.589s sys 0m0.240s

--- sort

time ./bin/sambamba-1.0.0 sort HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
sambamba 1.0.1 by Artem Tarasov and Pjotr Prins (C) 2012-2023
LDC 1.32.0 / DMD v2.102.2 / LLVM14.0.6 / bootstrap LDC - the LLVM D compiler (1.32.0)
real 0m9.599s user 2m1.039s sys 0m3.834s

sambamba 1.0.0 by Artem Tarasov and Pjotr Prins (C) 2012-2022
LDC 1.32.0 / DMD v2.102.2 / LLVM14.0.6 / bootstrap LDC - the LLVM D compiler (1.32.0)
real 0m9.769s user 2m1.187s sys 0m3.830s

sambamba 0.8.2 by Artem Tarasov and Pjotr Prins (C) 2012-2021
LDC 1.27.1 / DMD v2.097.2 / LLVM11.0.0 / bootstrap LDC - the LLVM D compiler (1.27.1)
real 0m9.472s user 1m58.529s sys 0m3.850s

time ./bin/sambamba-0.8.1-pre1 sort HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
sambamba 0.8.1-pre1 by Artem Tarasov and Pjotr Prins (C) 2012-2021
LDC 1.26.0 / DMD v2.096.1 / LLVM9.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)
real 0m9.151s user 2m5.779s sys 0m3.101s

time ./bin/sambamba-0.6.8-linux-static sort HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
real 0m10.213s user 2m6.739s sys 0m3.425s

--- markdup

time ./bin/sambamba-1.0.0 markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.sorted.bam dedup.bam
sambamba 1.0.0 by Artem Tarasov and Pjotr Prins (C) 2012-2022
LDC 1.32.0 / DMD v2.102.2 / LLVM14.0.6 / bootstrap LDC - the LLVM D compiler (1.32.0)
real 0m10.831s user 1m41.497s sys 0m3.413s

time ./bin/sambamba-0.8.2-linux-amd64-static markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.sorted.bam dedup.bam
sambamba 0.8.2 by Artem Tarasov and Pjotr Prins (C) 2012-2021
LDC 1.27.1 / DMD v2.097.2 / LLVM11.0.0 / bootstrap LDC - the LLVM D compiler (1.27.1)
real 0m10.315s user 1m38.043s sys 0m3.292s

time ./bin/sambamba-0.8.1-pre1 markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.sorted.bam dedup.bam
by Artem Tarasov and Pjotr Prins (C) 2012-2021
LDC 1.26.0 / DMD v2.096.1 / LLVM9.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)
real 0m11.319s user 1m47.719s sys 0m4.070s
monza:~/tmp$ time ./sambamba view /gnu/data/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam.orig > /dev/null

sambamba 0.6.8-pre3 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.11.0 / DMD v2.081.2 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)
real 0m6.930s user 0m26.940s sys 0m0.516s

sambamba 0.6.8-pre2 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
real 0m6.854s user 0m26.456s sys 0m0.584s

linux-vdso.so.1 (0x00007ffd227fc000)
librt.so.1 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/librt.so.1 (0x00007f5d31082000)
libpthread.so.0 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libpthread.so.0 (0x00007f5d30e64000)
libm.so.6 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libm.so.6 (0x00007f5d30b52000)
libdl.so.2 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libdl.so.2 (0x00007f5d3094e000)
libgcc_s.so.1 => /gnu/store/h3z6nshhdlc8zgh4mi13x1br03xipi9r-gcc-7.2.0-lib/lib/libgcc_s.so.1 (0x00007f5d30737000)
libc.so.6 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libc.so.6 (0x00007f5d30398000)
/gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/ld-linux-x86-64.so.2 (0x00007f5d3128a000)
time ./build/sambamba view /gnu/data/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam.orig > /dev/null
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
real 0m2.869s user 0m21.972s sys 0m0.356s

This version was built with: LDC 1.1.1 using DMD v2.071.2 using LLVM 3.9.1 bootstrapped with LDC - the LLVM D compiler (1.1.1)
real 0m3.150s user 0m24.668s sys 0m0.320s

This version was built with: LDC 1.7.0 using DMD v2.077.1 using LLVM 5.0.1 bootstrapped with LDC - the LLVM D compiler (1.7.0)
real 0m2.869s user 0m22.344s sys 0m0.344s
time ./sambamba_v0.6.6 sort -m 20615843020 -N -o /dev/null ENCFF696RLQ.bam -p
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
real 7m50.558s user 89m10.808s sys 2m57.188s
and with 120GB RAM
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
real 3m49.953s user 81m16.956s sys 1m58.332s
Wed Feb 7 03:43:14 CST 2018
sambamba 0.6.8-pre1
This version was built with: LDC 1.7.0 using DMD v2.077.1 using LLVM 5.0.1 bootstrapped with LDC - the LLVM D compiler (1.7.0)
real 8m0.528s user 88m44.084s sys 2m45.888s
When sambamba is given enough RAM to hold everything in memory, it is about twice as fast (apparently half the time goes to intermediate I/O):
time ./sambamba sort -N -o /dev/null ENCFF696RLQ.bam -p -m 120G
real 3m46.856s user 81m44.524s sys 1m56.388s
with 64GB it is
real 5m36.062s user 88m43.176s sys 3m0.536s
and with 32GB it is
real 7m22.125s user 89m6.188s sys 2m51.228s
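To put a number on the effect of the memory limit, the wall-clock times above can be converted to seconds and compared. This is a small sketch: `parse_shell_time` is a hypothetical helper, and the three values are the "real" times of the 120G, 64G and 32G runs copied from above.

```python
import re

# Matches bash `time` values such as "7m22.125s" or "12.5s".
_TIME_RE = re.compile(r"^(?:(\d+)m)?(\d+(?:\.\d+)?)s$")


def parse_shell_time(text):
    """Convert a bash `time` value like '7m22.125s' to seconds."""
    match = _TIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# Wall-clock ("real") times of the sort runs with -m 120G, 64G and 32G.
real_times = {"120G": "3m46.856s", "64G": "5m36.062s", "32G": "7m22.125s"}

baseline = parse_shell_time(real_times["32G"])
speedups = {
    limit: baseline / parse_shell_time(value)
    for limit, value in real_times.items()
}

for limit, factor in speedups.items():
    print(f"-m {limit}: {factor:.2f}x relative to 32G")
```

With these figures the 120G run comes out a little under 2x faster than the 32G run, in line with the estimate that about half the time goes to intermediate I/O.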
This version was built with: LDC 1.7.0 using DMD v2.077.1 using LLVM 5.0.1 bootstrapped with LDC - the LLVM D compiler (1.7.0)
real 18m15.809s user 158m30.148s sys 3m15.932s
Ouch! A regression in the shipped release 0.6.7.
This version was built with: LDC 1.1.1 using DMD v2.071.2 using LLVM 3.9.1 bootstrapped with LDC - the LLVM D compiler (1.1.1)
ldc2 -wi -I. -IBioD -IundeaD/src -g -O3 -release -enable-inlining -boundscheck=off
real 18m40.223s user 159m34.292s sys 3m19.300s
So, the same build is 2x slower than the previous version.
This version was built with: LDC 1.1.1 using DMD v2.071.2 using LLVM 3.9.1 bootstrapped with LDC - the LLVM D compiler (1.1.1)
Using
ldmd2 @sambamba-ldmd-release.rsp "-g" "-O2" "-c" "-m64" "-release" "-IBioD/" "-IundeaD/src/" "-ofbuild/sambamba.o" "-odbuild" "-I."
gcc -Wl,--gc-sections -o build/sambamba build/sambamba.o -Lhtslib -Llz4/lib -Wl,-Bstatic -lhts -llz4 -Wl,-Bdynamic /home/wrk/opt/ldc2-1.1.1-linux-x86_64/lib/libphobos2-ldc.a /home/wrk/opt/ldc2-1.1.1-linux-x86_64/lib/libdruntime-ldc.a -lrt -lpthread -lm
real 9m9.465s user 97m56.204s sys 2m50.512s
Updated the makefile to build with -singleobj. Now LLVM kicks in!
This version was built with: LDC 1.7.0 using DMD v2.077.1 using LLVM 5.0.1 bootstrapped with LDC - the LLVM D compiler (1.7.0)
real 8m1.978s user 89m13.936s sys 2m47.392s
Next I tried adding profile-guided optimization, but that turned out to be slower:
This version was built with: LDC 1.7.0 using DMD v2.077.1 using LLVM 5.0.1 bootstrapped with LDC - the LLVM D compiler (1.7.0)
real 11m16.267s user 116m15.556s sys 2m56.244s
So, the release is reverted and after a version bump:
This version was built with: LDC 0.17.1 using DMD v2.068.2 using LLVM 3.8.0 bootstrapped with version not available
real 10m0.932s user 151m39.172s sys 3m7.596s

This version was built with: LDC 1.1.1 using DMD v2.071.2 using LLVM 3.9.1 bootstrapped with LDC - the LLVM D compiler (1.1.1)
real 9m22.501s user 98m24.748s sys 2m51.996s
Note that updating the compiler shows a speed gain for 0.6.6.
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
finding positions of the duplicate reads in the file...
sorted 11286293 end pairs and 156042 single ends (among them 0 unmatched pairs)
collecting indices of duplicate reads... done in 1325 ms
found 6603388 duplicates
collected list of positions in 0 min 16 sec
marking duplicates...
collected list of positions in 1 min 2 sec
Command being timed: "./bin/sambamba markdup /gnu/data/in_raw.sorted.bam /gnu/data/in_raw.sorted.bam t2.bam"
User time (seconds): 406.49
System time (seconds): 3.86
Percent of CPU this job got: 649%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:03.13
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1709720
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1140382
Voluntary context switches: 393213
Involuntary context switches: 8993
Swaps: 0
File system inputs: 0
File system outputs: 2663824
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Uses slightly more memory but is faster than
/usr/bin/time --verbose sambamba markdup /gnu/data/in_raw.sorted.bam /gnu/data/in_raw.sorted.bam t2.bam
finding positions of the duplicate reads in the file...
sorted 11286293 end pairs and 156042 single ends (among them 0 unmatched pairs)
collecting indices of duplicate reads... done in 1521 ms
found 6603388 duplicates
collected list of positions in 0 min 16 sec
marking duplicates...
total time elapsed: 1 min 4 sec
Command being timed: "sambamba markdup /gnu/data/in_raw.sorted.bam /gnu/data/in_raw.sorted.bam t2.bam"
User time (seconds): 423.78
System time (seconds): 4.47
Percent of CPU this job got: 666%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:04.24
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1542764
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1839470
Voluntary context switches: 368082
Involuntary context switches: 8537
Swaps: 0
File system inputs: 0
File system outputs: 2643840
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
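The two `/usr/bin/time --verbose` reports can be compared programmatically instead of by eye. The sketch below uses hypothetical helpers `parse_gnu_time` and `elapsed_seconds`, and embeds only the two fields of interest from each report: `NEW_RUN` is the `./bin/sambamba` 0.6.8 run and `OLD_RUN` is the `sambamba` found on the PATH, whose version is not stated in these notes.

```python
def parse_gnu_time(report):
    """Map each 'Label: value' line of a GNU time --verbose report to a dict."""
    fields = {}
    for line in report.splitlines():
        # Split on the last ': ' because the elapsed-time label itself
        # contains colons ("h:mm:ss or m:ss").
        key, sep, value = line.strip().rpartition(": ")
        if sep:
            fields[key] = value
    return fields


def elapsed_seconds(value):
    """Convert 'h:mm:ss' or 'm:ss.ss' to seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


ELAPSED = "Elapsed (wall clock) time (h:mm:ss or m:ss)"
MAX_RSS = "Maximum resident set size (kbytes)"

NEW_RUN = """\
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:03.13
Maximum resident set size (kbytes): 1709720
"""
OLD_RUN = """\
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:04.24
Maximum resident set size (kbytes): 1542764
"""

new, old = parse_gnu_time(NEW_RUN), parse_gnu_time(OLD_RUN)
rss_ratio = int(new[MAX_RSS]) / int(old[MAX_RSS])
time_saved = elapsed_seconds(old[ELAPSED]) - elapsed_seconds(new[ELAPSED])

print(f"max RSS ratio (new/old): {rss_ratio:.3f}")
print(f"wall-clock seconds saved: {time_saved:.2f}")
```

On these two runs the newer build peaks at about 11% more resident memory and finishes roughly one second sooner, so the difference is small in both directions.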
/usr/bin/time --verbose ./bin/sambamba-0.7.1 "--DRT-gcopt=profile:1" markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam test.bam
Commit 5f52f04aae3de1dce2d13b9e748002b4e513ded0
by Artem Tarasov and Pjotr Prins (C) 2012-2019
LDC 1.17.0 / DMD v2.087.1 / LLVM8.0.1 / bootstrap LDC - the LLVM D compiler (1.17.0)
finding positions of the duplicate reads in the file...
sorted 3969781 end pairs and 73839 single ends (among them 22397 unmatched pairs)
collecting indices of duplicate reads... done in 372 ms
found 239673 duplicates
collected list of positions in 0 min 6 sec
marking duplicates...
collected list of positions in 0 min 22 sec
Number of collections: 107
Total GC prep time: 10 milliseconds
Total mark time: 548 milliseconds
Total sweep time: 26 milliseconds
Max Pause Time: 10 milliseconds
Grand total GC time: 585 milliseconds
GC summary: 1158 MB, 107 GC 585 ms, Pauses 558 ms < 10 ms
Command being timed: "./bin/sambamba-0.7.1 --DRT-gcopt=profile:1 markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam test2.bam"
User time (seconds): 136.00
System time (seconds): 2.39
Percent of CPU this job got: 583%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:23.70
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1282940
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 396600
Voluntary context switches: 199806
Involuntary context switches: 5017
Swaps: 0
File system inputs: 16
File system outputs: 1967376
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
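As a rough reading of the GC profile above, the garbage collector's share of the run can be estimated from the reported totals. This is only a sketch: `gc_milliseconds` is a hypothetical helper, and the figures are the GC lines and the 0:23.70 elapsed time from the report above.

```python
import re

GC_PROFILE = """\
Number of collections: 107
Total GC prep time: 10 milliseconds
Total mark time: 548 milliseconds
Total sweep time: 26 milliseconds
Max Pause Time: 10 milliseconds
Grand total GC time: 585 milliseconds
"""

ELAPSED_SECONDS = 23.70  # "Elapsed (wall clock) time" of the same run


def gc_milliseconds(profile):
    """Collect every 'Label: N milliseconds' line into a dict of ints."""
    pattern = re.compile(r"^\s*(.+?):\s+(\d+) milliseconds\s*$", re.MULTILINE)
    return {m.group(1): int(m.group(2)) for m in pattern.finditer(profile)}


gc_times = gc_milliseconds(GC_PROFILE)
gc_share = gc_times["Grand total GC time"] / 1000 / ELAPSED_SECONDS

print(f"GC share of wall-clock time: {gc_share:.1%}")
```

On this run the collector accounts for roughly 2.5% of the elapsed time, which suggests GC was not the dominant cost for markdup here.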