A fast tool to find and visualize rearrangements in DNA sequences.
To install Smash++ on various operating systems, follow the instructions below. It requires CMake (>= 3.9) and a C++14-compliant compiler. Note that a precompiled executable is available for 64-bit operating systems in the experiment/bin directory.
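Before building from source, it may help to confirm that the installed CMake meets the 3.9 minimum stated above. The following is a minimal sketch, assuming a shell where GNU `sort -V` is available; `version_ge` is a hypothetical helper written for this illustration and is not part of Smash++.

```shell
# Hypothetical helper: succeeds if version $1 >= version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Check CMake only if it is installed; 3.9 is the minimum named above.
if command -v cmake >/dev/null 2>&1; then
  cmake_version="$(cmake --version | head -n 1 | awk '{print $3}')"
  if version_ge "$cmake_version" 3.9; then
    echo "CMake $cmake_version is recent enough"
  else
    echo "CMake $cmake_version is older than 3.9"
  fi
else
  echo "CMake not found"
fi
```

If the check reports an older CMake, the precompiled executable or the Docker and Conda routes below avoid the build step entirely.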
Pull the Docker image:
docker pull smortezah/smashpp
and run it:
docker run -it smortezah/smashpp
Install Miniconda, then run the following:
conda install -c bioconda -y smashpp
- On Ubuntu, install Git, CMake and g++:
apt update && apt install -y git cmake g++
- Clone Smash++ and install it:
git clone --depth 1 https://github.com/smortezah/smashpp.git
cd smashpp
bash install.sh
- On macOS, install Homebrew, Git and CMake:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install git cmake
- Clone Smash++ and install it:
git clone --depth 1 https://github.com/smortezah/smashpp.git
cd smashpp
bash install.sh
Install WSL (Windows Subsystem for Linux), then clone Smash++ and install it, as on Ubuntu:
git clone --depth 1 https://github.com/smortezah/smashpp.git
cd smashpp
./install.sh
Note: on all operating systems, if permission is denied, you can use sudo bash install.sh instead of ./install.sh.
./smashpp [OPTIONS] -r <REF-FILE> -t <TAR-FILE>
For example,
./smashpp -r ref -t tar
It is recommended to choose short names for reference and target sequences.
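One way to get short names without renaming the original files is to create symbolic links to them. This is only a sketch: `short_link` is a hypothetical helper, the file name in the usage comment is illustrative, and it assumes the sequence file and the link live in the same directory.

```shell
# Hypothetical helper: create (or replace) a short-named symlink
# pointing at a long-named sequence file in the current directory.
short_link() {
  ln -sf "$1" "$2"
}

# Usage (illustrative file name):
#   short_link some_long_reference_name.fa ref
#   ./smashpp -r ref -t tar
```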
To see the possible options for Smash++, type:
./smashpp
which provides the following:
SYNOPSIS
./smashpp [OPTIONS] -r <REF-FILE> -t <TAR-FILE>
OPTIONS
Required:
-r <FILE> = reference file (Seq/FASTA/FASTQ)
-t <FILE> = target file (Seq/FASTA/FASTQ)
Optional:
-l <INT> = level of compression: [0, 6]. Default -> 3
-m <INT> = min segment size: [1, 4294967295] -> 50
-e <FLOAT> = entropy of 'N's: [0.0, 100.0] -> 2.0
-n <INT> = number of threads: [1, 255] -> 4
-f <INT> = filter size: [1, 4294967295] -> 100
-ft <INT/STRING> = filter type (windowing function): -> hann
{0/rectangular, 1/hamming, 2/hann,
3/blackman, 4/triangular, 5/welch,
6/sine, 7/nuttall}
-fs [S][M][L] = filter scale:
{S/small, M/medium, L/large}
-d <INT> = sampling steps -> 1
-th <FLOAT> = threshold: [0.0, 20.0] -> 1.5
-rb <INT> = ref beginning guard: [-32768, 32767] -> 0
-re <INT> = ref ending guard: [-32768, 32767] -> 0
-tb <INT> = tar beginning guard: [-32768, 32767] -> 0
-te <INT> = tar ending guard: [-32768, 32767] -> 0
-ar = consider asymmetric regions -> no
-nr = do NOT compute self complexity -> no
-sb = save sequence (input: FASTA/FASTQ) -> no
-sp = save profile (*.prf) -> no
-sf = save filtered file (*.fil) -> no
-ss = save segmented files (*.s[i]) -> no
-sa = save profile, filtered and -> no
segmented files
-rm k,[w,d,]ir,a,g/t,ir,a,g:...
-tm k,[w,d,]ir,a,g/t,ir,a,g:...
= parameters of models
<INT> k: context size
<INT> w: width of sketch in log2 form,
e.g., set 10 for w=2^10=1024
<INT> d: depth of sketch
<INT> ir: inverted repeat: {0, 1, 2}
0: regular (not inverted)
1: inverted, solely
2: both regular and inverted
<FLOAT> a: estimator
<FLOAT> g: forgetting factor: [0.0, 1.0)
<INT> t: threshold (no. substitutions)
-ll = list of compression levels
-h = usage guide
-v = more information
--version = show version
AUTHOR
Morteza Hosseini
SAMPLE
./smashpp -r ref -t tar -l 0 -m 1000
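The model string accepted by -rm and -tm is compact, so assembling it from named variables can make each field easier to read. The sketch below follows the k,ir,a,g/t,ir,a,g layout from the usage text, leaving out the optional w,d pair. All numeric values are illustrative assumptions, not recommended settings, and the variable names are hypothetical.

```shell
# Illustrative values only; none of these are recommended settings.
k=14       # context size
ir=0       # inverted repeat: 0 = regular (not inverted)
a=0.005    # estimator
g=0.95     # forgetting factor, in [0.0, 1.0)

# Fields after the slash: t,ir,a,g
t=5        # threshold (no. substitutions)
a2=1       # estimator for the part after the slash
g2=0.95    # forgetting factor for the part after the slash

model="${k},${ir},${a},${g}/${t},${ir},${a2},${g2}"
echo "$model"      # prints 14,0,0.005,0.95/5,0,1,0.95

# The sketch width w is given in log2 form: w=10 means 2^10.
w=10
echo $((1 << w))   # prints 1024

# Usage, once smashpp is built:
#   ./smashpp -r ref -t tar -rm "$model"
```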
To see the options for Smash++ Visualizer, type:
./smashpp -viz
which provides the following:
SYNOPSIS
./smashpp -viz [OPTIONS] -o <SVG-FILE> <POS-FILE>
OPTIONS
Required:
<POS-FILE> = position file, generated by
Smash++ tool (*.pos)
Optional:
-o <SVG-FILE> = output image name (*.svg). Default -> map.svg
-rn <STRING> = reference name shown on output. If it
has spaces, use double quotes, e.g.
"Seq label". Default: name in header
of position file
-tn <STRING> = target name shown on output
-l <INT> = type of the link between maps: [1, 6] -> 1
-c <INT> = color mode: [0, 1] -> 0
-p <FLOAT> = opacity: [0.0, 1.0] -> 0.9
-w <INT> = width of the sequence: [8, 100] -> 10
-s <INT> = space between sequences: [5, 200] -> 40
-tc <INT> = total number of colors: [1, 255]
-rt <INT> = reference tick: [1, 4294967295]
-tt <INT> = target tick: [1, 4294967295]
-th [0][1] = tick human readable: 0=false, 1=true -> 1
-m <INT> = minimum block size: [1, 4294967295] -> 1
-vv = vertical view -> no
-nrr = do NOT show relative redundancy -> no
(relative complexity)
-nr = do NOT show redundancy -> no
-ni = do NOT show inverse maps -> no
-ng = do NOT show regular maps -> no
-n = show 'N' bases -> no
-stat = save stats (*.csv) -> stat.csv
-h = usage guide
-v = more information
--version = show version
AUTHOR
Morteza Hosseini
SAMPLE
./smashpp -viz -vv -o simil.svg ref.tar.pos
After installing Smash++, copy its executable file into the example directory and go to that directory:
cp smashpp example/
cd example/
This directory contains two 1000-base sequences: the reference sequence, named ref, and the target sequence, named tar. Now run Smash++ and the visualizer:
./smashpp -r ref -t tar
./smashpp -viz -o example.svg ref.tar.pos
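The two steps can also be chained so that the visualizer runs only if the position file was actually produced. This is a sketch under one assumption: that the position file is named after the reference and target as `<ref>.<tar>.pos`, which is inferred from the ref.tar.pos name in the example and not stated as a rule. The variable names are hypothetical.

```shell
ref=ref
tar=tar
# Assumption: the position file name follows the ref.tar.pos pattern seen above.
pos_file="${ref}.${tar}.pos"

if [ -x ./smashpp ]; then
  # Run the visualizer only if Smash++ succeeded and the .pos file exists.
  ./smashpp -r "$ref" -t "$tar" &&
    [ -f "$pos_file" ] &&
    ./smashpp -viz -o example.svg "$pos_file"
else
  echo "smashpp executable not found in the current directory"
fi
```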
If you use Smash++, please cite the following:
- M. Hosseini, D. Pratas, B. Morgenstern, A.J. Pinho, "Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements," GigaScience, vol. 9, no. 5, 2020. DOI: 10.1093/gigascience/giaa048
Please let us know if there are any issues.
Copyright © 2018-2021 Morteza Hosseini.
Smash++ is licensed under GNU GPL v3.