ProBin

A program for binning metagenomic contigs to a taxonomic rank, using nucleotide composition and coverage data from multiple samples.

Install

Clone the repository and execute

cd ProBin/src
python setup.py install

This installs the package probin on the default Python path and adds the script ProBin.py to bin.

Execute ProBin

-f                file containing the FASTA-formatted contigs
-k                the k-mer size
-c                the number of clusters
-r                the number of clustering runs to execute
-i                the maximum number of iterations per clustering run
-e                the stop condition for the log differences between iterations in clustering
-a                the algorithm to use (em, kmeans)
-mc               the model to use for composition
-cf               file with the coverage data
--model_coverage  the model to use for coverage
--first_data      the header of the first column in the coverage file
--last_data       the header of the last column in the coverage file
--read_length     the length of the reads, used for coverage calculations
-o                the directory where result files are stored; defaults to the current directory
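
For intuition about what the k-mer size controls, here is a minimal sketch of counting k-mers in a single contig, which is the kind of composition vector a multinomial model could work from. It is an illustration only and not ProBin's own code: the function name `kmer_counts` is hypothetical, and choices such as skipping windows with ambiguous bases and not merging reverse complements are assumptions.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count occurrences of every k-mer over ACGT in a sequence.

    Windows containing any character other than A, C, G or T (for
    example N) are skipped. Hypothetical helper, for illustration only.
    """
    sequence = sequence.upper()
    alphabet = "ACGT"
    # Start every possible k-mer at zero so the vector always has 4**k entries.
    counts = Counter({"".join(p): 0 for p in product(alphabet, repeat=k)})
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if all(base in alphabet for base in window):
            counts[window] += 1
    return counts


if __name__ == "__main__":
    counts = kmer_counts("ACGTACGTNACG", 2)
    # Print only the k-mers that actually occur in the toy sequence.
    for kmer in sorted(counts):
        if counts[kmer]:
            print(kmer, counts[kmer])
```

With k = 4, as in the examples in this README, each contig would be summarized by 4**4 = 256 counts under this scheme.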

Examples of executing ProBin

Using composition

ProBin.py bin -f contigs.fna -mc multinomial -k 4 -c 10 -a em -r 10 -i 100 -e 0.001 -o /tmp/results
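
In the command above, -r, -i and -e set the number of runs, the iteration cap per run and the stop condition. The sketch below shows one common way such settings interact in an iterative clustering loop: each run iterates until the change in a log score falls below the threshold or the cap is reached, and the best run is kept. This is a generic pattern under stated assumptions, not necessarily how ProBin implements it; `run_clustering`, `init` and `step` are hypothetical names.

```python
import random


def run_clustering(step, init, runs, max_iter, epsilon, seed=0):
    """Repeat an iterative fit `runs` times and keep the best result.

    `init(rng)` returns a starting state and `step(state)` returns
    (new_state, log_score). Both are hypothetical callables supplied
    by the caller; higher log_score is treated as better.
    """
    rng = random.Random(seed)
    best_state, best_score = None, float("-inf")
    for _ in range(runs):
        state = init(rng)
        previous = float("-inf")
        score = previous
        for _ in range(max_iter):
            state, score = step(state)
            # Stop this run once the log score changes by less than epsilon.
            if abs(score - previous) < epsilon:
                break
            previous = score
        if score > best_score:
            best_state, best_score = state, score
    return best_state, best_score


if __name__ == "__main__":
    # Toy problem: move halfway toward 3.0 each step; score is -(error squared).
    def toy_init(rng):
        return rng.uniform(0.0, 1.0)

    def toy_step(x):
        new_x = x + (3.0 - x) / 2.0
        return new_x, -(new_x - 3.0) ** 2

    print(run_clustering(toy_step, toy_init, runs=10, max_iter=100, epsilon=0.001))
```

Running several restarts matters because both EM and k-means can settle in different local optima depending on initialization.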

Using coverage

ProBin.py bin -cf coverage.tsv --model_coverage isotropic_gaussian --first_data 2012-03-25 \
              --last_data 2013-01-18 --read_length 100 -o /tmp/results
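
The first_data and last_data options name the headers of the first and last sample columns in the coverage file. A minimal sketch of that column selection follows. It assumes a tab-separated file with a header row and the contig id in the first column; that layout, the function name `select_sample_columns` and the toy column names are assumptions, since the coverage format is not yet standardized (see TODO).

```python
import csv
import io


def select_sample_columns(handle, first_data, last_data):
    """Return (contig_ids, sample_names, rows) for the inclusive column
    range first_data..last_data of a tab-separated coverage table.

    Assumes the first column holds the contig id. Hypothetical helper.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    start = header.index(first_data)
    end = header.index(last_data)
    if start > end:
        raise ValueError("first_data must not come after last_data")
    samples = header[start:end + 1]
    contig_ids, rows = [], []
    for record in reader:
        if not record:
            continue  # skip blank lines
        contig_ids.append(record[0])
        rows.append([float(value) for value in record[start:end + 1]])
    return contig_ids, samples, rows


if __name__ == "__main__":
    toy = io.StringIO(
        "contig\tlength\t2012-03-25\t2012-08-01\t2013-01-18\n"
        "c1\t1000\t1.5\t2.0\t0.5\n"
        "c2\t2000\t0.0\t3.0\t4.5\n"
    )
    print(select_sample_columns(toy, "2012-03-25", "2013-01-18"))
```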

Using both composition and coverage

ProBin.py bin -f contigs.fna -mc multinomial -k 4 -c 10 -a em -r 10 -i 100 -e 0.001 \
              -cf coverage.tsv --model_coverage isotropic_gaussian --first_data 2012-03-25 \
              --last_data 2013-01-18 --read_length 100 -o /tmp/results

Dependencies

Developed under Python 2.7 with the following packages installed through pip:

Jinja2==2.6
Pygments==1.6
argparse==1.2.1
biopython==1.61
distribute==0.6.35
docutils==0.10
ipython==0.13.2
line-profiler==1.0b3
matplotlib==1.2.1
nose==1.3.0
numpy==1.7.1
openpyxl==1.6.2
pandas==0.11.0
python-dateutil==2.1
pytz==2013b
pyzmq==13.1.0
scipy==0.12.0
six==1.3.0
stevedore==0.8
tornado==3.0.1
virtualenv==1.9.1
virtualenv-clone==0.2.4
virtualenvwrapper==3.7
wsgiref==0.1.2

TODO

Use a single model option (merge -mc and --model_coverage)
Make the input for the multinomial model an optional parameter rather than required (a user might only want coverage)
Standardize the format of the coverage input file and the parsing of the first and last data columns
Make the preprocessing generate the standardized coverage input file
