=============================== NG-Omics-WF ==================================
Workflow tools for next generation genomics, metagenomics, RNA-seq
and other types of omic data analysis

Software originally developed since 2010 by Weizhong Li at UCSD,
currently at JCVI

https://github.com/weizhongli/ngomicswf    [email protected]
==============================================================================
NG-Omics-WF is a workflow tool that automatically runs a bioinformatic pipeline on multiple datasets, either on a generic Linux computer or in a Linux cluster environment. The tool was first implemented in 2010 with support from the Human Microbiome Project (HMP) and the CAMERA project at UCSD for the analysis of next generation metagenomic sequence data. It was further improved at UCSD and then at JCVI as the backend workflow engine for many other projects that analyze metagenomic, RNA-seq, genomic and other omic data.
The tool was originally written in Perl and was later rewritten in Python 2.7 and then in Python 3. The Perl and Python 2.7 versions have not been supported since 2019, though they still work. Currently, Python 3.6 or later is needed to run the tool. The program needs to run on a generic Linux system, either a standalone computer or a computer cluster that supports a queue system, such as Open Grid Engine (formerly Sun Grid Engine, SGE).
The directory workflow-examples contains several workflow examples, which can be used directly or after some reconfiguration.
Detailed documentation is available at https://github.com/weizhongli/ngomicswf/wiki. Below is a brief user's guide:
NG-Omics-WF works on any generic Linux computer or Linux cluster environment. Any common Linux system, such as Ubuntu, CentOS, Fedora or Debian, is suitable. The hardware requirements depend on the size of the input datasets and the type of analysis. For example, for a typical metagenomic dataset with several GB of sequence data per sample, computers with >=64GB RAM and >=32 cores are recommended.
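As a rough illustration of the recommendation above, the following sketch compares a host against the 64 GB RAM and 32 core figures. It is not part of NG-Omics-WF; the function name `check_host` and its parameters are hypothetical, and the thresholds are only the example values quoted for typical metagenomic datasets, so they may be too high or too low for other analyses.

```python
import os
import platform


def check_host(system=None, cores=None, ram_gb=None, min_cores=32, min_ram_gb=64):
    """Return a list of warning strings; an empty list means all checks passed.

    Hypothetical helper: values left as None are read from the current host.
    """
    if system is None:
        system = platform.system()
    if cores is None:
        cores = os.cpu_count() or 1
    if ram_gb is None:
        # Total physical memory from POSIX sysconf values (available on Linux).
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024 ** 3

    warnings = []
    if system != "Linux":
        warnings.append("expected Linux, found {}".format(system))
    if cores < min_cores:
        warnings.append("{} cores found, {} recommended".format(cores, min_cores))
    if ram_gb < min_ram_gb:
        warnings.append("{:.0f} GB RAM found, {} GB recommended".format(ram_gb, min_ram_gb))
    return warnings


if __name__ == "__main__":
    for message in check_host():
        print("warning:", message)
```

The checks only produce warnings because the stated figures are recommendations, and smaller datasets may run comfortably on less hardware.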
Python 3.6 or higher is required to run NG-Omics-WF. In addition, many other software tools are needed for the analysis in the individual steps of a workflow. The required tools depend on the type of input data and the needs of the analysis. Here are some examples:
Required for most analyses
- Trimmomatic - A flexible read trimming tool for Illumina NGS data
- BWA - Burrows-Wheeler Aligner for short-read alignment
- samtools - Tools for manipulating next-generation sequencing data
- Centrifuge - Classifier for metagenomic sequences
- SPAdes - metaSPAdes: a new versatile de novo metagenomics assembler
- Prodigal - Gene Prediction Software
- cd-hit - sequence clustering
- minimap2 - a versatile pairwise aligner for genomic and spliced nucleotide sequences
- NCBI-blast+
Optional
- Hmmer3 - Biological sequence analysis using profile hidden Markov models
- RGI - Resistance Gene Identifier
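Before configuring a workflow, it can help to confirm that the interpreter and the tools listed above are actually available. The sketch below is one possible way to do that and is not part of NG-Omics-WF. The executable names in `REQUIRED` and `OPTIONAL` are assumptions based on how these packages are commonly installed (for example, Trimmomatic is distributed as a jar and may have no `trimmomatic` wrapper on a given system), so adjust them to the local installation. The helper names `python_ok` and `missing_tools` are hypothetical.

```python
import shutil
import sys

# Assumed executable names; adjust to match the local installation.
REQUIRED = {
    "Trimmomatic": "trimmomatic",
    "BWA": "bwa",
    "samtools": "samtools",
    "Centrifuge": "centrifuge",
    "SPAdes": "spades.py",
    "Prodigal": "prodigal",
    "cd-hit": "cd-hit",
    "minimap2": "minimap2",
    "NCBI-blast+": "blastn",
}
OPTIONAL = {"Hmmer3": "hmmsearch", "RGI": "rgi"}


def python_ok(version=None):
    """Return True if the given (or running) Python version is 3.6 or later."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= (3, 6)


def missing_tools(tools, which=shutil.which):
    """Return the names of tools whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    # str.format is used instead of f-strings so the script itself
    # still parses on interpreters older than 3.6.
    if not python_ok():
        print("Python 3.6 or later is required")
    for label, tools in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name in missing_tools(tools):
            print("missing {} tool: {}".format(label, name))
```

Which tools are really needed depends on the workflow being run, so a missing entry is only a problem if the chosen workflow uses that step.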
Project-specific documents, including the workflow configuration for each project, are available under each project:
- Antibiotics alter human gut microbiome resistome and colonization resistance ...
- Vertical transmission of gut microbiome from mother to baby ...
- Microbial species in gut can stay lifetime
DOI:10.1101/475699, https://doi.org/10.1101/475699, https://www.biorxiv.org/content/10.1101/475699v1