CloudConductor is a cloud-based workflow engine for defining and executing bioinformatics pipelines in a cloud environment. The framework has been tested extensively on Google Cloud Platform (GCP); support for other platforms, including AWS and Azure, is planned.
- User-friendly
    - Define complex workflows by linking together user-defined modules that can be re-used across pipelines
    - ConfigObj configuration files for clean, readable workflows (see the example below)
    - 50+ pre-installed modules for existing bioinformatics tools
- Portable
    - Docker integration ensures a reproducible runtime environment for modules
    - Platform independent (currently supports GCP; AWS and Azure to come)
- Modular/Extensible
    - Plug-and-play with user-defined task modules
    - Easily re-use and re-combine modules across workflows
    - Eliminates serial copy/paste
    - Easily add or customize task modules as needed
- Pre-Launch Type-Checking
    - Strongly-typed task modules
    - Catch pipeline errors prior to runtime
    - Pre-launch validation reports whether a pipeline is valid before any task runs
- Scalable
    - Removes resource limitations imposed by cluster-based HPCCs
    - Elastic
        - VM usage automatically scales to match input file sizes and computational needs
    - Scatter-Gather Parallelism
        - Built-in logic for dividing large tasks into small chunks and re-combining the results
- Economical
    - Preemptible/Spot instances drastically cut workflow costs
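To make the idea of strongly-typed modules and pre-launch validation concrete, here is a minimal sketch in plain Python. It is not CloudConductor's actual API: the `Module` class, the `validate` function, and the type labels (`fastq`, `bam`, `vcf`, `tsv`) are hypothetical names chosen for illustration. The sketch only shows the general technique of checking that every link between modules connects an output to an input of the same declared type before anything is launched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Module:
    """Hypothetical task module that declares the types it consumes and produces."""
    name: str
    inputs: Dict[str, str] = field(default_factory=dict)   # input name -> type label
    outputs: Dict[str, str] = field(default_factory=dict)  # output name -> type label


def validate(modules: Dict[str, Module],
             links: List[Tuple[str, str, str, str]]) -> List[str]:
    """Return a list of wiring errors; an empty list means every link type-checks.

    Each link is (producer, output_name, consumer, input_name).
    """
    errors = []
    for producer, out_name, consumer, in_name in links:
        produced = modules[producer].outputs.get(out_name)
        expected = modules[consumer].inputs.get(in_name)
        if produced is None:
            errors.append(f"{producer} has no output '{out_name}'")
        elif expected is None:
            errors.append(f"{consumer} has no input '{in_name}'")
        elif produced != expected:
            errors.append(
                f"{producer}.{out_name} is {produced}, "
                f"but {consumer}.{in_name} expects {expected}"
            )
    return errors


# Three illustrative modules (names and types are assumptions, not real modules).
modules = {
    "aligner": Module("aligner", {"reads": "fastq"}, {"alignment": "bam"}),
    "caller": Module("caller", {"alignment": "bam"}, {"variants": "vcf"}),
    "annotator": Module("annotator", {"variants": "vcf"}, {"report": "tsv"}),
}

good_links = [
    ("aligner", "alignment", "caller", "alignment"),
    ("caller", "variants", "annotator", "variants"),
]
# Mis-wired on purpose: a bam output is fed into an input that expects vcf.
bad_links = [("aligner", "alignment", "annotator", "variants")]

good_errors = validate(modules, good_links)
bad_errors = validate(modules, bad_links)
print(good_errors)
print(bad_errors)
```

Because the check reads only the declarations, it costs nothing to run before any VM is started, which is the point of catching errors prior to runtime.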
CloudConductor is currently designed only for Linux systems. You will need to install and configure the following tools to run your pipelines on Google Cloud:
- Python v3.6+

    You can check your Python version by running the following command in your terminal:

        $ python3 -V
        Python 3.6.8

    To install the correct version of Python, visit the official Python website.
- Python packages: configobj, jsonschema, requests

    You will need pip to install the above packages. After installing pip, run the following commands in your terminal:

        # Upgrade pip
        sudo pip3 install -U pip
        # Install Python modules
        sudo pip3 install -U configobj jsonschema requests
- Clone the CloudConductor repo

        # Clone the repo
        git clone https://github.com/labdave/CloudConductor.git
- Follow the instructions on the official Google Cloud website.
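The Python prerequisites listed above can also be checked from a short script. The following is a minimal sketch, not part of CloudConductor itself; the helper names `python_ok` and `missing_packages` are made up for this example. It uses only the standard library and assumes each package's import name matches its pip name, which holds for `configobj`, `jsonschema`, and `requests`.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 6)
REQUIRED_PACKAGES = ("configobj", "jsonschema", "requests")


def python_ok(version=None, minimum=REQUIRED_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report the state of the current interpreter and environment.
print("Python version OK:", python_ok())
print("Missing packages:", missing_packages())
```

If the second line lists any packages, rerunning the `pip3 install` command above for those names should resolve it.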
Get started with our full documentation to explore the ways CloudConductor can streamline the development and execution of complex, multi-sample workflows typical in bioinformatics.
CloudConductor is actively under development. To get involved or request features, please contact Razvan Panea.