4. Configuration Files for Workflows
If you don't want to create a whole new workflow, but merely tweak parameters in established workflows, you've come to the right page. The easiest way to do this is to use general and sample-specific config files.
For information on which parameters of workflows are adaptable, please refer to the established workflows page.
Here I will give an overview of how the configurable parameters for the basic workflow can be adapted by users without modifying the actual workflow.
All of these adjustable options in the basic workflow correspond to the Centrifuge module.
Here is how a user can provide a general (not sample-specific) config file, passed to sheppard.py via the -a/--config argument. Each line sets one parameter in the form:
parameter-identifier=parameter-value
And so for instance, a file with such configurations could look like:
centrifuge_index = /path/to/centrifuge-index/
centrifuge_timelimit = 00:05:00
centrifuge_memory = 24
centrifuge_threads = 1
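To make the format concrete, here is a minimal sketch of how such a key=value file could be parsed into a dictionary of parameter overrides. The function name and return layout are illustrative assumptions for this example, not sheppard's actual internals.

```python
def read_general_config(path):
    """Read a key=value config file into a dict of parameter overrides (illustrative sketch)."""
    params = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, _, value = line.partition("=")
            params[key.strip()] = value.strip()
    return params

# Example (hypothetical file name): read_general_config("general.config") might return
# {'centrifuge_index': '/path/to/centrifuge-index/',
#  'centrifuge_timelimit': '00:05:00',
#  'centrifuge_memory': '24',
#  'centrifuge_threads': '1'}
```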
And here is how one can set settings specific to each sample (not that you would necessarily want to for the basic workflow, but you could):
sample_id <tab> parameter-identifier
sample-identifier <tab> sample-specific-value
The first line acts as a header listing the parameters you want to adapt; the first column of this line must be 'sample_id'. Each subsequent line names a sample and gives that sample's values for those parameters.
Here is an example sample-specific configuration file, which you can pass to sheppard.py using -s/--sample_config:
sample_id centrifuge_memory centrifuge_threads
SampleA 24 1
SampleB 12 2
SampleC 8 3
With such a sample-specific configuration provided, sheppard will run all three samples through the same workflow (basic, in this case), but with different values for the two adjusted parameters: SampleA will be run with 1 core and 24 GB per core, SampleB with 2 cores and 12 GB per core, and so on.
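For illustration, here is a minimal sketch of how a tab-separated sample config like the one above could be read into per-sample parameter overrides. Again, the function name and data layout are assumptions made for this example, not sheppard's actual code.

```python
import csv

def read_sample_config(path):
    """Read a tab-separated sample config into {sample_id: {parameter: value}} (illustrative sketch)."""
    overrides = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            sample_id = row.pop("sample_id")
            overrides[sample_id] = dict(row)
    return overrides

# Example (hypothetical file name): read_sample_config("samples.config") might return
# {'SampleA': {'centrifuge_memory': '24', 'centrifuge_threads': '1'},
#  'SampleB': {'centrifuge_memory': '12', 'centrifuge_threads': '2'},
#  'SampleC': {'centrifuge_memory': '8',  'centrifuge_threads': '3'}}
```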
Wait, that's not all: you can also use multiple parameter sets for the same sample! That's right folks, you can use it for pain-free, though extremely computationally unfriendly, benchmarking. To see where this becomes useful, imagine we were interested in Centrifuge results after subsampling reads to different depths (e.g. 10K reads, 100K reads, 1M reads, ...). This can be done with the fast_basic workflow, for instance, which has a configurable parameter called read_subsampling. Here is how our sample configuration file might look:
sample_id read_subsampling
SampleA 10000
SampleA 100000
SampleA 1000000
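Since a sample identifier can now appear on several lines, a reader of such a file has to collect every row for the same sample rather than keep only the last one. Here is a small, purely illustrative sketch of that idea (again an assumption, not sheppard's actual logic):

```python
import csv
from collections import defaultdict

def read_parameter_sets(path):
    """Collect every row for a sample into a list of parameter sets (illustrative sketch)."""
    parameter_sets = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            sample_id = row.pop("sample_id")
            parameter_sets[sample_id].append(dict(row))
    return dict(parameter_sets)

# Example (hypothetical file name): for the file above,
# read_parameter_sets("subsampling.config") might return
# {'SampleA': [{'read_subsampling': '10000'},
#              {'read_subsampling': '100000'},
#              {'read_subsampling': '1000000'}]}
```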