Analysis Variable #4
Here is more detail on what I've done so far (I'm parsing the results from this now): https://github.com/sci-f/snakemake.scif/tree/add/races/results/cloud
Hey @vsoch, you can use Snakemake to vary variables in the workflow. Actually, I don't think Snakemake is any worse than other software for that purpose. Maybe you have already read this
For bcftools_call, respectively.
You could add some of these variables to the Snakemake workflow and create config files with different variable settings. Then you can specify which variables to use when running Snakemake with the --configfile FILE option. Since the data in this repo is just for testing purposes, I don't know if you'll see big changes in the results if you try other variables.
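A minimal sketch of what that could look like. The file names and the variable name `mutation_rate` here are hypothetical, not taken from this repo:

```yaml
# config_a.yaml — one hypothetical set of variable settings
mutation_rate: 0.001

# In the Snakefile, a rule can then read the value from the config dict:
#   rule bcftools_call:
#       params:
#           mut=config["mutation_rate"]
#       shell:
#           "bcftools call -P {params.mut} ..."
#
# Select which settings to use at run time:
#   snakemake --configfile config_a.yaml
#   snakemake --configfile config_b.yaml
```

Each run with a different `--configfile` then produces results under the same workflow logic, so only the varied settings differ between runs.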
Okay, so reading the docs I think we want to take the following approach:
Then I assume we would want to look at the all.vcf file? Or are we still interested in memory and time? Given that we find some difference in a result or runtime metric, is our evaluation then that "the fastest" or "least memory required" is really associated with "best"? In other words, if we were running this grid of metrics for a researcher, what kind of advice would we give them after doing it?
Do you mean to say that you don't think doing the variation will have much influence? I think Snakemake definitely fits the bill for running the kind of comparison we want to do, and (the much harder part, for me at least) is deciding (in advance), if there is some variation (in what?), how we evaluate its goodness.
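One concrete (if crude) way to compare runs would be counting variant records in the all.vcf each run produces. This is a sketch under my own assumption that record count is a reasonable first-pass signal; the in-memory "files" below just stand in for outputs from two different config files:

```python
def count_variants(vcf_lines):
    """Count VCF data records: non-empty lines that are not '#' headers."""
    return sum(1 for line in vcf_lines
               if line.strip() and not line.startswith("#"))

# Tiny stand-ins for two all.vcf files from runs with different configs
run_a = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\t.",
    "1\t200\t.\tC\tT\t40\tPASS\t.",
]
run_b = run_a + ["1\t300\t.\tG\tA\t30\tPASS\t."]

print(count_variants(run_a), count_variants(run_b))  # 2 3
```

Of course a raw count says nothing about which call set is *better*, which is exactly the evaluation question raised above.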
The other approach (when talking about variables) that would be interesting is to show how a single library / software changes over time (calling the same function), or doesn't.
There are also easy ways to do this with continuous integration, e.g., using a build matrix in Travis (see this example: https://github.com/pydicom/pydicom/blob/master/.travis.yml), but there it's harder to have control over the results.
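For reference, a Travis build matrix is just a list of environment entries, each of which spawns its own job. This is a hypothetical sketch (not the pydicom file itself), wiring the matrix to the config-file idea above:

```yaml
# .travis.yml sketch: each env entry becomes one build job
language: python
python:
  - "3.6"
env:
  - ANALYSIS_CONFIG=config_a
  - ANALYSIS_CONFIG=config_b
script:
  - snakemake --configfile "${ANALYSIS_CONFIG}.yaml"
```

The downside mentioned above applies: each job's results live in that job's log/artifacts, so collecting and comparing them afterwards takes extra work.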
Ah, and here is an example for getting Travis-style matrix builds on CircleCI! https://github.com/michaelcontento/circleci-matrix
No. You want to get meaningful results for your scientific problem, and the runtime or memory consumption is secondary. The choice of parameters is very situation-dependent and up to the researcher. Time and memory consumption are interesting if you compare two algorithms with comparable input parameters.
I think it will influence the number of variants found. I just don't know how to interpret the changes since I'm not an expert in this domain.
That is really an interesting idea. I don't know if there are good studies about that for popular software.
I never used continuous integration, but I'll keep the CircleCI thing in mind. It looks very convenient.
Hey @fbartusch, another question for you! I've created some cloud builders that can be launched to run Snakemake on Google Cloud (Compute), and I tested the valgrind (memory) analysis across about 16 different instance types. Since the data is tiny (so far), the memory doesn't seem to make a difference. What I think I'd want to do (which would be useful for HPC) is to vary some variable set by the scientist and then assess how the results are influenced. Is Snakemake a bad contender for that? If so, what other things could we vary that would be useful / interesting?