Support needed arguments in pipeline script #16

eweitz · 2015-08-11T04:06:18Z

@dauss75, @vlaufer, @chris-owen, my Django code will need to call the back-end pipeline you developed last week in order to annotate and filter the user's uploaded NGS data file (e.g., an unannotated VCF file). Could you please add support in your pipeline's entry-point script for the following, and provide an example of how to call it from the command line?

The call from my script will include the following arguments:

- upload_id: a short, random string used downstream to identify the user's upload and results.
- format: the file format defined by the user; either VCF, BAM, or FASTQ. (Assume this is correct for now.)
- filters: the filters selected by the user prior to uploading. This argument will be used by the script @vlaufer is developing in issue #8, "Write script to filter VCF".
- input_file: the absolute path to the user's uploaded file.

Here's how I imagine I would call the script you have at the beginning of your pipeline:

/path/to/your/script \
--upload_id sazygp \
--format VCF \
--filters molcons:missense,nonsense+clinsig:pathogenic \
--input_file /home/ubuntu/eweitz/NCBI_August_Hackathon_Push_Button_Genomics_Solution/django/browser/userdata/sazygp/test.GRCh38.dbsnp.clinvar.chr22_dummyNoAnn.vcf
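A minimal sketch of an entry point that accepts these four arguments, using Python's standard argparse module. The filter-string grammar (`+` separating filter groups, `:` separating a field name from its values, `,` separating alternative values) is an assumption read off the example call above, not a settled spec; `parse_filters` and `build_parser` are hypothetical names.

```python
import argparse


def parse_filters(spec):
    """Parse a filter string such as 'molcons:missense,nonsense+clinsig:pathogenic'.

    Assumed grammar (inferred from the example call, not a settled spec):
    '+' separates filter groups, ':' separates a field from its values,
    and ',' separates alternative values within a group.
    Returns a dict mapping each field to a list of values.
    """
    filters = {}
    if not spec:
        return filters
    for group in spec.split("+"):
        field, _, values = group.partition(":")
        filters[field.strip()] = [v.strip() for v in values.split(",") if v.strip()]
    return filters


def build_parser():
    parser = argparse.ArgumentParser(
        description="Entry point for the annotate-and-filter pipeline (sketch).")
    parser.add_argument("--upload_id", required=True,
                        help="Short random string identifying the user's upload.")
    parser.add_argument("--format", required=True, choices=["VCF", "BAM", "FASTQ"],
                        help="User-declared format of the uploaded file.")
    # argparse also runs string defaults through `type`, so no filters -> {}.
    parser.add_argument("--filters", default="", type=parse_filters,
                        help="e.g. molcons:missense,nonsense+clinsig:pathogenic")
    parser.add_argument("--input_file", required=True,
                        help="Absolute path to the uploaded file.")
    return parser
```

In the real script, `args = build_parser().parse_args()` would read these options from sys.argv; an unsupported --format value makes argparse exit with a usage error before any pipeline step runs.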


ghost · 2015-08-11T04:10:48Z

Certainly. I am working on this functionality tonight and should have something soon.

eweitz · 2015-08-11T04:12:31Z

Awesome! Thanks Vincent.

ghost · 2015-08-11T06:13:24Z

All - I have run into several problems implementing this VCF filter:

1. I tried to switch from os.system to subprocess.call for security reasons. When I did so, I kept receiving the same error from snpEff: "error: missing filter expression." I am not sure why the script works with os.system but not after the syntax is changed.
2. Currently, the script only works with AND statements, not OR statements. I have had a lot of difficulty getting the call to snpEff to work when including more than one filter expression in the same call. A stopgap solution is to create temporary files and subset them iteratively, but this will not work for OR statements, if we want to add that functionality.
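The AND/OR problem described above can, in principle, be handled in a single SnpSift call by building one expression: OR the values inside each filter group, then AND the groups together, and pass the whole expression as a single list element to subprocess so no shell quoting is involved. This is a sketch under stated assumptions: the field templates (`ANN[*].EFFECT`, `CLNSIG`), the `has` operator, and the mapping from "missense"/"nonsense" to Sequence Ontology terms are placeholders to check against the SnpSift filter documentation and the annotated VCF header (ClinVar's CLNSIG encoding also differs between releases). All function and dict names are hypothetical.

```python
import subprocess

# Hypothetical mapping from front-end filter names to SnpSift field templates;
# verify the field names against the annotated VCF header.
FIELD_TEMPLATES = {
    "molcons": "(ANN[*].EFFECT has '{}')",
    "clinsig": "(CLNSIG has '{}')",
}

# Hypothetical mapping from user-facing terms to Sequence Ontology terms.
VALUE_MAP = {"missense": "missense_variant", "nonsense": "stop_gained"}


def build_expression(filters):
    """OR the values inside each group, then AND the groups together."""
    groups = []
    for field, values in filters.items():
        template = FIELD_TEMPLATES[field]
        terms = [template.format(VALUE_MAP.get(v, v)) for v in values]
        groups.append("(" + " | ".join(terms) + ")")
    return " & ".join(groups)


def build_command(expression, vcf_path, jar_path="SnpSift.jar"):
    # Each list element becomes exactly one argv entry, so the expression
    # reaches SnpSift intact, spaces and quotes included.
    return ["java", "-jar", jar_path, "filter", expression, vcf_path]


def run_filter(expression, vcf_path, out_path, jar_path="SnpSift.jar"):
    # SnpSift filter writes the filtered VCF to stdout; capture it in out_path.
    with open(out_path, "w") as out:
        subprocess.run(build_command(expression, vcf_path, jar_path),
                       stdout=out, check=True)
```

With the example filters from this issue, `build_expression` yields one expression of the form `((... missense ...) | (... nonsense ...)) & ((... pathogenic ...))`, which avoids the iterative temporary-file approach.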

Sorry for the slow progress; I did not anticipate how difficult it would be to get these calls to work.

seandavi · 2015-08-11T11:26:33Z

@vlaufer, for number 1, you most likely need to specify "shell=True" in the call to subprocess.call(). See here:

https://docs.python.org/3/library/subprocess.html#replacing-os-system
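One plausible cause of "missing filter expression", offered as a guess rather than a confirmed diagnosis: if the os.system command string is ported to a subprocess argument list by splitting on spaces, the quoted filter expression is broken into several arguments with stray quote characters. Python's `shlex.split` performs shell-style splitting instead, and when `shell=True` is used, `shlex.quote` makes the expression safe to embed in the command string. The command string below is hypothetical.

```python
import shlex

# Hypothetical command string of the kind one might pass to os.system().
cmd = "java -jar SnpSift.jar filter \"(CLNSIG has 'pathogenic')\" input.vcf"

naive = cmd.split()        # whitespace split: breaks the expression apart
proper = shlex.split(cmd)  # shell-like split: keeps the quoted expression whole

# With shell=True, quote the expression so the shell passes it as one argument.
expr = "(CLNSIG has 'pathogenic')"
shell_cmd = "java -jar SnpSift.jar filter " + shlex.quote(expr) + " input.vcf"

print(naive)
print(proper)
print(shell_cmd)
```

Passing `proper` (a list) to `subprocess.call` without `shell=True` avoids shell quoting altogether; `shell_cmd` (a string) is the form to use if `shell=True` is kept.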

For number 2, perhaps you could drop the current script in a gist so that we can take a look?

dauss75 · 2015-08-11T14:17:12Z

I will find time after work or over the weekend for any remaining work.

ghost · 2015-08-11T18:09:08Z

@seandavi - I tried both shell=True and shell=False. I think the issue centers on escaping characters correctly and on the syntax required by SnpSift.
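A related pitfall, relevant only if the code literally passed the strings `"True"` and `"False"`: both would enable the shell. subprocess only tests whether `shell` is truthy, and any non-empty string, including `"False"`, is truthy. A quick check:

```python
# subprocess effectively tests `if shell:`, so only truthiness matters.
# Any non-empty string, including "False", is truthy.
truthiness = {repr(value): bool(value) for value in ("True", "False", True, False)}
print(truthiness)
```

Passing the booleans `True` or `False` (no quotes) is the intended usage.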

I have been pushing the script to this GitHub repo; it is in the main folder under the name vcf_filter_v0.1.py.

@eweitz - I do not think there needs to be an option for delivering FASTQ or BAM to the front end at all, but I may be completely missing something. We currently have no way to annotate anything other than a VCF, and, again unless I am missing something, we would need separate FASTQ-->JSON or BAM-->JSON converters. It seems to me that the pipeline has to go FASTQ-->BAM-->VCF-->JSON to make sense...

@dauss75 - I think there is actually a lot left to do. We need to get the liftover tool in place, so that no matter what comes in, we output VCF in the same format. We will also need to re-download all the SnpSift databases, etc., onto whichever VM we use next.

eweitz · 2015-08-11T18:57:33Z

> It seems to me that the pipeline has to go FASTQ-->BAM-->VCF-->JSON to make sense

@vlaufer, agreed -- with the caveat that users may upload data in FASTQ, BAM or VCF, so we can bypass the FASTQ-to-BAM or BAM-to-VCF conversion in some cases.

The --format argument passed into your back-end pipeline is intended to serve as a way to determine what formatting conversions the back-end will need to do. If --format FASTQ, then convert to BAM if needed, then VCF. If --format BAM, then just convert to VCF. (Issues #3 and #2 cover those conversions.) At the end of this conversion process, the uploaded file would be in VCF regardless of the input format. It would then be annotated, then filtered, via pipeline scripts.
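A sketch of how the --format value could select the conversion steps described above. The step names are hypothetical placeholders for whatever scripts issues #3 and #2 produce; the trailing JSON step follows the FASTQ-->BAM-->VCF-->JSON ordering discussed in this thread.

```python
# Hypothetical step names; issues #3 and #2 track the actual conversions.
CONVERSIONS = {
    "FASTQ": ["fastq_to_bam", "bam_to_vcf"],
    "BAM": ["bam_to_vcf"],
    "VCF": [],
}

# Every upload ends up as VCF, then goes through the same downstream steps.
DOWNSTREAM = ["annotate_vcf", "filter_vcf", "vcf_to_json"]


def plan_steps(fmt):
    """Return the ordered list of pipeline steps for a declared input format."""
    fmt = fmt.upper()
    if fmt not in CONVERSIONS:
        raise ValueError("unsupported format: " + fmt)
    return CONVERSIONS[fmt] + DOWNSTREAM
```

Keeping the format-specific part as a lookup table means adding a new input type only touches `CONVERSIONS`, while annotation and filtering stay shared.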

Per your comment, I assume that the annotated and filtered VCF file will then need to be converted to JSON prior to being passed into @cjav's Django and @ohjuarez's Solr services.

The Django view layer gets its data from @ohjuarez's Solr API (see here) formatted as JSON. The JSON is then converted into a Python dictionary, which gets passed onto a Django template that converts the dict to the HTML the user sees.
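A minimal, standard-library sketch of the VCF-to-JSON step and of turning that JSON back into Python dicts on the Django side. The record layout (one object per variant, with INFO parsed into a nested dict) is an assumption for illustration, not the schema of the Solr API; sample genotype columns are ignored.

```python
import json

VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info):
    """Turn 'KEY=VAL;FLAG' into {'KEY': 'VAL', 'FLAG': True}."""
    result = {}
    if info in ("", "."):
        return result
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        result[key] = value if sep else True  # flags have no '=' part
    return result


def vcf_to_records(lines):
    """Convert VCF text lines into a list of dicts, one per variant."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        record = dict(zip(VCF_COLUMNS, fields[:8]))
        record["POS"] = int(record["POS"])
        record["INFO"] = parse_info(record["INFO"])
        records.append(record)
    return records
```

Usage: the pipeline could write `json.dumps(vcf_to_records(open(path)))`, and the Django view would call `json.loads(...)` on the response to get a list of dicts it can pass to the template context.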

ghost · 2015-08-11T19:02:28Z

@eweitz - good. I think I just misinterpreted which script you were talking about taking those arguments. We are on the same page.

Thanks for bearing with me as I get used to GitHub's issue functionality.

Kindly, Vincent

eweitz changed the title ~~Provide example call to pipeline script with arguments input_file, filters, upload_id, and format arguments~~ Support needed arguments in pipeline script Aug 11, 2015

eweitz assigned dauss75 Aug 11, 2015

eweitz mentioned this issue Aug 11, 2015 in Write script to filter VCF #8 (Open)

eweitz assigned ghost and unassigned dauss75 Aug 11, 2015
