Support needed arguments in pipeline script #16

Open
eweitz opened this issue Aug 11, 2015 · 8 comments

Comments

@eweitz
Collaborator

eweitz commented Aug 11, 2015

@dauss75, @vlaufer, @chris-owen, my Django code will need to call the back-end pipeline you developed last week in order to annotate and filter the user's uploaded NGS data file (e.g., an unannotated VCF file). Could you please add support in your pipeline's entry-point script for the following, and provide an example of how to call it from the command line?

The call from my script will include the following arguments:

  • upload_id. A short, random string used downstream to identify the user's upload and results.
  • format. File format defined by the user; either VCF, BAM, or FASTQ. (Assume this is correct for now.)
  • filters. Filters selected by the user prior to uploading. This argument will be used by the script @vlaufer is developing in #8 ("Write script to filter VCF").
  • input_file. Absolute path to the user's uploaded file.
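A minimal sketch of how the pipeline's entry-point script could accept these four arguments with Python's standard argparse module. The restriction of --format to VCF/BAM/FASTQ and the absolute-path check on --input_file are assumptions drawn from the list above, not code that exists in the pipeline yet.

```python
import argparse
import os

# Formats the front end may send, per the argument list in this issue.
SUPPORTED_FORMATS = ("VCF", "BAM", "FASTQ")


def parse_args(argv=None):
    """Parse the arguments the Django layer passes to the pipeline entry point."""
    parser = argparse.ArgumentParser(
        description="Annotate and filter an uploaded NGS file."
    )
    parser.add_argument("--upload_id", required=True,
                        help="Short random string identifying the upload and its results")
    # type=str.upper runs before the choices check, so 'vcf' is accepted as 'VCF'.
    parser.add_argument("--format", required=True, type=str.upper,
                        choices=SUPPORTED_FORMATS,
                        help="File format declared by the user")
    parser.add_argument("--filters", default="",
                        help="Filters selected by the user before uploading")
    parser.add_argument("--input_file", required=True,
                        help="Absolute path to the uploaded file")
    args = parser.parse_args(argv)
    if not os.path.isabs(args.input_file):
        # parser.error prints the usage message and exits with status 2.
        parser.error("--input_file must be an absolute path")
    return args
```

Calling `parse_args()` with no argument reads `sys.argv`, so the same function serves both the command line and tests that pass an explicit list.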

Here's how I imagine I would call the script you have at the beginning of your pipeline:

/path/to/your/script \
--upload_id sazygp \
--format VCF \
--filters  molcons:missense,nonsense+clinsig:pathogenic \
--input_file /home/ubuntu/eweitz/NCBI_August_Hackathon_Push_Button_Genomics_Solution/django/browser/userdata/sazygp/test.GRCh38.dbsnp.clinvar.chr22_dummyNoAnn.vcf
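The --filters value in the example call has an implicit grammar. A minimal sketch of parsing it, assuming (this is inferred from the single example, not a documented format) that `+` separates filter groups, `:` separates a field from its values, and `,` separates alternative values for that field:

```python
def parse_filters(spec):
    """Turn e.g. 'molcons:missense,nonsense+clinsig:pathogenic' into a dict.

    Assumed grammar (inferred from the example call):
      '+' separates groups, ':' separates field from values,
      ',' separates alternative values for the same field.
    """
    filters = {}
    if not spec:
        return filters
    for group in spec.split("+"):
        field, sep, values = group.partition(":")
        if not sep or not field or not values:
            raise ValueError(f"malformed filter group: {group!r}")
        # Repeated fields accumulate their values instead of overwriting.
        filters.setdefault(field, []).extend(v for v in values.split(",") if v)
    return filters
```

With this reading, the example yields `{"molcons": ["missense", "nonsense"], "clinsig": ["pathogenic"]}`, which a filter script can then turn into whatever syntax the filtering tool needs.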
@eweitz changed the title from "Provide example call to pipeline script with arguments input_file, filters, upload_id, and format arguments" to "Support needed arguments in pipeline script" Aug 11, 2015
@ghost

ghost commented Aug 11, 2015

Certainly. I am working on this functionality tonight and should have something soon.

@eweitz eweitz assigned ghost and unassigned dauss75 Aug 11, 2015
@eweitz
Collaborator Author

eweitz commented Aug 11, 2015

Awesome! Thanks Vincent.

@ghost

ghost commented Aug 11, 2015

All - I have had multiple issues trying to implement this VCF filter.

  1. I endeavored to switch over from os.system to subprocess.call for security reasons. When I did so, I kept receiving the same error from snpEff: "error: missing filter expression." I am uncertain why the script works with os.system but not after the call is rewritten for subprocess.call.
  2. Currently, the script only works with AND statements, not OR statements. I have had a lot of difficulty getting the call to snpEff to work when including more than one filtering expression in the same call. A stopgap solution is to create temporary files and subset them iteratively, but this will not work for OR statements, if we wish to add that functionality.

Sorry for the slow progress; I did not anticipate how difficult it would be to get these calls to work.
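For item 2, SnpSift's filter expressions support `&` (AND), `|` (OR) and `!` (NOT), so OR logic can be expressed in a single call instead of iterating over temporary files. A minimal sketch that ORs values within a field and ANDs across fields; the `FIELD_TEMPLATES` mapping is hypothetical and must be checked against the INFO fields actually present in the annotated VCF (for example, snpEff effect names look like `missense_variant`, not `missense`).

```python
# Hypothetical mapping from front-end filter fields to SnpSift expression templates.
# Verify field names and operators against the real annotated VCF before use.
FIELD_TEMPLATES = {
    "molcons": "(ANN[*].EFFECT has '{value}')",
    "clinsig": "(CLNSIG has '{value}')",
}


def build_expression(filters):
    """Combine {field: [values]} into one SnpSift expression.

    Values of the same field are ORed with '|'; different fields are ANDed with '&'.
    An unknown field raises KeyError rather than being silently ignored.
    """
    clauses = []
    for field, values in filters.items():
        template = FIELD_TEMPLATES[field]
        ors = " | ".join(template.format(value=v) for v in values)
        clauses.append(f"({ors})")
    return " & ".join(clauses)
```

Building the whole expression up front also means the filter step makes exactly one pass over the VCF.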

@seandavi

@vlaufer, for number 1, you most likely need to specify "shell=True" in the call to subprocess.call(). See here:

https://docs.python.org/3/library/subprocess.html#replacing-os-system
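A likely cause of "missing filter expression" (an assumption, since the failing call isn't shown here) is that the os.system command string was split on spaces, which breaks the expression into several arguments and leaves shell quotes attached. Passing an argument list without a shell avoids both problems and keeps the security benefit of leaving os.system. A sketch, assuming SnpSift is run as `java -jar SnpSift.jar filter <expression> <vcf>` and writes the filtered VCF to stdout; it uses `subprocess.run` (Python 3.5+) rather than `subprocess.call`.

```python
import shlex
import subprocess


def snpsift_filter_argv(expression, vcf_path, jar="SnpSift.jar"):
    """Build the argv list for `java -jar SnpSift.jar filter <expr> <vcf>`.

    The expression is ONE list element with no surrounding shell quotes:
    without a shell nothing strips them, so they would reach SnpSift verbatim.
    """
    return ["java", "-jar", jar, "filter", expression, vcf_path]


def run_filter(expression, vcf_path, out_path, jar="SnpSift.jar"):
    """Run the filter and save the filtered VCF that SnpSift prints to stdout."""
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(snpsift_filter_argv(expression, vcf_path, jar),
                       stdout=out, check=True)


# A command string that works under os.system...
shell_cmd = "java -jar SnpSift.jar filter \"(CHROM = 'chr22')\" in.vcf"
# ...is broken into the wrong pieces by str.split (8 items, quotes kept),
naive = shell_cmd.split()
# while shlex.split applies shell quoting rules and keeps the expression whole.
proper = shlex.split(shell_cmd)
```

Here `proper` equals `snpsift_filter_argv("(CHROM = 'chr22')", "in.vcf")`, while `naive` passes `"(CHROM` and `=` as separate arguments.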

For number 2, perhaps you could drop the current script in a gist so that we can take a look?

@dauss75
Contributor

dauss75 commented Aug 11, 2015

I will find time either after work or on the weekend for any remaining work.

@ghost

ghost commented Aug 11, 2015

@seandavi - I tried both shell="True" and shell="False". I think the issue centers on escaping characters correctly and on the syntax required by SnpSift.
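One detail worth checking: subprocess treats `shell` as a boolean, and any non-empty string is truthy, so `shell="False"` behaves exactly like `shell=True`. If the single-string form with `shell=True` is kept, quoting each argument with `shlex.quote` handles the escaping. A sketch; the SnpSift expression is only an illustrative example.

```python
import shlex

# The string "False" is non-empty, hence truthy.
print(bool("False"))  # True


def shell_command(argv):
    """Join argv into one string that is safe to pass with shell=True.

    shlex.quote wraps any argument containing spaces or quotes so the shell
    hands it to the program as a single, unmodified argument.
    """
    return " ".join(shlex.quote(arg) for arg in argv)


argv = ["java", "-jar", "SnpSift.jar", "filter",
        "(ANN[*].EFFECT has 'missense_variant')", "in.vcf"]
cmd = shell_command(argv)
# Usage (not executed here): subprocess.run(cmd, shell=True, check=True)
```

Parsing `cmd` back with `shlex.split` recovers the original list, which is a quick way to confirm the quoting before running anything.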

I have been pushing the script to this GitHub repo; it is in the main folder under the name vcf_filter_v0.1.py.

@eweitz - I do not think there needs to be an option for delivering FASTQ or BAM to the front end at all, but I may be completely missing something. We currently have no way to annotate anything other than a VCF, and, again unless I am missing something, we would need separate FASTQ --> JSON or BAM --> JSON converters. It seems to me that the pipeline has to go FASTQ-->BAM-->VCF-->JSON to make sense...

@dauss75 - I think there is actually a lot left to do. We need to get the liftover tool in place, so that no matter what comes in, we output the same format of VCF. We will also need to re-download all the SnpSift databases etc. on wherever the next VM is.

@eweitz
Collaborator Author

eweitz commented Aug 11, 2015

> It seems to me that the pipeline has to go FASTQ-->BAM-->VCF-->JSON to make sense

@vlaufer, agreed -- with the caveat that users may upload data in FASTQ, BAM, or VCF, so we can bypass the FASTQ-to-BAM or BAM-to-VCF conversion in some cases.

The --format argument passed into your back-end pipeline is intended to serve as a way to determine what formatting conversions the back-end will need to do. If --format FASTQ, then convert to BAM if needed, then VCF. If --format BAM, then just convert to VCF. (Issues #3 and #2 cover those conversions.) At the end of this conversion process, the uploaded file would be in VCF regardless of the input format. It would then be annotated, then filtered, via pipeline scripts.
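The conversion logic described above can be sketched as a small planner that turns --format into an ordered list of steps. The step names are placeholders; the actual converters are tracked in issues #3 and #2.

```python
# Conversion chain from this comment: FASTQ -> BAM -> VCF.
NEXT_FORMAT = {"FASTQ": "BAM", "BAM": "VCF"}


def plan_steps(fmt):
    """Return the ordered pipeline steps for an upload of the given format."""
    fmt = fmt.upper()
    if fmt not in ("FASTQ", "BAM", "VCF"):
        raise ValueError(f"unsupported format: {fmt}")
    steps = []
    # Walk the chain until the data is VCF; a VCF upload skips this loop.
    while fmt != "VCF":
        nxt = NEXT_FORMAT[fmt]
        steps.append(f"convert {fmt} to {nxt}")
        fmt = nxt
    # Every upload ends up as VCF, then is annotated and filtered.
    steps += ["annotate VCF", "filter VCF"]
    return steps
```

For example, `plan_steps("BAM")` gives `["convert BAM to VCF", "annotate VCF", "filter VCF"]`.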

Per your comment, I assume that the annotated and filtered VCF file will then need to be converted to JSON prior to being passed into @cjav's Django and @ohjuarez's Solr services.
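A minimal sketch of the VCF-to-JSON step using only the standard library. It assumes one JSON object per variant built from the eight fixed VCF columns, with INFO split into key/value pairs; per-sample genotype columns are ignored, and the final JSON shape should be matched to whatever the Solr schema expects.

```python
import json

VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info):
    """Split an INFO column like 'DP=10;DB' into a dict (flags map to True)."""
    result = {}
    if info in ("", "."):
        return result
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result


def vcf_to_records(lines):
    """Yield one dict per VCF data line, skipping '##' meta and '#CHROM' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        record = dict(zip(VCF_COLUMNS, fields[:8]))
        record["POS"] = int(record["POS"])
        record["ALT"] = record["ALT"].split(",")  # multi-allelic sites list several ALTs
        record["INFO"] = parse_info(record["INFO"])
        yield record


def vcf_to_json(lines):
    """Serialize all records from an iterable of VCF lines as a JSON array."""
    return json.dumps(list(vcf_to_records(lines)))
```

Because `vcf_to_records` accepts any iterable of lines, an open file handle can be passed directly.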

The Django view layer gets its data from @ohjuarez's Solr API (see here), formatted as JSON. The JSON is then converted into a Python dictionary, which gets passed on to a Django template that renders the dict as the HTML the user sees.
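A sketch of the JSON-to-dictionary step, assuming the Solr API returns Solr's standard JSON response (`wt=json`), where hits sit under `response.docs` and the count under `response.numFound`. The base URL, query, and context key names are hypothetical; in a Django view the returned dict would be the template context.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_docs(payload):
    """Extract the hit count and documents from a Solr JSON response body."""
    data = json.loads(payload)
    response = data["response"]
    return {"num_found": response["numFound"], "variants": response["docs"]}


def fetch_variants(base_url, query, rows=100):
    """Query a Solr /select handler and return a template-ready dict.

    base_url is a placeholder such as 'http://localhost:8983/solr/<core>'.
    """
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    with urlopen(f"{base_url}/select?{params}") as resp:
        return solr_docs(resp.read().decode("utf-8"))
```

Keeping the parsing in `solr_docs` separate from the HTTP call lets it be tested with a saved response, without a running Solr instance.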

@ghost

ghost commented Aug 11, 2015

@eweitz - good. I think I just misinterpreted which script you were talking about taking those arguments. We are on the same page.

Thanks for bearing with me as I get used to GitHub's issue functionality.

Kindly, Vincent
