Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VCF output #36

Open
fgvieira opened this issue Oct 19, 2020 · 9 comments
Open

VCF output #36

fgvieira opened this issue Oct 19, 2020 · 9 comments

Comments

@fgvieira
Copy link

Dear all,

is it possible to convert ExomeDepth output to a VCF?
I looked in the manual, but there is not reference to a VCF...

thanks,

@vplagnol
Copy link
Owner

I have not implemented that, no. It does seem like a fairly manageable task I presume, probably cleaner than csv. Any favourite R VCF writer I should be using?

@fgvieira
Copy link
Author

Not really... once used vcfR and it was fine but no real strong opinion.

@fgvieira
Copy link
Author

Any idea of when it would be available?
thanks,

@jamigo
Copy link

jamigo commented Feb 4, 2021

It would be great to have it implemented inside ExomeDepth's R code, but here's what we're using meanwhile:

REFERENCE=/path/to/reference.fa
for FILE in $FILENAME*cnv*tab; do
 SAMPLE=${FILE/.cnv.*}
 if [[ $FILE == *".cnv.X."* ]] && [ -f ${FILE/.cnv.X./.cnv.} ]; then HEADER=; else HEADER=1; fi
 if [ $HEADER ]; then echo '##fileformat=VCFv4.2
##source=ExomeDepth
##reference='$REFERENCE'
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=BF,Number=1,Type=Float,Description="Bayes Factor">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=EXONS,Number=.,Type=Integer,Description="Number of exons affected">
##INFO=<ID=READSEXP,Number=.,Type=Integer,Description="Number of reads expected">
##INFO=<ID=READSOBS,Number=.,Type=Integer,Description="Number of reads observed">
##INFO=<ID=READSRATIO,Number=.,Type=Integer,Description="Ratio expected vs. observed">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	'$SAMPLE > $SAMPLE.ExomeDepth.vcf
 fi
 tail -n +2 $FILE | sed 's/"//g' | perl -lane '
if ($F[2] =~ /dup/) { $F[2] = "DUP" } elsif ($F[2] =~ /del/) { $F[2] = "DEL" } else { $F[2] = "UNK" };
if ($F[11] > 0.5 && $F[11] < 1.5) { $gt = "0/1" } else { $gt = "1/1" }
$info = "END=$F[5];SVLEN=".($F[5]-$F[4]+1).";EXONS=$F[3];READSEXP=$F[9];READSOBS=$F[10];READSRATIO=$F[11];BF=$F[8];SVTYPE=$F[2]";
$F[8] = 0 if $F[8] < 0; print join "\t", $F[6], $F[4], ".", "N", "<$F[2]>", $F[8], "PASS", $info, "GT", $gt
' >> $SAMPLE.ExomeDepth.vcf
done

@halessi
Copy link

halessi commented Feb 9, 2021

A vcf output to facilitate annotation by something like AnnotSV would be spectacular.

@jamigo, how does one use this?

edit: still confused. have tried calling your bash script after defining $FILENAME as the .csv file, but that doesn't seem appropriate and doesn't work.

@jamigo
Copy link

jamigo commented Feb 9, 2021

We internally name all ExomeDepth's output file as sample.cnv.tab and sample.cnv.X.tab (this is not mandatory, although if your output files are .csv you should modify the script accordingly), and this script merges both .tab pairs into one single .vcf file

@halessi
Copy link

halessi commented Feb 9, 2021

@jamigo thank you so much. I am trying to use your script after reformatting the csvs as .tab, however I am getting this error:

syntax error at -e line 4, near "1)" Execution of -e aborted due to compilation errors. syntax error at -e line 4, near "1)" Execution of -e aborted due to compilation errors.

Your help would be greatly appreciated thank you so much. I am just trying to do it exactly as you did -- my capabilities with bash are pretty limited.

EDIT: fixed it. I think the script you posted here may be missing a parenthesis @ SVLEN=".$F[5]-$F[4]+1)."

(it worked as SVLEN="(.$F[5]-$F[4]+1).")

@jamigo
Copy link

jamigo commented Feb 9, 2021

That line where the INFO column is defined should in fact be the following:
$info = "END=$F[5];SVLEN=".($F[5]-$F[4]+1).";EXONS=$F[3];READSEXP=$F[9];READSOBS=$F[10];READSRATIO=$F[11];BF=$F[8];SVTYPE=$F[2]";
I don't know why that left parenthesis was lost, but it should be right after the dot rather than before.

@kasia-te
Copy link

kasia-te commented Mar 18, 2022

Hi,

I am trying to convert the ExomeDepth result ([email protected]) to VCF inside my R script.
Could you tell me if the variant coordinates are 0 or 1-based?
If I understand the above bash script, it assumes that variant start (s) and end (e) positions are 0-based and inclusive, (like on the picture). Is it right?
- - - + + + + + + - - -
- - - s . . . . e - - -

Best,
Kasia

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants