nlin0907/CNV-Annotator

CNV Annotator

CNV Annotator is a collection of programs and scripts for annotating CNVs from CoNIFER calls.txt output.

Usage

Steps:

  1. Set up the annotator directory and install BEDOPS (annotate.sh uses its sort-bed and bedmap tools):
mkdir annotator

This directory should contain the following files: biomart_analysis.R, annotate.sh, calls_txt_to_bed.py, final_step.py, get_sampleID.py, gff_parsing.py, replace_absolute_value.py, and replaceunderscore.py.
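Before continuing, it can help to confirm that all eight files are in place. The following is a minimal sketch and not part of the repository: the helper name `missing_scripts` is hypothetical, and the file names are copied from the list above.

```python
from pathlib import Path

# File names copied from the list above.
REQUIRED_FILES = [
    "biomart_analysis.R",
    "annotate.sh",
    "calls_txt_to_bed.py",
    "final_step.py",
    "get_sampleID.py",
    "gff_parsing.py",
    "replace_absolute_value.py",
    "replaceunderscore.py",
]


def missing_scripts(directory):
    """Return the required file names that are not present in `directory`."""
    root = Path(directory)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "annotator" is the directory created with mkdir above.
    missing = missing_scripts("annotator")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```

Run it from the parent of the annotator directory, or pass a different path to `missing_scripts`.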

  2. Within the annotator directory, download the DGV GFF3 file for your genome build. For hg38:
 foo@bar:~$ wget http://dgv.tcag.ca/dgv/docs/DGV.GS.hg38.gff3

For the GFF3 file in hg19 (GRCh37):

 foo@bar:~$ wget http://dgv.tcag.ca/dgv/docs/DGV.GS.March2016.50percent.GainLossSep.Final.hg19.gff3
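The downloaded GFF3 file is converted to BED by gff_parsing.py before it can be used with BEDOPS. The sketch below only illustrates the coordinate change such a conversion involves; it is not the repository's implementation, which may differ. The function name `gff3_line_to_bed` and the sample line are made up for illustration, and the use of the `ID` attribute as the BED name column is an assumption.

```python
def gff3_line_to_bed(line):
    """Convert one GFF3 feature line to a (chrom, start, end, id) tuple.

    Returns None for blank lines and for comment/directive lines.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, _source, _type, start, end, _score, _strand, _phase, attributes = fields
    # Parse "key=value;key=value" attributes into a dict.
    attrs = dict(item.split("=", 1) for item in attributes.split(";") if "=" in item)
    # GFF3 is 1-based and end-inclusive; BED is 0-based and end-exclusive,
    # so only the start coordinate shifts by one.
    return (seqid, int(start) - 1, int(end), attrs.get("ID", "."))


# Synthetic line for illustration only (not taken from the DGV file).
example = "chr1\tDGV\tCNV\t10001\t22118\t.\t.\t.\tID=example1;Name=example1"
print(gff3_line_to_bed(example))  # → ('chr1', 10000, 22118, 'example1')
```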
  3. Make a sub-directory in the annotator directory for each analysis (e.g. analysis1, analysis2) and place the corresponding calls.txt from CoNIFER in it.
  4. Run biomart_analysis.R on each calls.txt to retrieve the gene, phenotype description, and MIM morbid description, and place its output file, ref-biomart.txt, in the respective sub-directory. Depending on how your calls.txt file is formatted, you might have to change chrom = line_elements[2] to chrom = line_elements[1], start_in = line_elements[3] to start_in = line_elements[2], and stop_in = line_elements[4] to stop_in = line_elements[3].

IMPORTANT NOTE: To use hg38 (GRCh38) Ensembl genes, replace the following line
ensembl = useMart("ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl", host="grch37.ensembl.org", path="/biomart/martservice")

with

ensembl = useMart("ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl", host="www.ensembl.org")
  5. Modify annotate.sh: replace lines 3 and 4
SDIR=/Users/nicolelin/annotator/analysis3
DIR=/Users/nicolelin/annotator

with your own paths. SDIR is the path to the analysis sub-directory you want to annotate. DIR is the path to the annotator directory.

If your calls.txt file has extra columns beyond sampleID, chromosome, start, end, type, replace the number on line 8 with the number of extra columns. For example, if your calls.txt file has the columns sampleID, chromosome, start, end, type, variant_type, unique, ID (three extra columns), then change

python $DIR/calls_txt_to_bed.py $SDIR/calls.txt $SDIR/calls.bed 15

to

python $DIR/calls_txt_to_bed.py $SDIR/calls.txt $SDIR/calls.bed 3
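If you are unsure how many extra columns your file has, a quick count of the header fields gives the number to pass. This is a minimal sketch under two assumptions: calls.txt is tab-delimited, and its first line is a header. The helper name `count_extra_columns` is hypothetical and not part of the repository.

```python
# The five base columns listed above.
BASE_COLUMNS = ["sampleID", "chromosome", "start", "end", "type"]


def count_extra_columns(header_line, delimiter="\t"):
    """Return how many columns the header has beyond the five base columns."""
    columns = header_line.rstrip("\n").split(delimiter)
    extra = len(columns) - len(BASE_COLUMNS)
    if extra < 0:
        raise ValueError("header has fewer columns than the base format")
    return extra


# Header from the example above: three columns beyond the base five.
header = "sampleID\tchromosome\tstart\tend\ttype\tvariant_type\tunique\tID\n"
print(count_extra_columns(header))  # → 3
```

To check a real file, pass its first line, for example `count_extra_columns(open("calls.txt").readline())`.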

If you want to use the hg38 GFF3 file, replace lines 20, 21, and 26 with

python $DIR/gff_parsing.py $DIR/DGV.GS.hg38.gff3 $DIR/DGVHG38.bed
sort-bed $DIR/DGVHG38.bed > $DIR/sorted_DGVHG38.bed

bedmap --echo --echo-map-id-uniq --fraction-both 0.75 $SDIR/anno_calls3.bed $DIR/sorted_DGVHG38.bed > $SDIR/annotated_calls.bed
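The --fraction-both 0.75 option makes bedmap report a DGV region for a call only when their overlap covers at least 75% of the call and at least 75% of the DGV region. The sketch below illustrates that reciprocal-overlap test in Python. It is for explanation only, since bedmap performs the test itself; the function name `reciprocal_overlap` is hypothetical, and both intervals are assumed to be half-open BED coordinates on the same chromosome.

```python
def reciprocal_overlap(call, region, threshold=0.75):
    """Return True if the overlap is at least `threshold` of both intervals.

    `call` and `region` are (start, end) tuples in half-open BED coordinates
    and are assumed to lie on the same chromosome.
    """
    call_start, call_end = call
    region_start, region_end = region
    overlap = min(call_end, region_end) - max(call_start, region_start)
    if overlap <= 0:
        return False  # the intervals do not overlap at all
    call_fraction = overlap / (call_end - call_start)
    region_fraction = overlap / (region_end - region_start)
    return call_fraction >= threshold and region_fraction >= threshold


# 90 of 100 bases are shared in both directions.
print(reciprocal_overlap((100, 200), (110, 210)))   # → True
# The call is fully covered, but it spans only about 11% of the region.
print(reciprocal_overlap((100, 200), (100, 1000)))  # → False
```

Requiring the fraction in both directions prevents a small call from being annotated with a much larger DGV region that merely contains it.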

Now you're ready to run the bash script.

  6. Make the bash script executable:
foo@bar:~$ chmod u+x annotate.sh
  7. Run annotate.sh:
./annotate.sh
  8. The final calls are written to FINAL_ANNOTATED_CALLS.txt in your sub-directory.
