CNV Annotator

CNV Annotator is a collection of programs and scripts for annotating CNVs from the CoNIFER calls.txt output.

Usage

Steps:

Set up the annotator directory and install BEDOPS (it provides the sort-bed and bedmap commands used by annotate.sh)

mkdir annotator

This directory should contain biomart_analysis.R, annotate.sh, calls_txt_to_bed.py, final_step.py, get_sampleID.py, gff_parsing.py, replace_absolute_value.py, and replaceunderscore.py.
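
Before going further, it can help to confirm that all eight files are in place. The following is a minimal sketch and not part of the repository: the names REQUIRED_SCRIPTS and missing_scripts are hypothetical, and the file list is copied from the sentence above.

```python
from pathlib import Path

# File names copied from the list above.
REQUIRED_SCRIPTS = [
    "biomart_analysis.R",
    "annotate.sh",
    "calls_txt_to_bed.py",
    "final_step.py",
    "get_sampleID.py",
    "gff_parsing.py",
    "replace_absolute_value.py",
    "replaceunderscore.py",
]


def missing_scripts(directory):
    """Return the required file names that are not present in `directory`."""
    root = Path(directory)
    return [name for name in REQUIRED_SCRIPTS if not (root / name).is_file()]


if __name__ == "__main__":
    # "annotator" is the directory created with mkdir above.
    missing = missing_scripts("annotator")
    if missing:
        print("Missing from annotator/:", ", ".join(missing))
    else:
        print("All required scripts are present.")
```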

Within the annotator directory, download the DGV GFF3 file for HG38:

 foo@bar:~$ wget http://dgv.tcag.ca/dgv/docs/DGV.GS.hg38.gff3

For the GFF3 file in GRCh37 (hg19):

 foo@bar:~$ wget http://dgv.tcag.ca/dgv/docs/DGV.GS.March2016.50percent.GainLossSep.Final.hg19.gff3
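
If you prefer to script the download, the choice between the two files can be sketched in Python. This is only an illustration: the names DGV_GFF3_URLS, gff3_url, and download_gff3 are hypothetical and not part of the repository, and the URLs are the two given above.

```python
import os
import urllib.request

# URLs copied from the wget commands above.
DGV_GFF3_URLS = {
    "hg38": "http://dgv.tcag.ca/dgv/docs/DGV.GS.hg38.gff3",
    "hg19": "http://dgv.tcag.ca/dgv/docs/"
            "DGV.GS.March2016.50percent.GainLossSep.Final.hg19.gff3",
}


def gff3_url(build):
    """Return the DGV GFF3 URL for a genome build ('hg38' or 'hg19')."""
    key = build.lower()
    if key not in DGV_GFF3_URLS:
        raise ValueError(f"unknown build {build!r}; expected 'hg38' or 'hg19'")
    return DGV_GFF3_URLS[key]


def download_gff3(build, dest_dir="."):
    """Download the GFF3 file for `build` into `dest_dir` and return its path."""
    url = gff3_url(build)
    dest = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    urllib.request.urlretrieve(url, dest)
    return dest
```

Calling download_gff3("hg38", "annotator") would then be the equivalent of the first wget command run inside the annotator directory.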

Make sub-directories in the annotator directory for different analyses (e.g. analysis1, analysis2) and place the calls.txt file from CoNIFER in the respective sub-directory.

Run biomart_analysis.R on each calls.txt to retrieve the gene, phenotype description, and MIM morbid description, and place ref-biomart.txt (the biomart_analysis.R output file) in the respective sub-directory. Depending on how your calls.txt file is formatted, you might have to change chrom = line_elements[2] to chrom = line_elements[1], start_in = line_elements[3] to start_in = line_elements[2], and stop_in = line_elements[4] to stop_in = line_elements[3].

IMPORTANT NOTE: biomart_analysis.R queries the GRCh37 Ensembl archive by default. To use HG38 Ensembl genes, replace the following line

ensembl = useMart("ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl", host="grch37.ensembl.org", path="/biomart/martservice")

with

ensembl = useMart("ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl", host="www.ensembl.org")

Modify annotate.sh: Replace lines 3 and 4

SDIR=/Users/nicolelin/annotator/analysis3
DIR=/Users/nicolelin/annotator

with your own paths. SDIR is the path to the sub-directory for the analysis you are running. DIR is the path to the annotator directory.

If your calls.txt file has extra columns beyond sampleID, chromosome, start, end, type, replace the number at the end of line 8 with the number of extra columns. For example, if your calls.txt file has the columns sampleID, chromosome, start, end, type, variant_type, unique, ID (three extra columns), then change

python $DIR/calls_txt_to_bed.py $SDIR/calls.txt $SDIR/calls.bed 15

to

python $DIR/calls_txt_to_bed.py $SDIR/calls.txt $SDIR/calls.bed 3
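
To find the right number for line 8, count the columns in your calls.txt header beyond the five expected ones. The following is a minimal sketch, assuming calls.txt is tab-delimited and its first line is a header. count_extra_columns is a hypothetical helper and not part of the repository.

```python
EXPECTED_COLUMNS = 5  # sampleID, chromosome, start, end, type


def count_extra_columns(header_line, sep="\t"):
    """Return how many columns the header has beyond the five expected ones."""
    fields = header_line.rstrip("\r\n").split(sep)
    extra = len(fields) - EXPECTED_COLUMNS
    if extra < 0:
        raise ValueError(
            f"expected at least {EXPECTED_COLUMNS} columns, found {len(fields)}"
        )
    return extra


# Header from the example above (assumed tab-delimited).
header = "\t".join(
    ["sampleID", "chromosome", "start", "end", "type",
     "variant_type", "unique", "ID"]
)
print(count_extra_columns(header))  # prints 3
```

For a real file, pass the first line of calls.txt to count_extra_columns and use the result as the last argument to calls_txt_to_bed.py.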

If you want to use the HG38 GFF3 file, replace lines 20, 21, and 26 of annotate.sh with

python $DIR/gff_parsing.py $DIR/DGV.GS.hg38.gff3 $DIR/DGVHG38.bed
sort-bed $DIR/DGVHG38.bed > $DIR/sorted_DGVHG38.bed

bedmap --echo --echo-map-id-uniq --fraction-both 0.75 $SDIR/anno_calls3.bed $DIR/sorted_DGVHG38.bed > $SDIR/annotated_calls.bed
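
The --fraction-both 0.75 option makes bedmap report a DGV entry only when the overlap covers at least 75% of both the call and the DGV entry (a reciprocal overlap). The check can be sketched as follows, assuming BED-style half-open coordinates and two intervals on the same chromosome. reciprocal_overlap is a hypothetical helper for illustration and does not reflect how bedmap is implemented.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end, fraction=0.75):
    """Return True if the overlap is at least `fraction` of each interval's length.

    Coordinates are half-open [start, end), as in BED files.
    """
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return False  # the intervals do not overlap at all
    return (overlap >= fraction * (a_end - a_start)
            and overlap >= fraction * (b_end - b_start))


# 90 bp shared, which is 90% of both 100 bp intervals.
print(reciprocal_overlap(100, 200, 110, 210))  # prints True
# 100 bp shared, which is 100% of the first interval but only 10% of the second.
print(reciprocal_overlap(0, 100, 0, 1000))  # prints False
```

Requiring the fraction on both sides keeps a small call from being annotated with a much larger DGV region that merely contains it.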

Now you're ready to run the bash script.

Make the bash script executable:

foo@bar:~$ chmod u+x annotate.sh

Run annotate.sh:

./annotate.sh

The final annotated calls are written to FINAL_ANNOTATED_CALLS.txt in your sub-directory.
