
The Vertebrate Genomes Project Mitogenome Assembly Pipeline


gf777/mitoVGP


mitoVGP 2.2

This repository contains the scripts used to generate mitochondrial sequences for the Vertebrate Genomes Project (VGP).

Software and Data Use Policy

mitoVGP is distributed under the BSD 3-Clause License.

VGP samples and data come from a variety of sources. To support fair and productive use of this data, please abide by the Data Use Policy and contact us with any questions.

If you use mitoVGP or any of the mitogenomes generated by the VGP, please cite:

Formenti, G., Rhie, A., Balacco, J. et al. Complete vertebrate mitogenomes reveal widespread repeats and gene duplications. Genome Biol 22, 120 (2021). https://doi.org/10.1186/s13059-021-02336-9

Content Description:

  • canu-1.8.Linux-amd64.tar.xz - the popular long-read assembler used in the pipeline

  • mitoVGP_conda_env_pacbio.yml - conda environment containing all software required to run the pipeline with PacBio data on Linux

  • mitoVGP_conda_env_ONT.yml - conda environment containing all software required to run the pipeline with ONT data on Linux

  • mitoVGP - the pipeline

  • scripts/ - the intermediate scripts required by mitoVGP

Quick Start

mitoVGP is available for 64-bit Linux and requires Conda. To install and run it, follow these instructions:

git clone https://github.com/gf777/mitoVGP.git #clone this git repository
cd mitoVGP #get into mitoVGP folder

tar -xvf canu-1.8.Linux-amd64.tar.xz #install canu assembler
rm canu-1.8.Linux-amd64.tar.xz

#create the mitoVGP pipeline software environment
#please note: PacBio software only runs on Python 2, while ONT software requires Python 3,
#so a separate environment must be created for each data type.
#PacBio:
conda env create -f mitoVGP_conda_env_pacbio.yml
#ONT:
conda env create -f mitoVGP_conda_env_ONT.yml

conda activate mitoVGP_pacbio #activate mitoVGP conda environment, use mitoVGP_ONT for Nanopore datasets

#run the mitoVGP pipeline using 24 cores (example with M. armatus, PacBio and 10x data)
./mitoVGP -a pacbio -s Mastacembelus_armatus -i fMasArm1 -r mtDNA_Mastacembelus_armatus.fasta -t 24 -b variantCaller
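Because the PacBio and ONT toolchains live in separate environments, it can help to choose the environment from the data type instead of by hand. The following is a minimal bash sketch: the helper names mitovgp_env and mitovgp_preflight, and the accepted spellings pacbio, ont and nanopore, are hypothetical and not part of mitoVGP, while the environment names mitoVGP_pacbio and mitoVGP_ONT are the ones given above. The preflight function checks the stated requirements (64-bit Linux, as in the amd64 Canu build, and conda on the PATH).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mitoVGP): map a long-read data type to the
# conda environment created from the matching mitoVGP_conda_env_*.yml file.
mitovgp_env() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    pacbio)       echo "mitoVGP_pacbio" ;;
    ont|nanopore) echo "mitoVGP_ONT" ;;
    *)            echo "unknown data type: $1" >&2; return 1 ;;
  esac
}

# Hypothetical preflight check for the stated requirements:
# a 64-bit x86 Linux host with conda available on the PATH.
mitovgp_preflight() {
  [ "$(uname -s)" = "Linux" ]  || { echo "mitoVGP requires Linux" >&2; return 1; }
  [ "$(uname -m)" = "x86_64" ] || { echo "mitoVGP requires a 64-bit x86 host" >&2; return 1; }
  command -v conda >/dev/null  || { echo "conda not found on PATH" >&2; return 1; }
}
```

In an interactive shell you might then run mitovgp_preflight && conda activate "$(mitovgp_env ont)". Inside a non-interactive script, conda activate usually needs eval "$(conda shell.bash hook)" to be run first.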

For additional options and usage details, run:

./mitoVGP -h

Please note that, depending on your PacBio chemistry, you will need to specify a different polishing tool. For chemistry 2.0 (the default):

./mitoVGP -b gcpp

For chemistries older than 2.0, use:

./mitoVGP -b variantCaller

For RS II chemistries, you may also want to align reads with blasr:

./mitoVGP -b variantCaller -m blasr
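The chemistry rules above can be captured in a small lookup so the flags are chosen consistently. This is a hedged bash sketch: the function name mitovgp_polish_flags is hypothetical, treating any chemistry at or above 2.0 like 2.0 is an assumption, and it always adds -m blasr for RS II even though the text above presents that as optional.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mitoVGP): print the -b polishing flag,
# plus -m blasr for RS II data, for a given PacBio chemistry.
mitovgp_polish_flags() {
  local chem="${1:-2.0}"  # chemistry 2.0 is the stated default
  case "$chem" in
    rsii|RSII|"RS II")
      # RS II: older polisher plus blasr alignment (optional per the text above)
      printf '%s\n' "-b variantCaller -m blasr" ;;
    *)
      # Accept only version-like numbers such as 1 or 2.0
      if ! [[ "$chem" =~ ^[0-9]+(\.[0-9]+)?$ ]]; then
        echo "unrecognized chemistry: $chem" >&2
        return 1
      fi
      # Assumption: anything at or above 2.0 is treated like chemistry 2.0
      if awk -v c="$chem" 'BEGIN { exit !(c + 0 >= 2.0) }'; then
        printf '%s\n' "-b gcpp"
      else
        printf '%s\n' "-b variantCaller"
      fi ;;
  esac
}
```

For example, ./mitoVGP -a pacbio -s Mastacembelus_armatus -i fMasArm1 -r mtDNA_Mastacembelus_armatus.fasta -t 24 $(mitovgp_polish_flags 1.0) reproduces the Quick Start command; the substitution is left unquoted on purpose so the flags split into separate words.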

Pipeline workflow

An existing reference mitogenome from a closely or distantly related species is used to identify mito-like reads in PacBio/ONT whole-genome sequencing (WGS) data; these reads are then used for de novo assembly. The assembly is further polished with both long- and short-read data, and linearized so that it starts with the conventional phenylalanine tRNA (tRNA-Phe) sequence.
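The final linearization step amounts to rotating a circular sequence so that it begins at a chosen start point. The bash sketch below illustrates only that rotation, under a simplifying assumption: the start is given as an exact motif (for example, the first bases of tRNA-Phe). The function name rotate_to_motif is hypothetical, and mitoVGP itself may locate tRNA-Phe quite differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: rotate a circular sequence so it starts at the first
# occurrence of an exact start motif. Doubling the sequence also finds a
# motif that spans the end/start junction of the circle.
rotate_to_motif() {
  local seq="$1" motif="$2"
  if [ -z "$motif" ] || [ "${#motif}" -gt "${#seq}" ]; then
    echo "invalid motif" >&2
    return 1
  fi
  local doubled="$seq$seq"
  local prefix="${doubled%%"$motif"*}"  # everything before the first match
  if [ "$prefix" = "$doubled" ]; then
    echo "motif not found" >&2
    return 1
  fi
  # Take one full sequence length starting at the match position
  printf '%s\n' "${doubled:${#prefix}:${#seq}}"
}
```

For instance, rotate_to_motif GGGTTTAAACCC AAA prints AAACCCGGGTTT, and rotate_to_motif AGGGTA AAG prints AAGGGT, where the motif wraps around the junction.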

VGP mitogenomes assembled using the mitoVGP pipeline can be found on GenomeArk and include:

PacBio
Anna's hummingbird (Calypte anna)
Atlantic Halibut (Hippoglossus hippoglossus)
Atlantic horse mackerel (Trachurus trachurus)
Blue Whale (Balaenoptera musculus)
Blunt-snouted clingfish (Gouania willdenowi)
Boeseman’s rainbowfish (Melanotaenia boesemani)
Bolson tortoise (Gopherus flavomarginatus)
Bottlenose dolphin (Tursiops truncatus)
Brown rat (Rattus norvegicus)
Brown trout (Salmo trutta)
Budgerigar (Melopsittacus undulatus)
Californian Sea Lion (Zalophus californianus)
Canada Lynx (Lynx canadensis)
Carmine Bee-eater (Merops nubicus)
Chicken (Gallus gallus)
Chimpanzee (Pan troglodytes)
Climbing perch (Anabas testudineus)
Common brushtail possum (Trichosurus vulpecula)
Common Cuckoo (Cuculus canorus)
Common marmoset (Callithrix jacchus)
Common pipistrelle (Pipistrellus pipistrellus)
Common starfish (Asterias rubens)
Common Tern (Sterna hirundo)
Common Yellowthroat (Geothlypis trichas)
Copperband butterflyfish (Chelmon rostratus)
Cow (Angus/Brahman Hybrid) (Bos taurus)
Denticle herring (Denticeps clupeoides)
Downy Woodpecker (Dryobates pubescens)
Eastern happy (Astatotilapia calliptera)
Electric eel (Electrophorus electricus)
Eurasian Golden Plover (Pluvialis apricaria)
Eurasian otter (Lutra lutra)
Eurasian red squirrel (Sciurus vulgaris)
European common frog (Rana temporaria)
European golden eagle (Aquila chrysaetos)
European Toad (Bufo bufo)
Flier cichlid (Archocentrus centrarchus)
Gaboon caecilian (Geotrypetes seraphini)
Gilthead seabream (Sparus aurata)
Goode's Thornscrub tortoise (Gopherus evgoodei)
Great Potoo (Nyctibius grandis)
Great white shark (Carcharodon carcharias)
Greater Horseshoe Bat (Rhinolophus ferrumequinum)
Greater Mouse-Eared Bat (Myotis myotis)
Greater pipefish (Syngnathus acus)
Grey crowned-crane (Balearica regulorum)
Grey squirrel (Sciurus carolinensis)
Gyrfalcon (Falco rusticolus)
Honeycomb rockfish (Sebastes umbrosus)
Hourglass Treefrog (Dendropsophus ebraccatus)
Human (Homo sapiens)
Indian glassy fish (Parambassis ranga)
Indo-Pacific tarpon (Megalops cyprinoides)
Japanese puffer (Torafugu) (Takifugu rubripes)
John dory (Zeus faber)
Kakapo (Strigops habroptilus)
Korean giant-fin mudskipper (Periophthalmus magnuspinnatus)
Kuhl's Pipistrelle (Pipistrellus kuhlii)
Largescale Four-Eyed Fish (Anableps anableps)
Leatherback Sea Turtle (Dermochelys coriacea)
Lesser kestrel (Falco naumanni)
Linnaeus's Two-Toed Sloth (Choloepus didactylus)
Live sharksucker (Echeneis naucrates)
Lumpfish (Cyclopterus lumpus)
Maguari Stork (Ciconia maguari)
Mute Swan (Cygnus olor)
Needlefish (Xenentodon cancila)
New Caledonian crow (Corvus moneduloides)
Nile rat (Arvicanthis niloticus)
Northern pike (Esox lucius)
Pale spear-nosed Bat (Phyllostomus discolor)
Platypus (Ornithorhynchus anatinus)
Razorbill (Alca torda)
Red-bellied piranha (Pygocentrus nattereri)
Red-fronted tinkerbird (Pogoniulus pusillus)
Red-legged Seriema (Cariama cristata)
Reedfish (Erpetoichthys calabaricus)
Rifleman (Acanthisitta chloris)
Ring-tailed lemur (Lemur catta)
Sand lizard (Lacerta agilis)
Sea Lamprey (Petromyzon marinus)
Short-beaked echidna (Tachyglossus aculeatus)
Smalltooth sawfish (Pristis pectinata)
Southern tamandua (Tamandua tetradactyla)
Spotted scat (Scatophagus argus)
Spotty Wrasse (Notolabrus celidotus)
Sterlet (Acipenser ruthenus)
Stoat (Mustela erminea)
Swainson's thrush (Catharus ustulatus)
Thorny Skate (Amblyraja radiata)
Tiny Cayenne Caecilian (Microcaecilia unicolor)
Tire track eel (Mastacembelus armatus)
Two-lined caecilian (Rhinatrema bivittatum)
Vaquita (Phocoena sinus)
Warty Frogfish (Antennarius maculatus)
Whiskered Treeswift (Hemiprocne comata)
Yellow-throated Sandgrouse (Pterocles gutturalis)
Zebra Finch (female) (Taeniopygia guttata)
Zebra Finch (male) (Taeniopygia guttata)
Zebrafish SAT strain (Danio rerio)


Nanopore
Spotty Wrasse (Notolabrus celidotus)
Thorny Skate (Amblyraja radiata)
Hourglass Treefrog (Dendropsophus ebraccatus)
Sand lizard (Lacerta agilis)