virDTL is a computational protocol for inferring recombination in viral genomes, among both ancestral and extant strains, using phylogenetic reconciliation. Duplication-Transfer-Loss (DTL) reconciliation explains incongruence between the strain evolution tree and the evolutionary tree of each gene family (or genomic region) by inferring a history of gene duplications, gene losses, and horizontal gene transfers (HGT). In the context of viral evolution, the inferred horizontal gene transfers correspond to possible recombination events.
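As a rough illustration of what "incongruence" means here, the sketch below lists the clades of a gene tree that do not appear in the strain tree. It is not the reconciliation algorithm itself: RANGER-DTL additionally assigns costs to duplication, transfer, and loss events and searches for a reconciliation that explains the conflict. The nested-tuple tree encoding, the strain names A to D, and both function names are assumptions made for this example only.

```python
def clades(tree):
    """Return (leaves, clades) for a rooted tree given as nested tuples of strain names."""
    if isinstance(tree, str):
        # A leaf contributes its own name and no internal clades.
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    # Every internal node defines one clade: the set of leaves below it.
    found.add(leaves)
    return leaves, found


def conflicting_clades(strain_tree, gene_tree):
    """Clades present in the gene tree but absent from the strain tree."""
    _, strain_clades = clades(strain_tree)
    _, gene_clades = clades(gene_tree)
    return gene_clades - strain_clades


# Hypothetical four-strain example: the gene tree groups A with C,
# while the strain tree groups A with B.
strain_tree = ((("A", "B"), "C"), "D")
gene_tree = ((("A", "C"), "B"), "D")

conflicts = conflicting_clades(strain_tree, gene_tree)
print(sorted(sorted(clade) for clade in conflicts))  # prints [['A', 'C']]
```

A conflicting clade such as {A, C} only signals that some event is needed to explain the gene tree. Whether that event is best modeled as a transfer (recombination) is what the reconciliation step decides.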
virDTL is described in the paper "virDTL: Viral recombination analysis through phylogenetic reconciliation and its application to sarbecoviruses and SARS-CoV-2" by Zaman, Sledzieski, Berger, Wu, and Bansal.
If you use virDTL in a publication, please cite:
@article {ZamanVirDTL,
author = {Zaman, Sumaira and Sledzieski, Samuel and Berger, Bonnie and Wu, Yi-Chieh and Bansal, Mukul S.},
title = {virDTL: Viral recombination analysis through phylogenetic
reconciliation and its application to sarbecoviruses and
SARS-CoV-2},
publisher = {Mary Ann Liebert},
year = {2022},
URL = {https://compbio.engr.uconn.edu/wp-content/uploads/sites/2447/2022/07/virDTL_JCB2022_Preprint.pdf},
journal = {Journal of Computational Biology}
}
To install virDTL on Linux systems, run
source ./install.sh
This will create and activate the virDTL conda environment if it does not exist, and will copy the required binaries from software to ~/.local/bin.
To install the conda environment directly without installing the binaries, run
conda env create -f environment.yml
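Because install.sh copies binaries to ~/.local/bin, it can be worth confirming that this directory is on your PATH before running the pipeline. The following is a minimal sketch of such a check and is not part of the virDTL repository. The helper name is_on_path is an assumption made for illustration.

```python
import os


def is_on_path(directory, path_env):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_env.split(os.pathsep)
        if entry  # skip empty entries such as a trailing separator
    ]
    return target in entries


if __name__ == "__main__":
    local_bin = "~/.local/bin"
    if is_on_path(local_bin, os.environ.get("PATH", "")):
        print(f"{local_bin} is on PATH")
    else:
        print(f"{local_bin} is not on PATH; binaries copied there may not be found")
```

The check compares normalized path strings only. It does not resolve symbolic links or verify that any particular binary is present.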
- Download and pre-process sequence data as described in 0_fetch_data.
- Generate a whole genome alignment as described in 1_align_whole_genome.
- Annotate genes from each sequence, construct gene family alignments, and estimate gene family trees with RAxML, using the scripts in 2_construct_gene_trees.
- Generate multiple species trees using RAxML or BEAST (recommended) as described in 3_construct_species_tree.
- Error correct gene family trees with TreeFix-DTL, using the scripts in 4_error_correct_gene_trees.
- Reconcile the gene family trees with the strain tree with RANGER-DTL, using the scripts in 5_reconcile_gene_trees.
- Aggregate and summarize recombination events with support values, using the scripts in 6_parse_reconciliation.
- Analyze the aggregate recombination events, using the notebooks in 7_analysis or
Processed data sets from our analysis of the Sarbecovirus subgenus can be found in the corresponding folders.
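To make the idea of aggregating recombination events with support values more concrete, here is a minimal sketch of one plausible definition: the support of a transfer is the fraction of reconciliation samples in which it appears. The scripts in 6_parse_reconciliation may define and compute support differently. The in-memory event format (gene family, donor, recipient), the function names, and the toy data are all assumptions for illustration, not RANGER-DTL output or results from the paper.

```python
from collections import Counter


def transfer_support(samples):
    """Map each (gene_family, donor, recipient) event to the fraction of samples containing it."""
    if not samples:
        return {}
    counts = Counter()
    for events in samples:
        # Count each event at most once per sample.
        counts.update(set(events))
    return {event: n / len(samples) for event, n in counts.items()}


def filter_by_support(support, threshold):
    """Keep only events whose support is at least `threshold`."""
    return {event: s for event, s in support.items() if s >= threshold}


# Made-up toy data: four reconciliation samples over hypothetical
# gene families (gene1..gene3) and strain-tree nodes (n1..n7).
samples = [
    {("gene1", "n3", "n7"), ("gene2", "n5", "n2")},
    {("gene2", "n5", "n2")},
    {("gene2", "n5", "n2"), ("gene3", "n1", "n4")},
    {("gene2", "n5", "n2"), ("gene1", "n3", "n7")},
]

support = transfer_support(samples)
for event, value in sorted(filter_by_support(support, 0.5).items()):
    print(event, value)
# prints:
# ('gene1', 'n3', 'n7') 0.5
# ('gene2', 'n5', 'n2') 1.0
```

Thresholding on support trades sensitivity for confidence: a higher cutoff keeps only transfers that are recovered consistently across samples.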