merge dev into master, resolved conflicts in index.html and JSON data
ArtPoon committed Jun 23, 2020
2 parents c55e756 + 5e6f311 commit 332b6c5
Showing 12 changed files with 736 additions and 288 deletions.
70 changes: 7 additions & 63 deletions README.md
@@ -2,11 +2,6 @@

CoVizu is an open source project to develop a near real-time SARS-CoV-2 genome analysis and visualization system that highlights potential cases of importation from other countries or ongoing community transmission.

* [Rationale](https://github.com/PoonLab/covizu#rationale)
* [Dependencies](https://github.com/PoonLab/covizu#dependencies)
* [Getting started](https://github.com/PoonLab/covizu#getting-started)
* [Acknowledgements](https://github.com/PoonLab/covizu#acknowledgements)

The current mode of visualization employed by CoVizu is what we are tentatively referring to as a "beadplot":

<p align="center">
@@ -21,30 +16,21 @@ The current mode of visualization employed by CoVizu that we are tentatively ref
* Vertical lines connect variants that are related by a minimum spanning tree, which gives a *rough* approximation of transmission links. The variant at the bottom terminus of the vertical line is the putative source.
* The relative location of variants along the vertical axis does not convey any information. The variants are sorted with respect to the vertical axis such that ancestral variants are always below their "descendant" variants.
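
To make the two layout rules above concrete, here is a minimal Python sketch; it is not CoVizu's own code. It assumes a hypothetical input of pairwise distances between variants keyed by `frozenset` pairs, builds a minimum spanning tree with Prim's algorithm, and then assigns each variant a row index (row 0 at the bottom) by breadth-first traversal from a chosen root, so every ancestral variant lands below its descendants. The function names are illustrative only, and the choice of root (for example, the earliest-sampled variant) is left to the caller as an assumption.

```python
import heapq
from collections import deque


def minimum_spanning_tree(variants, dist):
    """Prim's algorithm on the complete graph of variants.

    variants: list of variant labels.
    dist: dict mapping frozenset({a, b}) to the distance between a and b
          (hypothetical input format).
    Returns the tree as a list of (u, v) edges in the order they were added.
    """
    start = variants[0]
    visited = {start}
    heap = [(dist[frozenset((start, v))], start, v) for v in variants[1:]]
    heapq.heapify(heap)
    edges = []
    while heap and len(visited) < len(variants):
        _, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # v was already added to the tree via another edge
        visited.add(v)
        edges.append((u, v))
        for w in variants:
            if w not in visited:
                heapq.heappush(heap, (dist[frozenset((v, w))], v, w))
    return edges


def row_positions(edges, root):
    """Assign row indices (0 = bottom) so ancestors sit below descendants."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    rows = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nb in sorted(neighbours.get(node, [])):
            if nb not in rows:
                rows[nb] = len(rows)  # next free row, always above its parent
                queue.append(nb)
    return rows


# Toy example with four hypothetical variants.
pairs = {("A", "B"): 1, ("A", "C"): 2, ("A", "D"): 3,
         ("B", "C"): 1, ("B", "D"): 2, ("C", "D"): 1}
dist = {frozenset(p): d for p, d in pairs.items()}
edges = minimum_spanning_tree(["A", "B", "C", "D"], dist)
rows = row_positions(edges, root="A")
```

The networkx package (listed under Dependencies) provides `networkx.minimum_spanning_tree` for real graphs; the hand-rolled version here only keeps the sketch dependency-free.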

**It is not feasible to reconstruct accurate epidemiological links using only genomic data.**
However, our objective is to identify population-level events like ongoing community transmission or movement between countries, not to attribute a transmission to a specific individual.
**It is not feasible to reconstruct accurate links using only genomic data.** However, our objective is to identify population-level events like importations into Canada, not to attribute a transmission to a specific source individual.


## Rationale

### An enormous number of genomes
There is a rapidly accumulating number of genome sequences of severe acute
respiratory syndrome coronavirus 2 (SARS-CoV-2), as many as thousands
every day, that have been collected at sites around the world.
These data are predominantly available through the Global Initiative on Sharing
respiratory syndrome coronavirus 2 (SARS-CoV-2) collected at sites around
the world, predominantly available through the Global Initiative on Sharing
All Influenza Data (GISAID) database.
The public release of these genome sequences in near real-time is an
unprecedented resource for molecular epidemiology and public health.
For example, [nextstrain](http://nextstrain.org) has been at the forefront
of analyzing and communicating the global distribution of SARS-CoV-2 genomic
variation.

Visualizing the entirety of this genome database represents a significant
challenge. Although it is possible to reconstruct a tree relating all
genome sequences, for example, it is difficult to display the entire tree in a
meaningful way.

### Trees are limiting
The central feature of [nextstrain](http://nextstrain.org) is a reconstruction of
a time-scaled phylogeny (a tree-based model of how infections are related
by common ancestors back in time).
@@ -64,13 +50,9 @@ are directly sampled --- we think this is not unreasonable given the
relatively slow rate of molecular evolution in comparison to the virus
transmission rate.

### A majority of genomes are identical
Another limitation of the tree visualization is that it does not convey
information about observing the same genome sequence from multiple samples
over time.
About two-thirds of the SARS-CoV-2 genome sequences in GISAID are identical to
another genome.

There is no means to differentiate identical sequences in a phylogeny
because there are no phylogenetically informative sites that separate them.
One could extend the tips of the tree to span the time period of sample
@@ -79,51 +61,14 @@ However, the time scale of sampling identical genomes is relatively short
compared to the evolutionary history of the virus that is represented by
the tree.

Sampling identical genomes in different locations or over different
points in time from the same location is useful information for public health.
Our primary motivation for developing beadplots was to place greater visual
emphasis on this information.
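
A minimal Python sketch of the idea, assuming hypothetical sample records of the form (accession, sequence, country, ISO 8601 date): genomes with identical sequences are collapsed into one variant, and the sampling locations and date range, which a tree alone does not convey, are kept as a per-variant summary. Exact string matching is a simplification; it treats sequences that differ only at ambiguous (`N`) positions as distinct.

```python
from collections import defaultdict


def collapse_identical(records):
    """Group samples that share an identical genome sequence.

    records: iterable of (accession, sequence, country, date) tuples,
    with dates as ISO 8601 strings (hypothetical input format).
    Returns a dict mapping each distinct sequence to its list of samples.
    """
    variants = defaultdict(list)
    for accession, sequence, country, date in records:
        variants[sequence].append((accession, country, date))
    return dict(variants)


def summarize(samples):
    """Summarize where and over what period one variant was sampled."""
    dates = sorted(date for _, _, date in samples)  # ISO dates sort chronologically
    countries = sorted({country for _, country, _ in samples})
    return {"count": len(samples), "countries": countries,
            "first": dates[0], "last": dates[-1]}


# Toy records; accessions and sequences are made up for illustration.
records = [
    ("EPI_A", "ACGT", "Canada", "2020-03-02"),
    ("EPI_B", "ACGT", "USA", "2020-03-10"),
    ("EPI_C", "ACGA", "Canada", "2020-03-05"),
]
variants = collapse_identical(records)
```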


## Dependencies
CoVizu is being developed on [Ubuntu Linux](https://ubuntu.com/) and macOS platforms.

* Mozilla [geckodriver](https://github.com/mozilla/geckodriver) v0.26+ (optional for data retrieval)
* [Python](https://www.python.org/) 3.6 or higher, and the following modules:
  * [Selenium](https://github.com/SeleniumHQ/selenium/) version 3.14.1+ (optional for data retrieval)
  * [gotoh2](https://github.com/ArtPoon/gotoh2/) - requires a build environment for C
  * [networkx](https://networkx.github.io/) version 2.3+
  * [BioPython](https://biopython.org/) version 1.7+
* GNU [sed](https://www.gnu.org/software/sed/) stream editor
* [TN93](https://github.com/veg/tn93) v1.0.6
* [R](https://cran.r-project.org/) 3.6+, and the following packages:
  * [igraph](https://igraph.org/r/) version 1.2+
  * [jsonlite](https://cran.r-project.org/web/packages/jsonlite/index.html) version 1.6+
  * [Rtsne](https://cran.r-project.org/web/packages/Rtsne/index.html) version 0.15
* [FastTree2](http://www.microbesonline.org/fasttree/) version 2.1.10+, compiled for [double precision](http://www.microbesonline.org/fasttree/#BranchLen)
* [TreeTime](https://github.com/neherlab/treetime) version 0.7.5+


## Getting started

Our source code is distributed with JSON data files, so you can launch a local
instance of CoVizu by running the bash script:
```console
Elzar:covizu artpoon$ bash run-server.sh
Serving HTTP on 127.0.0.1 port 8001 (http://127.0.0.1:8001/) ...
```
and then directing your web browser to `localhost:8001`.
These JSON files are not regularly updated; they are provided for the purpose of front-end
development and demonstration.
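
The startup message above has the same form as the one printed by Python's standard `http.server` module. Assuming `run-server.sh` does something equivalent (its contents are not part of this diff), a minimal stand-in is a static file server bound to 127.0.0.1 on port 8001. Note that the `directory` argument of `SimpleHTTPRequestHandler` requires Python 3.7 or later, one minor version above the 3.6 minimum listed under Dependencies.

```python
import functools
import http.server


def make_server(directory=".", host="127.0.0.1", port=8001):
    """Create (but do not start) a static file server rooted at `directory`.

    Binding to 127.0.0.1 keeps the server reachable from this machine only.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.HTTPServer((host, port), handler)
```

From the repository root, `make_server().serve_forever()` would then serve `index.html` at `localhost:8001`; the one-line command `python3 -m http.server 8001 --bind 127.0.0.1` is roughly equivalent.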

### Running the back-end
The following workflow to generate the JSON files from the database is automated by the bash script `covizu.sh`:

## Current workflow

1. Sequences are bulk downloaded from the GISAID database. All developers have signed the GISAID data access agreement, and sequences are not being re-distributed.

2. Sequences are aligned pairwise against the SARS-CoV-2 reference genome using the Procrustean method implemented in [gotoh2](http://github.com/ArtPoon/gotoh2) - see `updater.py`. This module provides a method that progressively updates an existing alignment file with new sequence records, avoiding the re-alignment of previously released genomes.

3. Sequences are filtered using `filtering.py` to remove entries derived from non-human sources, incomplete genomes, and genomes that contain >5% fully ambiguous base calls (`N`s). **If you prefer to use your own alignment software, this would be your entry point using a FASTA file as input, with the original GISAID sequence headers.**
3. Sequences are filtered using `filtering.py` to remove entries derived from non-human sources, incomplete genomes, and genomes that contain >5% fully ambiguous base calls (`N`s).

4. A pairwise genetic distance matrix is generated using [TN93](http://github.com/veg/tn93) - only distances below a cutoff of `0.0001` are recorded to the output file.

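Step 2 can be illustrated with a simplified Python sketch that does not call gotoh2. It assumes a pairwise alignment has already been produced as two gapped strings of equal length, and shows the "Procrustean" idea as we understand it: query bases in columns where the reference has a gap (insertions) are cut away, so every output sequence fits the reference's length and coordinates. The helper `update_alignment` and its data formats are hypothetical; it only illustrates skipping accessions that are already present in an existing alignment.

```python
def procrustean_trim(aligned_ref, aligned_query):
    """Force an aligned query into reference coordinates.

    Both strings are assumed to come from one pairwise alignment, so they have
    equal length and use '-' for gaps. Query bases that sit in reference gap
    columns (insertions) are cut out and returned separately as
    (reference position, inserted bases) pairs.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    trimmed, insertions, pending = [], [], []
    ref_pos = 0  # number of reference bases passed so far
    for r, q in zip(aligned_ref, aligned_query):
        if r == "-":
            if q != "-":
                pending.append(q)
            continue
        if pending:
            insertions.append((ref_pos, "".join(pending)))
            pending = []
        trimmed.append(q)
        ref_pos += 1
    if pending:
        insertions.append((ref_pos, "".join(pending)))
    return "".join(trimmed), insertions


def update_alignment(existing, new_records, align):
    """Align only records whose accession is not already in `existing`.

    existing: dict of accession -> aligned sequence (hypothetical format).
    new_records: iterable of (accession, raw sequence) pairs.
    align: callable returning (aligned_ref, aligned_query) for one raw
           sequence, standing in for a real pairwise aligner such as gotoh2.
    """
    updated = dict(existing)
    for accession, sequence in new_records:
        if accession in updated:
            continue  # previously released genome: keep its stored alignment
        updated[accession], _ = procrustean_trim(*align(sequence))
    return updated
```

For example, `procrustean_trim("AC--GT", "ACTTGA")` drops the two-base insertion after reference position 2 and returns the query as `"ACGA"`.
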
@@ -137,5 +82,4 @@ The following workflow to generate the JSON files from the database is automated


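Steps 3 and 4 of the workflow can be sketched in Python under stated simplifications. The 5% threshold on `N` calls and the 0.0001 cutoff come from the workflow description; everything else is an assumption. In particular, `p_distance` is a plain proportion of differing sites standing in for TN93, which additionally corrects for multiple substitutions and models unequal base frequencies and separate transition rates, and the checks for non-human hosts and incomplete genomes are omitted.

```python
from itertools import combinations


def passes_n_filter(sequence, max_n_fraction=0.05):
    """Reject genomes in which more than 5% of base calls are fully ambiguous (N)."""
    if not sequence:
        return False
    return sequence.upper().count("N") / len(sequence) <= max_n_fraction


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has N or a gap are skipped. This is a
    simplified stand-in for TN93, not an implementation of it.
    """
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "N-" or y in "N-":
            continue
        compared += 1
        differences += x != y
    return differences / compared if compared else float("nan")


def sparse_distances(sequences, cutoff=0.0001):
    """Return (id1, id2, distance) only for pairs below the cutoff."""
    edges = []
    for (id1, s1), (id2, s2) in combinations(sorted(sequences.items()), 2):
        d = p_distance(s1, s2)
        if d < cutoff:  # NaN (no comparable sites) never passes
            edges.append((id1, id2, d))
    return edges
```

Recording only pairs below the cutoff keeps the output sparse: on a genome of roughly 30,000 bases, a cutoff of 0.0001 admits only pairs that differ at a handful of sites.
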
## Acknowledgements
CoVizu was made possible by the labs who have generated and contributed SARS-CoV-2 genomic sequence data that is curated and published by [GISAID](https://www.gisaid.org/). We sincerely thank these labs for making this information available to the public and open science.
The development of CoVizu is supported in part by a Project Grant from the [Canadian Institutes of Health Research](https://cihr-irsc.gc.ca/e/193.html) (PJT-156178).
The development and validation of these scripts was made possible by the labs who have generated and contributed SARS-CoV-2 genomic sequence data that is curated and published by [GISAID](https://www.gisaid.org/). We sincerely thank these labs for making this information available to the public and open science.
3 changes: 0 additions & 3 deletions covizu.sh
@@ -1,9 +1,6 @@
#!/bin/bash

source ~/.bashrc

# download and update gisaid-aligned.fa

python3 scripts/autobot.py >> debug/Autobot.log

# screen for non-human and low-coverage samples -> gisaid-filtered.fa
163 changes: 142 additions & 21 deletions index.html
@@ -3,24 +3,119 @@
<head>
<title>CoVizu</title>
<link rel="stylesheet" href="css/style.css">
<link rel="stylesheet" href="https://code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css">
</head>
<style>
.container {
padding-top: 30px;
}
searchp {
position:fixed;
left:10px;
top:10px;
}

details ::-webkit-details-marker {
color: gray;
display: none;
}
details summary::after{
content: "";
position: absolute;

}
[open] summary::after{
transform:rotate(90deg) ;

}
[open] summary,
summary:hover {
background-color: orange;
box-shadow: inset 1px 0 #ddd, inset -1px 0 #ddd;
max-height: 100px;
}
.box a{
display:block;
text-decoration: none;
color: currentcolor
}
.box {
position: absolute;
background-color: #fff;
min-width: 100px;
max-height: 0;
overflow: hidden;
}
details + dl {
max-height: 0;
transition: all .25s;
margin: 0 0 1rem;
overflow: hidden;
}
[open] + dl{
max-height: 100px;
}

/* right hand sidebar */
rightbar {
list-style-type: none;
margin: 0;
padding: 0;
width: 250px;
top: 80px;
background-color: white;
position: fixed;
height: 100%;
overflow: auto;
right :0;
}
.bar {}
.bar form {
height: 42px;
}
.bar input {
width: 250px;
border-radius: 1px;
border: 1px solid #324B4E;
background: #fff;
transition: .3s linear;
float: right;
}
.bar input:focus {
width: 300px;
}
.search_btn {
width: 21px;
height: 21px;
background: url(search.png) left center no-repeat;
float: right;
display: inline;
margin-left: 5px;
}
span{float: left}
</style>
<body>
<a href="https://github.com/PoonLab/covizu" class="github-corner" aria-label="View source on GitHub">
<svg width="80" height="80" viewBox="0 0 250 250" style="fill:#151513; color:#fff; position: absolute; top: 0; border: 0; right: 0;" aria-hidden="true">
<svg width="80" height="80" viewBox="0 0 250 250" style="fill:#151513; color:#fff; position: fixed; top: 0; border: 0; right: 0;" aria-hidden="true">
<path d="M0,0 L115,115 L130,115 L142,142 L250,250 L250,0 Z"></path>
<path d="M128.3,109.0 C113.8,99.7 119.0,89.6 119.0,89.6 C122.0,82.7 120.5,78.6 120.5,78.6 C119.2,72.0 123.4,76.3 123.4,76.3 C127.3,80.9 125.5,87.3 125.5,87.3 C122.9,97.6 130.6,101.9 134.4,103.2" fill="currentColor" style="transform-origin: 130px 106px;" class="octo-arm"></path>
<path d="M115.0,115.0 C114.9,115.1 118.7,116.5 119.8,115.4 L133.7,101.6 C136.9,99.2 139.9,98.4 142.2,98.6 C133.8,88.0 127.5,74.4 143.8,58.0 C148.5,53.4 154.0,51.2 159.7,51.0 C160.3,49.4 163.2,43.6 171.4,40.1 C171.4,40.1 176.1,42.5 178.8,56.2 C183.1,58.6 187.2,61.8 190.9,65.4 C194.5,69.0 197.7,73.2 200.1,77.6 C213.8,80.2 216.3,84.9 216.3,84.9 C212.7,93.1 206.9,96.0 205.4,96.6 C205.1,102.4 203.0,107.8 198.3,112.5 C181.9,128.9 168.3,122.5 157.7,114.1 C157.9,116.9 156.7,120.9 152.7,124.9 L141.0,136.5 C139.8,137.7 141.6,141.9 141.8,141.8 Z" fill="currentColor" class="octo-body"></path>
</svg>
</a>

<div class="search bar">
<searchp>
<div id="search-bar">
<span>
<input type="search" id="search-input"
placeholder="e.g., EPI_ISL_434070 or Canada"></span>
<span><button id="search-button">Search</button></span>
</div>
</searchp>
</div>

<div class="container">
<table>
<tr>
<td colspan="3"><div id="search-bar">
<input type="search" id="search-input">
<button id="search-button">Search</button>
</div></td>
</tr>
<tr>
<td>Time-scaled tree</td>
<td>Beadplot</td>
@@ -30,18 +125,36 @@
<td><div id="svg-timetree"></div></td>
<td><div id="svg-cluster"></div></td>
<td>
<h1>CoVizu</h1>
<h3>Near real-time visualization of SARS-CoV-2 genomic variation</h3>
<p>TEST SERVER</p>
<p><div id="div-last-update"></div><div id="div-number-genomes"></div></p>

<h4>Variant/node info:</h4>
<div class="breaker" id="text-node"></div>
<rightbar>
<h1>CoVizu</h1>
<h3>Near real-time visualization of SARS-CoV-2 genomic variation</h3>
<p>
<div id="div-last-update"></div>
<div id="div-number-genomes"></div>
</p>
<!-- <details>
<summary>Help</summary>
</details>
<dl>
<a>help info here</a>
</dl>-->
<details>
<summary>Cluster statistics</summary>
</details>
<dl>
<a>cluster statistics here</a>
</dl>
<div class="breaker" id="text-node"></div>
</rightbar>

</td>
</tr>
</table>
</div>
<div class="tooltip" id="tooltipContainer">
</div>
<script src="js/jquery.js"></script>
<script src="https://code.jquery.com/ui/1.12.1/jquery-ui.min.js"></script>
<script src="js/d3.js"></script>
<script src="js/beadplot.js"></script>
<script src="js/phylo.js"></script>
@@ -59,13 +172,13 @@ <h4>Variant/node info:</h4>
});

var country_pal = {
"Africa": "#66D7DF",
"Asia": "#EBBE8F",
"China": "#E9B4F4",
"Europe": "#7BD9AD",
"North America": "#AAC8FC",
"Oceania": "#FFB1C0",
"South America": "#BBCF85"
"Africa": "#EEDD88",
"Asia": "#BBCC33",
"China": "#EE8866",
"Europe": "#44BB99",
"North America": "#99DDFF",
"Oceania": "#FFAABB",
"South America": "#77AADD"
};

// load time-scaled phylogeny (treetime.py) from server
@@ -105,6 +218,14 @@ <h4>Variant/node info:</h4>
*/

accn_to_cid = index_accessions(clusters);

$('#search-input').autocomplete({
source: get_autocomplete_source_fn(accn_to_cid),
select: function( event, ui ) {
const accn = ui.item.value;
search(accn);
}
});
});
</script>
</body>
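
The helpers `index_accessions` and `get_autocomplete_source_fn` called above are defined in the front-end scripts, which this diff does not show. As a rough illustration in Python, with a hypothetical cluster format of plain accession lists, the index is a dictionary from accession to cluster, and an autocomplete source filters that dictionary's keys by the typed prefix. In jQuery UI, the `source` option can be a callback `function(request, response)` that receives the typed text as `request.term`; the prefix rule and the result limit here are assumptions, not CoVizu's behaviour.

```python
def index_accessions(clusters):
    """Map each accession to the index of the cluster that contains it.

    clusters: list of lists of accession strings, a hypothetical and
    simplified stand-in for the cluster JSON the front-end loads.
    """
    return {accn: cid
            for cid, accessions in enumerate(clusters)
            for accn in accessions}


def autocomplete(accn_to_cid, term, limit=10):
    """Return up to `limit` accessions starting with `term`, ignoring case."""
    prefix = term.strip().upper()
    matches = sorted(a for a in accn_to_cid if a.upper().startswith(prefix))
    return matches[:limit]


# Toy accessions for illustration only.
clusters = [["EPI_ISL_434070", "EPI_ISL_434071"], ["EPI_ISL_500000"]]
accn_to_cid = index_accessions(clusters)
```

Selecting a suggestion then only requires a dictionary lookup to find which cluster's beadplot to display.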