Published documentation

epiforecasts · Dec 22, 2021 · b580177 · b580177
commit b580177
Show file tree

Hide file tree

Showing 7 changed files with 1,196 additions and 0 deletions.
diff --git a/footer.html b/footer.html
@@ -0,0 +1,12 @@
+&nbsp;
+<hr />
+<p style="text-align: center;">Developed by <a href="https://samabbott.co.uk/about">Sam Abbott</a>, Katharine Sherratt, and Sebastian Funk </p>
+<!-- Add icon library -->
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
+
+<!-- Add font awesome icons -->
+<p style="text-align: center;">
+    <a href="https://github.com/epiforecasts/omicron-sgtf-forecast/" class="fa fa-github"></a>
+</p>
+
+&nbsp;
diff --git a/header.html b/header.html
@@ -0,0 +1 @@
+<a href="https://github.com/epiforecasts/omicron-sgtf-forecast" class="github-corner" aria-label="View source on GitHub"><svg width="80" height="80" viewBox="0 0 250 250" style="fill:#70B7FD; color:#fff; position: absolute; top: 0; border: 0; right: 0;" aria-hidden="true"><path d="M0,0 L115,115 L130,115 L142,142 L250,250 L250,0 Z"></path><path d="M128.3,109.0 C113.8,99.7 119.0,89.6 119.0,89.6 C122.0,82.7 120.5,78.6 120.5,78.6 C119.2,72.0 123.4,76.3 123.4,76.3 C127.3,80.9 125.5,87.3 125.5,87.3 C122.9,97.6 130.6,101.9 134.4,103.2" fill="currentColor" style="transform-origin: 130px 106px;" class="octo-arm"></path><path d="M115.0,115.0 C114.9,115.1 118.7,116.5 119.8,115.4 L133.7,101.6 C136.9,99.2 139.9,98.4 142.2,98.6 C133.8,88.0 127.5,74.4 143.8,58.0 C148.5,53.4 154.0,51.2 159.7,51.0 C160.3,49.4 163.2,43.6 171.4,40.1 C171.4,40.1 176.1,42.5 178.8,56.2 C183.1,58.6 187.2,61.8 190.9,65.4 C194.5,69.0 197.7,73.2 200.1,77.6 C213.8,80.2 216.3,84.9 216.3,84.9 C212.7,93.1 206.9,96.0 205.4,96.6 C205.1,102.4 203.0,107.8 198.3,112.5 C181.9,128.9 168.3,122.5 157.7,114.1 C157.9,116.9 156.7,120.9 152.7,124.9 L141.0,136.5 C139.8,137.7 141.6,141.9 141.8,141.8 Z" fill="currentColor" class="octo-body"></path></svg></a><style>.github-corner:hover .octo-arm{animation:octocat-wave 560ms ease-in-out}@keyframes octocat-wave{0%,100%{transform:rotate(0)}20%,60%{transform:rotate(-25deg)}40%,80%{transform:rotate(10deg)}}@media (max-width:500px){.github-corner:hover .octo-arm{animation:none}.github-corner .octo-arm{animation:octocat-wave 560ms ease-in-out}}</style>
diff --git a/library.bib b/library.bib
@@ -0,0 +1,68 @@
+@Manual{R,
+    title = {R: A Language and Environment for Statistical Computing},
+    author = {{R Core Team}},
+    organization = {R Foundation for Statistical Computing},
+    address = {Vienna, Austria},
+    year = {2019},
+    url = {https://www.R-project.org/},
+  }
+
+@Manual{cmdstanr,
+    title = {cmdstanr: R Interface to 'CmdStan'},
+    author = {Jonah Gabry and Rok Češnovar},
+    year = {2021},
+    note = {https://mc-stan.org/cmdstanr, https://discourse.mc-stan.org},
+  }
+
+@Manual{stan,
+    title = {Stan Modeling Language Users Guide and Reference Manual, 2.28.1},
+    author = {Stan Development Team},
+    year = {2021},
+    note = {https://mc-stan.org},
+  }
+
+@Manual{scoringutils,
+  title = {scoringutils: A collection of proper scoring rules and metrics to assess predictions},
+  author = {Nikos Bosse},
+  year = {2020},
+  note = {R package version 0.0.0.9000},
+  url = {https://github.com/epiforecasts/scoringutils}
+}
+
+@Article{betancourt_2017,
+  title={Diagnosing biased inference with divergences},
+  author={Betancourt, Michael},
+  year={2017},
+  volume={4},
+  journal={Stan Case Studies},
+  url={https://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html}
+}
+
+@Article{forecast.vocs,
+  title = {forecast.vocs: Forecast case and sequence notifications using variant of concern strain dynamics},
+  author = {Sam Abbott},
+  journal = {Zenodo},
+  year = {2021},
+  doi = {10.5281/zenodo.5559016},
+  }
+
+  @Misc{loo,
+    title = {loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models},
+    author = {Aki Vehtari and Jonah Gabry and Mans Magnusson and Yuling Yao and Paul-Christian Bürkner and Topi Paananen and Andrew Gelman},
+    year = {2020},
+    note = {R package version 2.4.1},
+    url = {https://mc-stan.org/loo/},
+  }
+
+  @Article{loo-paper,
+    title = {Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC},
+    author = {Aki Vehtari and Andrew Gelman and Jonah Gabry},
+    year = {2017},
+    journal = {Statistics and Computing},
+    volume = {27},
+    issue = {5},
+    pages = {1413--1432},
+    doi = {10.1007/s11222-016-9696-4},
+  }
+
+To cite the
diff --git a/paper.Rmd b/paper.Rmd
@@ -0,0 +1,237 @@
+---
+title: "Real-time estimation of the time-varying transmissibility advantage of Omicron in England using S-Gene Target Status as a Proxy"
+author: Katharine Sherratt, Sam Abbott, Sebastian Funk
+output: bookdown::pdf_document2
+---
+
+```{r setup, include = FALSE}
+knitr::opts_chunk$set(echo = FALSE, include = TRUE,
+                      warning = FALSE, message = FALSE,
+                      root.dir = here::here())
+options(scipen = 1, digits = 2)
+library(here)
+library(forecast.vocs)
+library(dplyr)
+library(tidyr)
+library(ggplot2)
+library(janitor)
+library(knitr)
+library(data.table)
+```
+
+
+```{r load-results}
+# load packags
+library(ggplot2)
+library(scales)
+library(dplyr)
+library(here)
+library(readr)
+
+# load functions
+source(here("R", "load-local-data.R"))
+
+# get latest date
+date <- get_latest_date()
+
+# load data
+daily <- load_local_data(date)
+
+# load results
+sgtf <- load_results(date)
+bias <- load_results(date, type = "bias")
+```
+
+## Abstract {-}
+
+**Background**
+
+**Methods**
+
+**Results**
+
+**Conclusions**
+
+
+## Introduction {-}
+
+**Why**
+
+**Detail**
+
+**Aim and what we did**
+
+We aimed to assess competing explanations of the transmission advantage of a variant of concern (VoC), Omicron, compared to the existing dominant strain, Delta, in England. We explored the likelihood of increased transmissibility compared to immune escape, using S-gene target failure as a proxy for infection with Omicron.
+
+We use a model framework where we vary only the relationship between variants while holding all other parameters constant. We compare leave-one-out validation among the models and explore the estimated transmission advantage.
+
+When processing a positive test result for Covid-19 it is also possible to detect the S-gene target (SGT). A test for SGT can give either a failure or a positive result. The proportion of cases that are tested for SGT and result in SGT-Failure is a useful proxy for the proportion of the Omicron variant in comparison to the Delta variant. 
+
+Only a sample of positive Covid-19 test results are also tested for SGT. The ability to test for SGT depends on type of Covid-19 test (only PCR), and laboratory resources. If these constraints are associated with factors influencing transmission, this would bias estimates of transmission made only from SGT data and confound the relationship between SGT results and all test-positive cases. 
+
+We aim to explore bias in the availability of S-gene target results among cases testing positive for Covid-19. Here we use a branching process model to identify any difference in growth rate between cases which have an SGT result and those which do not.
+
+## Methods {-}
+
+### Data
+
+We used case data from England. This includes all cases tested by PCR test following symptom onset and a positive lateral flow test result. For a varying proportion of these cases, PCR test results included information on S-gene target. We used S-gene target negativity as a proxy for the Omicron variant. We excluded data before `r start_date`, to reduce overfitting to stochastic imports. Data are included up to `r max(obs$date)` and modelled at a `r parameters$timespan` day resolution. All dates are by specimen date.
+
+We used a two-strain branching process model to compare all test-positive cases to test-positive cases with any SGT result. We used data for all England test-positive cases. We used only the most recent three weeks of data between `r min(obs$date)` and `r max(obs$date)`. 
+
+### Models
+
+We model strain dynamics using the single strain model as a starting point but with the addition of strain specific AR(P) case model and a beta binomial (or optionally binomial) observation process for sequence data. The variant of concerns growth rate is modelled either as a fixed scaling of the non-variant of concern growth rate, as an independent AR(1) process, or in a vector autoregression framework as a correlated AR(1) process. This last formulation is described in detail below along with the modifications required to define the other two formulations. Parameters related to the variant of concern (VoC) are given the $\delta$ superscript and parameters related to non-VoC cases are given the $o$ superscript.
+
+Mean reported cases are again defined using a AR(1) process on the log scale for each strain and then combined for overall mean reported cases. 
+
+\begin{align}
+  \lambda_t &\sim \text{LogNormal}\left(\log C_t , 0.025 \times \log C_t \right) ,\ t \leq P\\
+  \lambda^{\delta}_t &\sim \text{LogNormal}\left(\log C^{\delta}_t , 0.025 \times \log C^{\delta}_t \right),\ t \leq P \\
+  \lambda^{o}_t &= \lambda_t - \lambda^{\delta}_t,\ t \leq P \\
+  \lambda^{\delta}_t &= \text{exp}\left(r^{\delta}_t\right) \sum_{p = 1}^{P} \alpha^{\delta}_p  \lambda^{\delta}_{t-p},\ t \gt P \\
+  \lambda^{o}_t &= \text{exp}\left(r^{o}_t\right) \sum_{p = 1}^{P} \alpha^{o}_p  \lambda^{o}_{t-p}, t \gt P   \\
+  \lambda_t &= \lambda^{\delta}_t + \lambda^{o}_t
+\end{align}
+
+Where $C^{\delta}_0$ is derived by calculating the mean proportion of cases that had the VoC for the first time point using the overall number of reported cases, the number of sequenced cases, and the number of sequences that were positive for the VoC. The growth rate for both VoC and non-VoC cases ($r^{o, \delta}_t$) is again modelled as differenced AR(1) processes but now combined with a variant specific modifier ($s^{o, \delta}_0$) to growth when variant sequences are first reported ($t_{seq}$), and a correlation structure for the time and strain varying error terms ($\eta^{o, \delta}$).
+
+
+\begin{align}
+  r^{o}_1 &\sim \text{Normal}\left(0, 0.25 \right),\\
+  r^{\delta}_{t_{seq}} &= r^{o}_{t_{seq}} + s^{\delta} \\
+  r^{o, \delta}_t &= r^{o, \delta}_{t-1} + \epsilon^{o, \delta}_t  \\
+  \epsilon^{o, \delta}_0 &= \eta^{o, \delta}_0 \\
+  \epsilon^{o, \delta}_t &= \beta \epsilon^{o, \delta}_{t-1} + \eta^{o, \delta}_t
+\end{align}
+
+Where, 
+
+\begin{align}
+  s^{\beta} &\sim \text{Normal}(0, 0.5)  \\
+  \eta^{o, \delta}_t &\sim \text{MVN}\left(0, \boldsymbol \Sigma \right) \\
+  \beta &\sim \text{Normal}\left(0, 0.5 \right)
+\end{align}
+
+Where $\boldsymbol \Sigma$ is a $2 \times 2$ covariance matrix which we decompose for computational stability into a diagonal matrix containing variant specific scale parameters ($\boldsymbol \Delta$) and a symmetric correlation matrix ($\boldsymbol \Omega$) as follows [@stan],
+
+\begin{align}
+  \boldsymbol \Sigma &= \boldsymbol \Delta \boldsymbol \Omega \boldsymbol \Delta \\
+  \boldsymbol \Delta &=  \left[ {\begin{array}{cc}
+                            \sigma^o & 0 \\
+                            0& \sigma^{\delta} \\
+                                \end{array} } \right] \\
+  \boldsymbol \Omega &\sim \left[ {\begin{array}{cc}
+                            1 & \omega \\
+                            \omega & 1 \\
+                              \end{array} } \right] \\
+  \sigma^{o, \delta} &\sim \text{Normal}\left(0, 0.2 \right)
+\end{align}
+
+Where $\boldsymbol \Omega$ has a Lewandowski-Kurowicka-Joe (LKJ) prior where
+$\omega$ controls the prior expectation for correlation between variants (with $\omega = 1$ resulting in a uniform prior over all correlations, $\omega \lt 1$ placing more weight on larger correlations and $\omega \gt 1 placing more weight on small amounts of correlations). By default we set $\omega = 0.5$ giving more weight to correlations between variants dynamics. 
+
+On top of this correlated strain dynamic model `forecast.vocs` also offers two other options that represent extremes in the correlated model. The first of these assumes that both strains evolve with independent differenced AR(1) growth rates with only a scaling factor (s^{\delta}) linking them. The second assumes that strains growth rates are completely correlated in time (i.e they are governed by a single AR(1) differenced process) and only differ based on a scaling factor ($s^{\delta}$). See the documentation for `variant_relationship` in `?forecast()` for details. 
+
+We then assume a negative binomial observation model with overdispersion $\phi_c$ for reported cases ($C_t$) as in the single strain model,
+
+\begin{align}
+  C_{t} &\sim \text{NegBinomial}\left(\lambda_t, \phi_c\right) \\
+  \frac{1}{\sqrt{\phi_c}} &\sim \text{Normal}(0, 0.5) 
+\end{align}
+
+Where $\sigma$, and $\frac{1}{\sqrt{phi_c}}$ are truncated to be greater than 0 and $\beta$ is truncated to be between -1 and 1. Again a Poisson observation model may instead be used (see the documentation for `overdispersion` in `?forecast()`).
+
+Finally, the mean proportion of samples that have the VoC ($p_t$) is then estimated using the mean reported cases with the VoC and the overall mean reported cases.
+
+\begin{equation} 
+  p_t = \frac{\lambda^{\delta}_t}{\lambda_t}
+\end{equation}
+
+We assume a beta binomial observation model for the number of sequences ($N_t$) that are positive ($P_t$) for the VoC with overdispersion $\phi_s$.
+
+\begin{align}
+  P_{t} &\sim \mathrm{BetaBinomial}\left(N_t, p_t \phi_s, (1 - p_t) \phi_s\right) \\
+  \frac{1}{\sqrt{\phi_s}} &\sim \text{Normal}(0, 0.5) 
+\end{align}
+
+Where $\sigma^{o, \delta}$, and $\frac{1}{\sqrt{\phi_s}}$ are truncated to be greater than 0. A binomial observation model is also available (see the documentation for `overdispersion` in `?forecast()`).
+
+
+We used the `forecast.vocs` package^[2021, Sam Abbott, forecast.vocs: Forecast case and sequence notifications using variant of concern strain dynamics, DOI:10.5281/zenodo.5559016] to model a two-strain branching process, including strain specific auto-regressive AR(1) variation, and a beta binomial observation process for the S-gene target status data. We used a weakly informative prior for a transmission advantage for the VoC vs non-VoC cases of mean `r parameters$voc_scale[1]` (standard deviation `r parameters$voc_scale[2]`), based on early work from South Africa^[2021-12-03, Carl Pearson and others, "Omicron spread in South Africa", Epidemics8]. 
+
+We defined the relationship between variants as either fixed or independent. The fixed relationship meant the VoC differed only by its transmissibility to the non-VOC strain, with any variation over time shared between strains. The independent relationship allowed the VoC to differ with dependence determined from the data. See Appendix for full model details.
+
+### Transmission summary statistics
+
+As well as posterior predictions and forecasts for both notifications by variant and variant of concern proportion the models also return several summary statistics which may be useful for drawing inferences about underlying transmission dynamics. These include the log scale growth rate ($g^{o, \delta}_t$), the instantaneous effective reproduction number ($R^{o, \delta}_t$), and the transmission advantage of the variant of concern ($A_t$). These are calculated as follows:
+
+\begin{align}
+  g^{o, \delta}_t &= T_s r^{o, \delta}_t \\
+  R^{o, \delta}_t &= \text{exp}\left(T_s r^{o, \delta}_t\right) \\
+  A_t &= \text{exp}\left(T_s \left(r^{\delta}_t - r^{o}_t \right)\right)\\
+\end{align}
+
+$T_s$ is a user set scaling parameter that defines the timespan over which the summary metrics apply dependent on the time step of the data. It can be set using the `scale_r` and defaults to 1 which returns summary statistics scaled to the timestep of the data. Depending on the setup of the model used these summary measures will be more or less related to their epidemiological definitions. In particular, adding a weighting to past expected cases that is more complex than a simple lag may cause interpretation issues.
+
+### Statistical analysis
+
+We tested the difference between the two models by comparing out of sample predictive fit using leave-one-out cross-validation with Pareto smoothed importance sampling, and model scoring.
+
+### Implementation
+
+The models are implemented in `stan` using `cmdstanr` with no defaults altered[@stan; @cmdstanr]. Due to the complexity of the posterior it is likely that increasing the `adapt_delta` may be required to mitigate potential bias in posterior estimates [@betancourt_2017]. `forecast.vocs` incorporates additional functionality written in R[@R] to enable plotting forecasts and posterior predictions, summarising forecasts, and scoring them using `scoringutils`[@scoringutils]. All functionality is modular allowing users to extend and alter the underlying model whilst continuing to use the package framework.
+
+
+### Reproducibility
+
+## Results {-}
+
+### Data description
+
+* Graph of cases by specimen date by region over time.
+* Discuss trend (centred moving average to plot)
+* Discussion sampling issues + potential sources of bias.
+
+
+On average, `r round(mean(obs$share_voc, na.rm=T)*100)`% cases had an SGT result. We found a median advantage for SGT-result cases of `r exp(advantage$median)` (95% credible interval `r exp(advantage$q5)` - `r exp(advantage$q95)`), scaled against cases without an SGT result. The growth rate increased over time at a similar rate between SGT-result and no-SGT-result cases (figure \ref{fig:plot-growth}).
+
+
+### Model comparison
+
+```{r compare-score}
+scores <- janitor::clean_names(
+  select(sgtf$loo, -forecast_date), case = "sentence"
+)
+kable(scores)
+```
+
+### Growth rate of S-gene positive and negative cases
+
+### Proportion of cases with SGTF
+
+### Transmission advantage of the cases with SGTF
+
+### Posterior predictions of cases with an without SGTF as well as overall
+
+### Evidence of sampling bias in S-gene target results among Covid-19 test-positive cases
+
+## Discussion {-}
+
+**Summary**
+
+**Strengths and weaknesses**
+
+**Literature**
+
+**Futher work**
+
+**Conclusions**
+
+**Acknowledgements**
+
+
+## References {-}
+
+
+