
Introduction

Giulio Caravagna edited this page Sep 25, 2018 · 1 revision

Motivation

MOBSTER's statistical model is a (K+1)-dimensional Dirichlet mixture with:

  • K Beta distributions that, as in other Machine Learning methods, model subclonal peaks in the VAF distribution;
  • 1 Pareto component to model the power law tail predicted by Population Genetics to describe alleles under neutral evolution.
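As a rough illustration of the mixture described above, the sketch below evaluates the density of one Pareto tail plus K Beta components at a given VAF value. It is not MOBSTER's implementation: the function names, the argument layout, and the choice of a Pareto Type I density (not truncated at 1) are assumptions made for this example.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution; zero outside (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def pareto_pdf(x, scale, shape):
    """Density of a Pareto Type I distribution; zero below `scale`.

    Assumption: the tail is not truncated at 1 in this sketch.
    """
    if x < scale:
        return 0.0
    return shape * scale ** shape / x ** (shape + 1)


def mixture_pdf(x, tail_weight, scale, shape, beta_weights, beta_params):
    """Density of 1 Pareto tail + K Beta components at x.

    `tail_weight` and `beta_weights` are the K+1 mixing proportions and
    are expected to sum to 1; `beta_params` is a list of (a, b) pairs.
    All names here are hypothetical.
    """
    total = tail_weight * pareto_pdf(x, scale, shape)
    for weight, (a, b) in zip(beta_weights, beta_params):
        total += weight * beta_pdf(x, a, b)
    return total
```

For example, `mixture_pdf(0.5, 0.3, 0.05, 2.0, [0.7], [(20, 20)])` evaluates a model with a tail and a single Beta peak centred at a VAF of 0.5.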

Tails are a feature of clonal populations under selective pressure. When a cell acquires a mutation under positive selection, the passenger mutations that accumulate in its progeny have VAFs that follow a power law. With high-resolution genomic data, we need to model the tail to avoid overfitting, that is, overestimating the number of clones in the data.

Usage

MOBSTER provides several ways to assess the best fit of the VAF distribution, with and without tails. MOBSTER lets us control for tails, which are a confounder when inferring the true clonal structure:

  1. we first detect tails and remove them from our data, and then
  2. we cluster read counts of remaining mutations with standard tools.
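The first of the two steps above can be sketched as follows, assuming a fit has already produced, for every mutation, a vector of posterior assignment probabilities over the K+1 components. The function name, the data layout, and the convention that the tail is component 0 are hypothetical and do not describe MOBSTER's actual interface.

```python
def split_tail(mutations, responsibilities, tail_index=0):
    """Separate mutations assigned to the tail from all the others.

    mutations        : list of arbitrary records (e.g. dicts with read counts)
    responsibilities : list of per-mutation probability vectors, one entry
                       per mixture component (assumed layout)
    tail_index       : position of the Pareto tail in each vector (assumed 0)
    """
    kept, removed = [], []
    for mutation, probs in zip(mutations, responsibilities):
        # Hard assignment: the component with the highest probability.
        best = max(range(len(probs)), key=probs.__getitem__)
        if best == tail_index:
            removed.append(mutation)
        else:
            kept.append(mutation)
    return kept, removed
```

The `kept` list is what would then be handed to a standard read-count clustering tool in the second step; that tool is outside the scope of this sketch.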

MOBSTER fits (moment-matching or maximum-likelihood) are fast to compute even for large cohorts. The best model minimises one of several possible scores: BIC, AIC, ICL or reICL, a new reduced-entropy variation of ICL. Integration with sciClone and routines to carry out a multivariate analysis are also provided.
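To make the model-selection scores concrete, here is a minimal sketch of AIC, BIC and ICL in their textbook "lower is better" form. It assumes each fit exposes its log-likelihood, its number of free parameters, and the posterior assignment probabilities; these names are hypothetical. ICL is written as BIC plus twice the entropy of the assignments, which is one common convention (the scaling of the entropy term varies between implementations). reICL is left out because this page does not define it.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik


def entropy(responsibilities):
    """Entropy of the soft assignments; zero when every assignment is certain."""
    return -sum(p * math.log(p) for row in responsibilities for p in row if p > 0)


def icl(log_lik, n_params, n_obs, responsibilities):
    """ICL as BIC plus an entropy penalty (assumed convention: factor 2)."""
    return bic(log_lik, n_params, n_obs) + 2 * entropy(responsibilities)


def best_fit(scores):
    """Return the key of the model with the lowest score.

    `scores` maps a model label (e.g. "K=2, tail") to its score.
    """
    return min(scores, key=scores.get)
```

Because the entropy term grows when mutations are ambiguously assigned, ICL tends to favour models with well-separated components more strongly than BIC does.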
