Introduction
MOBSTER's statistical model is a (K+1)-dimensional Dirichlet mixture with:
- K Beta distributions that, as in other Machine Learning methods, model subclonal peaks in the VAF distribution;
- 1 Pareto component that models the power-law tail predicted by Population Genetics for alleles under neutral evolution.
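A minimal sketch of the (K+1)-component density, using only the Python standard library. The names `weights`, `betas`, `shape` and `scale` are hypothetical and do not come from MOBSTER's R API. Mixing weights are treated as fixed numbers here, although the model places a Dirichlet prior on them. As an assumption of the sketch, the Pareto is truncated to [scale, 1], which makes it a proper density over VAFs.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1), computed via log-gamma for numerical stability."""
    if not 0.0 < x < 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def truncated_pareto_pdf(x, shape, scale):
    """Type-I Pareto density restricted to [scale, 1] (assumption: truncated at VAF = 1)."""
    if not scale <= x <= 1.0:
        return 0.0
    # An untruncated Pareto has P(X <= 1) = 1 - scale**shape, so renormalise by that mass
    return shape * scale**shape * x ** (-shape - 1) / (1.0 - scale**shape)

def mixture_pdf(x, weights, betas, shape, scale):
    """(K+1)-component density: weights[0] is the tail, weights[1:] are the K Beta clusters."""
    if len(weights) != len(betas) + 1:
        raise ValueError("need one weight per Beta component plus one for the tail")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    density = weights[0] * truncated_pareto_pdf(x, shape, scale)
    for w, (a, b) in zip(weights[1:], betas):
        density += w * beta_pdf(x, a, b)
    return density

# Hypothetical example: clonal peak near 0.5, one subclone near 0.25, tail from VAF 0.05
print(mixture_pdf(0.25, [0.2, 0.5, 0.3], [(50, 50), (25, 75)], shape=1.0, scale=0.05))
```

Each Beta component places a peak at its mean a / (a + b). The Pareto component puts most of its mass just above `scale`, which matches the shape of a neutral tail at low VAF.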
Tails are a feature of clonal populations under selective pressure. When a cell acquires a mutation under positive selection, the VAFs of the passenger mutations that accumulate in its progeny follow a power law. With high-resolution genomic data, the tail must be modelled to avoid overfitting the number of clones in the data.
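A quick way to see the power-law signature is to sample from a Type-I Pareto by inverse-CDF sampling and compare the empirical fraction of values at or above a threshold f with the theoretical value (scale / f) ** shape. Shape 1 is chosen because the classical neutral-growth model predicts a VAF density proportional to 1/f², which is a Pareto with shape 1. The function names are hypothetical, and this is not how MOBSTER generates data. Samples above 1 are not valid VAFs, and this sketch does not truncate them.

```python
import random

def sample_pareto(n, shape, scale, rng):
    """Draw n Type-I Pareto values via the inverse CDF: x = scale * U ** (-1 / shape)."""
    # 1 - rng.random() lies in (0, 1], which avoids raising 0 to a negative power
    return [scale * (1.0 - rng.random()) ** (-1.0 / shape) for _ in range(n)]

def survival(samples, f):
    """Empirical fraction of samples with value >= f."""
    return sum(1 for x in samples if x >= f) / len(samples)

rng = random.Random(1)
tail = sample_pareto(100_000, shape=1.0, scale=0.05, rng=rng)
for f in (0.1, 0.2, 0.4):
    # For a Pareto, P(X >= f) = (scale / f) ** shape: doubling f halves the fraction
    print(f, round(survival(tail, f), 3), (0.05 / f) ** 1.0)
```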
MOBSTER provides several ways to assess the best fit of the VAF distribution, with and without tails. MOBSTER also lets you control for tails, which confound the inference of the true clonal structure:
- we first detect tails and remove them from our data, and then
- we cluster the read counts of the remaining mutations with standard tools.
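The first step can be sketched as hard clustering on the posterior responsibilities of a fitted mixture. Each mutation is assigned to its most probable component, those assigned to the tail are dropped, and the remaining VAFs would then go to a standard clustering tool. The function names and the convention that component 0 is the tail are hypothetical and not part of MOBSTER's API. The usage line relies on hand-picked parameters instead of fitted ones.

```python
import math

def responsibilities(vafs, weights, densities):
    """Posterior probability of each component for each VAF (the E-step of a mixture)."""
    result = []
    for x in vafs:
        joint = [w * f(x) for w, f in zip(weights, densities)]
        total = sum(joint)
        if total == 0.0:
            # No component has support at x: fall back to uniform responsibilities
            result.append([1.0 / len(joint)] * len(joint))
        else:
            result.append([j / total for j in joint])
    return result

def drop_tail(vafs, weights, densities, tail_index=0):
    """Keep mutations whose most probable component is not the tail (ties go to the lowest index)."""
    kept = []
    for x, r in zip(vafs, responsibilities(vafs, weights, densities)):
        best = max(range(len(r)), key=lambda k: r[k])
        if best != tail_index:
            kept.append(x)
    return kept

def pareto_pdf(x, shape=1.0, scale=0.05):
    """Untruncated Type-I Pareto density (hypothetical tail parameters)."""
    return shape * scale**shape / x ** (shape + 1) if x >= scale else 0.0

def beta_pdf(x, a=50.0, b=50.0):
    """Beta density for a hypothetical clonal cluster centred at VAF 0.5."""
    if not 0.0 < x < 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

vafs = [0.06, 0.08, 0.45, 0.5, 0.55]
print(drop_tail(vafs, [0.3, 0.7], [pareto_pdf, beta_pdf]))  # [0.45, 0.5, 0.55]
```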
MOBSTER fits (by moment matching or maximum likelihood) are fast to compute, even for large cohorts. The best model is the one that minimises one of several scores: BIC, AIC, ICL or reICL, a new reduced-entropy variant of ICL. Integration with sciClone and routines to carry out a multivariate analysis are also provided.
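The sketch below illustrates the fitting and model-selection ingredients named above. It shows a method-of-moments estimate for a Beta component, the closed-form maximum-likelihood shape of a Pareto with known scale, and the BIC, AIC and ICL scores. ICL follows the common convention of BIC plus twice the entropy of the soft assignments, and MOBSTER's own scaling may differ. reICL is not reproduced because its reduced entropy term is specific to MOBSTER. All function names are hypothetical, and counting the free parameters of a model is left to the caller.

```python
import math

def beta_moment_match(mean, var):
    """Method-of-moments Beta(a, b) parameters from a sample mean and variance."""
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("variance must lie in (0, mean * (1 - mean))")
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

def pareto_shape_mle(xs, scale):
    """Maximum-likelihood shape of a Type-I Pareto with known scale: n / sum(log(x / scale))."""
    return len(xs) / sum(math.log(x / scale) for x in xs)

def entropy(resp):
    """Sum over mutations and components of -z * log(z), treating 0 * log(0) as 0."""
    return -sum(z * math.log(z) for row in resp for z in row if z > 0.0)

def scores(log_lik, n_params, n_obs, resp):
    """Model-selection scores; lower is better for all three."""
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    aic = -2.0 * log_lik + 2.0 * n_params
    icl = bic + 2.0 * entropy(resp)  # penalises uncertain cluster assignments
    return {"BIC": bic, "AIC": aic, "ICL": icl}

print(beta_moment_match(0.5, 0.25 / 101))  # a and b both close to 50
```

Because ICL adds an entropy penalty to BIC, it favours models whose clusters are well separated. This is one reason tail-aware fits can be preferred over models that explain the tail with extra Beta components.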