
# Gamma distributions

|                      |                              |                          |
|----------------------|------------------------------|--------------------------|
| Variate              | $x$                          | $\mu < x < \infty$       |
| Lower bound          | $\mu$                        | $-\infty < \mu < \infty$ |
| Rate (inverse scale) | $\lambda$                    | $\lambda > 0$            |
| Shape                | $\tau$                       | $\tau > 0$               |
| Mean                 | $\frac{\tau}{\lambda} + \mu$ |                          |
| Variance             | $\frac{\tau}{\lambda^2}$     |                          |

The distribution starts at $\mu$ (support is $x > \mu$). We typically know $\mu$. It is often 0; or in the case of fitting a gamma density to a right tail, we know the threshold $\mu$ at which we truncated the tail.

Probability density function (PDF):

$$ P(x) = \frac{\lambda^{\tau}}{\Gamma(\tau)} (x-\mu)^{\tau-1} e^{-\lambda (x - \mu)} $$
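For concreteness, here is a minimal C sketch of the log density. It is a hypothetical helper, not the Easel API; it assumes only lgamma() from C99 `<math.h>`:

```c
#include <math.h>

/* Hypothetical helper (not the Easel API): log of the gamma density,
 *   log P(x) = tau log(lambda) - log Gamma(tau)
 *              + (tau - 1) log(x - mu) - lambda (x - mu)
 * lgamma() computes log Gamma() and is standard in C99 <math.h>.
 */
double gamma_logpdf(double x, double mu, double lambda, double tau)
{
  double y = x - mu;
  if (y <= 0.0) return -INFINITY;   /* support is x > mu */
  return tau * log(lambda) - lgamma(tau)
         + (tau - 1.0) * log(y) - lambda * y;
}
```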

The cumulative distribution function (CDF) has no closed-form expression. It is calculated numerically using the incomplete gamma function (esl_stats_IncompleteGamma()).
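Concretely, the CDF is the regularized lower incomplete gamma function evaluated at $\lambda(x-\mu)$ (whether a given routine returns the regularized or unregularized form is an interface detail; they differ only by the $\Gamma(\tau)$ factor):

$$ P(X \leq x) = \frac{\gamma\bigl(\tau, \lambda(x-\mu)\bigr)}{\Gamma(\tau)}, \qquad \gamma(\tau, z) = \int_0^z t^{\tau-1} e^{-t} \, dt $$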

Some sources give the support as $x \geq \mu$, but the point $x = \mu$ carries zero probability, and evaluating the density there produces $\log P$ terms of $-\infty$ (from the $\log(x-\mu)$ term). Easel code therefore requires $x > \mu$.

## Maximum likelihood parameter estimation

### Complete data; known $\mu$

Given a complete dataset of $n$ observed samples $x_i$ ($i=1..n$) and known $\mu$, esl_gam_FitComplete() estimates maximum likelihood parameters $\hat{\tau}$ and $\hat{\lambda}$ using a generalized Newton optimization [Minka00,Minka02].

The optimization only needs two sufficient statistics, the means $\bar{x} = \frac{1}{n} \sum_i (x_i - \mu)$ and $\overline{\log x} = \frac{1}{n} \sum_i \log (x_i - \mu)$, which we precalculate.

The method chooses initial guesses:

$$ \tau_0 = \frac{0.5}{\log \bar{x} - \overline{\log x}} $$ $$ \lambda_0 = \frac{\tau_0}{\bar{x}} $$

and then iterates to convergence:

$$ \frac{1}{\tau_{t+1}} = \frac{1}{\tau_t} + \frac{\log \tau_t - \log \bar{x} + \overline{\log x} - \Psi(\tau_t)}{\tau_t - \tau_t^2 \Psi'(\tau_t)} $$ $$ \lambda_{t+1} = \frac{\tau_{t+1}}{\bar{x}} $$

The first and second derivatives of $\log \Gamma(x)$ are called $\Psi(x)$ and $\Psi'(x)$, the "digamma" and "trigamma" functions. These are obtained numerically by esl_stats_Psi() and esl_stats_Trigamma().
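The following self-contained C sketch puts these pieces together. It is not the Easel source: gamma_fit(), psi(), and trigamma() are hypothetical stand-ins for esl_gam_FitComplete(), esl_stats_Psi(), and esl_stats_Trigamma(), with the digamma and trigamma functions computed by upward recurrence plus a standard asymptotic series:

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Digamma Psi(x): recurse up to x >= 6, then use the asymptotic series
 * Psi(x) ~ log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
 */
static double psi(double x)
{
  double v = 0.0, r, r2;
  while (x < 6.0) { v -= 1.0 / x; x += 1.0; }        /* Psi(x) = Psi(x+1) - 1/x */
  r = 1.0 / x;  r2 = r * r;
  return v + log(x) - 0.5 * r
           - r2 * (1.0/12.0 - r2 * (1.0/120.0 - r2 / 252.0));
}

/* Trigamma Psi'(x): same scheme, with
 * Psi'(x) ~ 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7).
 */
static double trigamma(double x)
{
  double v = 0.0, r, r2;
  while (x < 6.0) { v += 1.0 / (x * x); x += 1.0; }  /* Psi'(x) = Psi'(x+1) + 1/x^2 */
  r = 1.0 / x;  r2 = r * r;
  return v + r * (1.0 + 0.5 * r
           + r2 * (1.0/6.0 - r2 * (1.0/30.0 - r2 / 42.0)));
}

/* Fit tau, lambda to samples x[0..n-1] with known mu, by the generalized
 * Newton iteration above. Assumes n > 0 and not all x[i] identical.
 */
static void gamma_fit(const double *x, size_t n, double mu,
                      double *ret_tau, double *ret_lambda)
{
  double xbar = 0.0, logxbar = 0.0, tau, fp, tau2fpp, tnew;
  size_t i;
  int    t;

  for (i = 0; i < n; i++) {                   /* the two sufficient statistics */
    xbar    += x[i] - mu;
    logxbar += log(x[i] - mu);
  }
  xbar /= (double) n;   logxbar /= (double) n;

  tau = 0.5 / (log(xbar) - logxbar);          /* initial guess tau_0          */
  for (t = 0; t < 100; t++) {
    fp      = log(tau) - log(xbar) + logxbar - psi(tau);  /* f'(tau)          */
    tau2fpp = tau - tau * tau * trigamma(tau);            /* tau^2 f''(tau)   */
    tnew    = 1.0 / (1.0 / tau + fp / tau2fpp);           /* update on 1/tau  */
    if (fabs(tnew - tau) < 1e-7 * tau) { tau = tnew; break; }
    tau = tnew;
  }
  *ret_tau    = tau;
  *ret_lambda = tau / xbar;                   /* lambda-hat = tau-hat / xbar  */
}

/* Quick check: sample Gamma(tau=3, lambda=2, mu=0) as the sum of three
 * Exp(lambda) draws (valid for integer shape), then refit.
 */
int main(void)
{
  size_t  n = 100000, i;
  double *x = malloc(n * sizeof(double));
  double  tau, lambda;

  if (!x) return 1;
  srand(42);
  for (i = 0; i < n; i++) {
    double u1 = (rand() + 1.0) / ((double) RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double) RAND_MAX + 2.0);
    double u3 = (rand() + 1.0) / ((double) RAND_MAX + 2.0);
    x[i] = -log(u1 * u2 * u3) / 2.0;          /* mu=0, lambda=2, tau=3 */
  }
  gamma_fit(x, n, 0.0, &tau, &lambda);
  printf("tau-hat = %.3f  lambda-hat = %.3f\n", tau, lambda);  /* expect ~3, ~2 */
  free(x);
  return 0;
}
```

With $n = 100{,}000$ synthetic samples, the fit should recover roughly $\hat{\tau} \approx 3$ and $\hat{\lambda} \approx 2$.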

See Appendix 1 below for the derivation of Minka's method.

## Appendix 1: Minka's generalized Newton method

Minka's generalized Newton method works as follows. We aim to maximize the log likelihood of the data:

$$ \log P(\mathbf{x} \mid \tau, \lambda) = n \tau \log \lambda - n \log \Gamma(\tau) + (\tau-1) \sum_i \log(x_i - \mu) - \lambda \sum_i (x_i - \mu) $$

It's equivalent to maximize the average log likelihood, which we can write in terms of two sufficient statistics, the means $\bar{x} = \frac{1}{n} \sum_i (x_i - \mu)$ and $\overline{\log x} = \frac{1}{n} \sum_i \log (x_i - \mu)$:

$$ \frac{1}{n} \log P(\mathbf{x} \mid \tau, \lambda) = \tau \log \lambda - \log \Gamma(\tau) + (\tau-1) \overline{\log x} - \lambda \bar{x} $$

Take derivative w.r.t. $\lambda$, set to zero, solve for $\hat{\lambda}$:

$$ \frac{\partial}{\partial \lambda} = \frac{\tau}{\lambda} - \bar{x} = 0 $$ $$ \hat{\lambda} = \frac{\tau}{\bar{x}} $$

This makes sense because the mean of a Gamma is $\frac{\tau}{\lambda} + \mu$.

Substitute $\hat{\lambda} = \frac{\tau}{\bar{x}}$ back into the average log likelihood to get an objective function $f(\tau)$ of a single variable $\tau$. The $\tau \log \lambda$ term becomes $\tau \log \tau - \tau \log \bar{x}$, and the $\lambda \bar{x}$ term becomes $\tau$, so:

$$ f(\tau) = \frac{1}{n} \log P(\mathbf{x} \mid \tau, \hat{\lambda}) = \tau \log \tau - \tau \log \bar{x} - \log \Gamma(\tau) + (\tau-1) \overline{\log x} - \tau $$

which is differentiable (the $+1$ from differentiating $\tau \log \tau$ cancels the $-1$ from the $-\tau$ term):

$$ f'(\tau) = \log \tau - \log \bar{x} - \Psi(\tau) + \overline{\log x} $$ $$ f''(\tau) = \frac{1}{\tau} - \Psi'(\tau) $$

Newton's method for finding the optimum of $f(x)$ works iteratively: approximate $f(x)$ locally around the current point $x_t$ by a quadratic $g(x)$ (equivalently, fit a Gaussian to the likelihood) matched to $f$ and its first and second derivatives (i.e. $f(x_t) = g(x_t)$, $f'(x_t) = g'(x_t)$, $f''(x_t) = g''(x_t)$), then analytically locate the optimum of $g(x)$ to propose the next point $x_{t+1}$.

Minka generalizes Newton's method by observing that we may be able to find a much better approximation $g(x)$ for our particular $f(x)$. Here, he proposes the approximation:

$$ g(\tau) = c_0 + c_1 \tau + c_2 \log \tau \approx \frac{1}{n} \log P(\mathbf{x} \mid \tau, \hat{\lambda}) $$

Constructing $g(\tau)$ to locally approximate our objective function $f(\tau)$ at a point $\tau_t$ means:

$$ g(\tau_t) = f(\tau_t) = c_0 + c_1 \tau_t + c_2 \log \tau_t $$ $$ g'(\tau_t) = f'(\tau_t) = c_1 + \frac{c_2}{\tau_t} $$ $$ g''(\tau_t) = f''(\tau_t) = -\frac{c_2}{\tau_t^2} $$

which we can solve for estimates of the three parameters of the local approximation $g(\tau)$:

$$ c_2 = -\tau_t^2 f''(\tau_t) $$ $$ c_1 = f'(\tau_t) + \tau_t f''(\tau_t) $$ $$ c_0 = f(\tau_t) - \tau_t f'(\tau_t) + (\log \tau_t - 1) \, \tau_t^2 f''(\tau_t) $$

The optimum $\hat{\tau}$ of $g(\tau)$ is where $g'(\tau) = 0$, so we never actually need $c_0$; only $c_1$ and $c_2$ matter:

$$ g'(\tau) = c_1 + \frac{c_2}{\tau} = 0 $$

Solving for $\hat{\tau}$ gives $-\frac{c_2}{c_1}$, which is:

$$ \hat{\tau} = \frac{\tau_t^2 f''(\tau_t)}{f'(\tau_t) + \tau_t f''(\tau_t)} $$

which is more compactly written as:

$$ \frac{1}{\hat{\tau}} = \frac{1}{\tau_t} + \frac{f'(\tau_t)}{\tau_t^2 f''(\tau_t)} $$

Finally, substituting $f'$ and $f''$ gives us our iterative reestimation equation for our next point $\tau_{t+1}$:

$$ \boxed{ \frac{1}{\tau_{t+1}} = \frac{1}{\tau_t} + \frac{\log \tau_t - \log \bar{x} + \overline{\log x} - \Psi(\tau_t)}{\tau_t - \tau_t^2 \Psi'(\tau_t)} } $$

That's our iterative reestimation, but we still need to choose an initial starting point $\tau_0$. Minka observes that $\Psi(\tau) \approx \log \tau - \frac{1}{2\tau}$ (from the Stirling approximation $\log \Gamma(\tau) \approx \tau \log \tau - \tau - \frac{1}{2} \log \tau + \mathrm{const.}$). Substituting this into $f'(\tau)$ and setting it to zero, the $\log \tau$ terms cancel, leaving $\frac{1}{2\tau} - \log \bar{x} + \overline{\log x} = 0$; since $\log \bar{x} \geq \overline{\log x}$ by Jensen's inequality, solving for $\tau$ gives a positive initial guess:

$$ \boxed{ \tau_0 = \frac{0.5}{\log \bar{x} - \overline{\log x}} \approx \hat{\tau} } $$
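As a tiny numeric illustration (made-up numbers): if $\bar{x} = 2$ and $\overline{\log x} = 0.5$, then $\tau_0 = 0.5 / (\log 2 - 0.5) \approx 2.6$, and correspondingly $\lambda_0 = \tau_0 / \bar{x} \approx 1.3$.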

## References

[Minka00] TP Minka, "Beyond Newton's Method", unpublished note (2000).

[Minka02] TP Minka, "Estimating a Gamma distribution", unpublished note (2002).