The distribution starts at $\mu$ (support is $x > \mu$). We typically
know $\mu$: it is often 0, or, when fitting a gamma density to a right
tail, it is the known threshold at which we truncated the tail.
The cumulative distribution function (CDF) does not have an analytical
expression. It is calculated numerically using the incomplete Gamma
function (esl_stats_IncompleteGamma()).
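To illustrate, here's a minimal sketch of that calculation. It assumes
esl_stats_IncompleteGamma() follows the convention of returning an Easel
status code and writing the regularized lower and upper incomplete gamma
functions $P(a,x)$ and $Q(a,x)$ through its last two arguments; the exact
argument order shown here is my assumption, so check esl_stats.h:

```c
#include "easel.h"
#include "esl_stats.h"

/* Sketch: gamma CDF F(x) = P(tau, lambda*(x-mu)), the regularized
 * lower incomplete gamma function, evaluated numerically.
 * Argument order of esl_stats_IncompleteGamma() is an assumption.
 */
static double
gam_cdf_sketch(double x, double mu, double lambda, double tau)
{
  double pax, qax;

  if (x <= mu) return 0.0;                 /* support is x > mu */
  esl_stats_IncompleteGamma(tau, lambda * (x - mu), &pax, &qax);
  return pax;                              /* pax = P(tau, lambda*(x-mu)) */
}
```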
Some sources will say $x \geq \mu$, but $P(x=\mu) = 0$. Easel code
makes sure $x > \mu$ to avoid $\log P$ terms of $-\infty$.
Maximum likelihood parameter estimation
Complete data; known $\mu$
Given a complete dataset of $n$ observed samples $x_i$ ($i=1..n$) and
known $\mu$, esl_gam_FitComplete() estimates maximum likelihood
parameters $\hat{\tau}$ and $\hat{\lambda}$ using a generalized Newton
optimization [Minka00,Minka02].
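A typical call might look like the sketch below. The parameter order of
esl_gam_FitComplete() is an assumption on my part (check esl_gamma.h),
but the status-return convention with eslOK is standard Easel:

```c
#include <stdio.h>
#include "easel.h"
#include "esl_gamma.h"

/* Sketch: fit tau-hat and lambda-hat to n complete samples x[0..n-1]
 * with known mu = 0.0. Signature of esl_gam_FitComplete() assumed.
 */
static void
fit_example(double *x, int n)
{
  double lambda, tau;

  if (esl_gam_FitComplete(x, n, /*mu=*/ 0.0, &lambda, &tau) != eslOK)
    esl_fatal("gamma fit failed");
  printf("tau-hat = %g   lambda-hat = %g\n", tau, lambda);
}
```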
The optimization only needs two sufficient statistics, the means
$\bar{x} = \frac{1}{n} \sum_i (x_i - \mu)$ and
$\overline{\log x} = \frac{1}{n} \sum_i \log (x_i - \mu)$,
which we precalculate.
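In code, the precalculation is a single pass over the data; a sketch
(the function name is illustrative):

```c
#include <math.h>

/* Sketch: one pass to compute the two sufficient statistics,
 *   xbar    = mean of (x[i] - mu)
 *   logxbar = mean of log(x[i] - mu)
 * All x[i] must be > mu.
 */
static void
gam_stats(const double *x, int n, double mu,
          double *ret_xbar, double *ret_logxbar)
{
  double xbar = 0.0, logxbar = 0.0;
  int    i;

  for (i = 0; i < n; i++) {
    xbar    += x[i] - mu;
    logxbar += log(x[i] - mu);
  }
  *ret_xbar    = xbar    / (double) n;
  *ret_logxbar = logxbar / (double) n;
}
```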
The first and second derivatives of $\log \Gamma(x)$ are called
$\Psi(x)$ and $\Psi'(x)$, the "digamma" and "trigamma" functions.
These are obtained numerically by esl_stats_Psi() and
esl_stats_Trigamma().
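For example (a sketch; I'm assuming both functions follow the esl_stats
convention of returning a status code and writing the answer through a
pointer argument):

```c
#include <stdio.h>
#include "easel.h"
#include "esl_stats.h"

int
main(void)
{
  double psi, trig;

  esl_stats_Psi(2.0, &psi);        /* digamma(2)  = 1 - gamma_E ~ 0.42278 */
  esl_stats_Trigamma(2.0, &trig);  /* trigamma(2) = pi^2/6 - 1  ~ 0.64493 */
  printf("Psi(2) = %f  Psi'(2) = %f\n", psi, trig);
  return 0;
}
```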
See the appendix at the end for the derivation of Minka's method.
Appendix 1: Minka's generalized Newton method
Minka's generalized Newton method works as follows. We aim to maximize
the log likelihood of the data under the gamma density
$P(x \mid \mu, \tau, \lambda) = \frac{\lambda^{\tau}}{\Gamma(\tau)} (x - \mu)^{\tau - 1} e^{-\lambda (x - \mu)}$:

$$ \log L(\tau, \lambda) = n \tau \log \lambda - n \log \Gamma(\tau) + (\tau - 1) \sum_i \log (x_i - \mu) - \lambda \sum_i (x_i - \mu). $$
It's equivalent to maximize the average log likelihood, which we can
write in terms of two sufficient statistics, the means $\bar{x} = \frac{1}{n} \sum_i (x_i - \mu)$ and
$\overline{\log x} = \frac{1}{n} \sum_i \log (x_i - \mu)$:

$$ f(\tau, \lambda) = \tau \log \lambda - \log \Gamma(\tau) + (\tau - 1) \overline{\log x} - \lambda \bar{x}. $$

For any fixed $\tau$, setting $\frac{\partial f}{\partial \lambda} = \frac{\tau}{\lambda} - \bar{x} = 0$
gives $\hat{\lambda} = \frac{\tau}{\bar{x}}$.
This makes sense because the mean of a Gamma is $\frac{\tau}{\lambda} + \mu$.
Substitute $\hat{\lambda} = \frac{\tau}{\bar{x}}$ back into the
average log likelihood to get an objective function $f(\tau)$ in
terms of a single variable $\tau$:

$$ f(\tau) = \tau \log \frac{\tau}{\bar{x}} - \log \Gamma(\tau) + (\tau - 1) \overline{\log x} - \tau. $$
Newton's method for finding the optimum of $f(x)$ works iteratively.
Around a current point $x_t$, it approximates $f(x)$ locally by a quadratic $g(x)$
that matches $f$ and its first and second derivatives (i.e. $f(x_t) = g(x_t)$, $f'(x_t) = g'(x_t)$, $f''(x_t) = g''(x_t)$);
because $f$ is a log likelihood, this amounts to fitting a Gaussian distribution to the likelihood.
The optimum of $g(x)$ is then located analytically to propose the next point $x_{t+1}$.
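Maximizing the matched quadratic analytically gives the familiar Newton update:

$$ x_{t+1} = x_t - \frac{f'(x_t)}{f''(x_t)}. $$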
Minka generalizes Newton's method by observing that we may be able to
find a much better local approximation $g(x)$ for our particular
$f(x)$. Here, he proposes the approximation

$$ g(\tau) = c_0 + c_1 \tau + c_2 \log \tau, $$

whose three free parameters can match $f(\tau_t)$, $f'(\tau_t)$, and
$f''(\tau_t)$, and whose maximum ($g'(\tau) = 0$) is at $\tau = -c_2 / c_1$.
Matching derivatives gives $c_2 = -\tau_t^2 f''(\tau_t)$ and
$c_1 = f'(\tau_t) + \tau_t f''(\tau_t)$; with
$f'(\tau) = \log \tau - \log \bar{x} - \Psi(\tau) + \overline{\log x}$ and
$f''(\tau) = \frac{1}{\tau} - \Psi'(\tau)$, the proposed next point is

$$ \frac{1}{\tau_{t+1}} = \frac{1}{\tau_t} + \frac{\overline{\log x} - \log \bar{x} + \log \tau_t - \Psi(\tau_t)}{\tau_t^2 \left( \frac{1}{\tau_t} - \Psi'(\tau_t) \right)}. $$
That's our iterative reestimation, but we still need to choose an initial starting point $\tau_0$. Minka observes
that $\Psi(\tau) \approx \log \tau - \frac{1}{2\tau}$ (from the Stirling approximation
$\log \Gamma(\tau) \approx \tau \log \tau - \tau - \frac{1}{2} \log \tau + \mathrm{const.}$), which
we can substitute into $f'(\tau)$, set to zero, and solve:

$$ f'(\tau) \approx \frac{1}{2\tau} - \left( \log \bar{x} - \overline{\log x} \right) = 0 \quad \Rightarrow \quad \tau_0 = \frac{1}{2 \left( \log \bar{x} - \overline{\log x} \right)}. $$

This starting point is positive, since $\log \bar{x} \geq \overline{\log x}$ by Jensen's inequality.
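Putting the appendix together in code, here's a minimal sketch of the
whole iteration (illustrative only: the convergence threshold, the
iteration cap, and the esl_stats signatures are assumptions, and a real
implementation needs error handling):

```c
#include <math.h>
#include "easel.h"
#include "esl_stats.h"

/* Sketch of Minka's generalized Newton iteration for tau-hat, given
 * xbar = mean of (x_i - mu) and logxbar = mean of log(x_i - mu).
 * Then lambda-hat = tau-hat / xbar.
 */
static double
minka_tau(double xbar, double logxbar)
{
  double tau, psi, trig, fp, fpp;
  int    i;

  tau = 0.5 / (log(xbar) - logxbar);             /* Stirling-based initial guess */
  for (i = 0; i < 100; i++) {
    esl_stats_Psi(tau, &psi);                    /* digamma(tau)  */
    esl_stats_Trigamma(tau, &trig);              /* trigamma(tau) */
    fp  = log(tau) - log(xbar) - psi + logxbar;  /* f'(tau)  */
    fpp = 1.0 / tau - trig;                      /* f''(tau), always negative */
    if (fabs(fp) < 1e-7) break;                  /* converged: f'(tau) ~ 0 */
    tau = 1.0 / (1.0 / tau + fp / (tau * tau * fpp)); /* generalized Newton step */
  }
  return tau;
}
```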