In the last recipe, we used a frequentist method to test a hypothesis on incomplete data. Here, we will see an alternative approach based on Bayesian theory. The main idea is to consider that unknown parameters are random variables, just like the variables describing the experiment. Prior knowledge about the parameters is integrated into the model. This knowledge is updated as more and more data is observed.
Frequentists and Bayesians interpret probabilities differently. Frequentists interpret a probability as a limit of frequencies when the number of samples tends to infinity. Bayesians interpret it as a belief; this belief is updated as more and more data is observed.
Here, we revisit the previous coin flipping example with a Bayesian approach. This example is sufficiently simple to permit an analytical treatment. In general, as we will see later in this chapter, analytical results cannot be obtained and numerical methods become essential.
This is a math-heavy recipe. Knowledge of basic probability theory (random variables, distributions, Bayes' formula) and calculus (derivatives, integrals) is recommended. We use the same notations as in the previous recipe.
Let $q$ be the probability of obtaining a head.

- First, we assume that $q$ is a uniform random variable on the interval $[0, 1]$. That's our prior distribution: for all $q$, $P(q)=1$.
- Then, we flip our coin $n$ times. We note $x_i$ the outcome of the $i$-th flip ($0$ for tail, $1$ for head).
- What is the probability distribution of $q$ knowing the observations $x_i$? Bayes' theorem allows us to compute the posterior distribution analytically (see the next section for the mathematical details), with $h$ denoting the number of heads obtained:

$$P(q \mid \{x_i\}) = \frac{P(\{x_i\} \mid q)\, P(q)}{\displaystyle\int_0^1 P(\{x_i\} \mid q')\, dq'} = (n+1)\binom{n}{h}\, q^h (1-q)^{n-h}$$

- We define the posterior distribution according to the mathematical formula above. We remark that this expression is $(n+1)$ times the probability mass function (PMF) of the binomial distribution, which is directly available in scipy.stats. (For more information on the binomial distribution, refer to https://en.wikipedia.org/wiki/Binomial_distribution.)
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
%matplotlib inline
def posterior(n, h, q):
    # Posterior density of q after observing h heads out of n flips:
    # (n + 1) times the binomial PMF evaluated at h.
    return (n + 1) * st.binom(n, q).pmf(h)
- Let's plot this distribution for an observation of $h=61$ heads and $n=100$ total flips:
n = 100
h = 61
q = np.linspace(0., 1., 1000)
d = posterior(n, h, q)
fig, ax = plt.subplots(1, 1)
ax.plot(q, d, '-k')
ax.set_xlabel('q parameter')
ax.set_ylabel('Posterior distribution')
ax.set_ylim(0, d.max() + 1)
This curve represents our belief about the parameter $q$ after we have observed 61 heads.
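As a quick sanity check (a minimal sketch reusing the q and d arrays computed above), we can verify that this posterior integrates to approximately 1 over $[0, 1]$, as a probability density should:

# Riemann-sum approximation of the integral of the posterior
# over [0, 1]; the result should be close to 1.
print(d.sum() * (q[1] - q[0]))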
In this section, we explain Bayes' theorem, and we give the mathematical details underlying this example.
There is a very general idea in data science that consists of explaining data with a mathematical model. This is formalized with a one-way process, model → data.
Once this process is formalized, the task of the data scientist is to exploit the data to recover information about the model. In other words, we want to invert the original process and get data → model.
In a probabilistic setting, the direct process is represented as a conditional probability distribution, $P(data \mid model)$: the probability of observing the data given a particular model.
Similarly, the inverse process is $P(model \mid data)$: what we can say about the model given the observed data.
Bayes' theorem is at the core of a general framework for inverting a probabilistic process of model → data. It can be stated as follows:

$$P(model \mid data) = \frac{P(data \mid model)\, P(model)}{P(data)}$$
This equation gives us information about our model, knowing the observed data. Bayes' equation is widely used in signal processing, statistics, machine learning, inverse problems, and in many other scientific applications.
In Bayes' equation, $P(model \mid data)$ is called the posterior distribution, $P(data \mid model)$ is the likelihood, $P(model)$ is the prior distribution, which reflects our knowledge about the model before observing the data, and $P(data)$ is the evidence, a normalizing constant obtained by integrating $P(data \mid model)\, P(model)$ over all possible models.
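Annotated with these standard names, the equation reads:

$$\underbrace{P(model \mid data)}_{\text{posterior}} = \frac{\overbrace{P(data \mid model)}^{\text{likelihood}} \;\; \overbrace{P(model)}^{\text{prior}}}{\underbrace{P(data)}_{\text{evidence}}}$$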
In conclusion, Bayes' equation gives us a general roadmap for data inference:
- Specify a mathematical model for the direct process model → data (the $P(data \mid model)$ term).
- Specify a prior probability distribution for the model (the $P(model)$ term).
- Perform analytical or numerical calculations to solve this equation.
In this recipe's example, we found the posterior distribution with the following equation, which derives directly from Bayes' theorem:

$$P(q \mid \{x_i\}) = \frac{P(\{x_i\} \mid q)\, P(q)}{\displaystyle\int_0^1 P(\{x_i\} \mid q')\, dq'}$$
Knowing that the $x_i$ are independent, we obtain (with $h$ the number of heads):

$$P(\{x_i\} \mid q) = q^h (1-q)^{n-h}$$
In addition, we can compute analytically the following integral (using an integration by parts and an induction):

$$\int_0^1 q^h (1-q)^{n-h}\, dq = \frac{1}{(n+1)\binom{n}{h}}$$
Finally, we get:

$$P(q \mid \{x_i\}) = (n+1)\binom{n}{h}\, q^h (1-q)^{n-h}$$
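As a quick numerical check (a minimal sketch using scipy.special, with the recipe's values $n=100$ and $h=61$), the normalizing integral above is the Beta function value $B(h+1, n-h+1)$, and we can verify that it matches $1/\left((n+1)\binom{n}{h}\right)$:

import numpy as np
from scipy.special import beta, comb

n, h = 100, 61
# B(h + 1, n - h + 1) equals the integral of q**h * (1 - q)**(n - h) over [0, 1].
lhs = beta(h + 1, n - h + 1)
rhs = 1. / ((n + 1) * comb(n, h))
print(np.isclose(lhs, rhs))  # expected: True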
We can get a point estimate from the posterior distribution. For example, the maximum a posteriori (MAP) estimation consists of considering the maximum of the posterior distribution as an estimate for $q$.

Here, we can get this estimate analytically by differentiating the posterior distribution with respect to $q$:

$$\frac{dP(q \mid \{x_i\})}{dq} = (n+1)\binom{n}{h}\left(h\, q^{h-1}(1-q)^{n-h} - (n-h)\, q^h(1-q)^{n-h-1}\right)$$

This expression is equal to zero when $q = h/n$. In our example, the MAP estimate is therefore $61/100 = 0.61$, the proportion of heads observed in the experiment.
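We can confirm this numerically (a minimal check reusing the q and d arrays computed earlier): the grid value where the sampled posterior is largest should be close to $h/n = 0.61$:

# Location of the maximum of the sampled posterior; should be close to 0.61.
print(q[d.argmax()])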
In this recipe, we showed a few basic notions in Bayesian theory. We illustrated them with a simple example. The fact that we were able to derive the posterior distribution analytically is not very common in real-world applications. This example is nevertheless informative because it explains the core mathematical ideas behind the complex numerical methods we will see later.
The posterior distribution indicates the plausible values for $q$ given the observations.
In this recipe, the prior and posterior distributions are conjugate, meaning that they belong to the same family (the beta distribution). For this reason, we were able to compute the posterior distribution analytically. You will find more details about conjugate distributions at https://en.wikipedia.org/wiki/Conjugate_prior.
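To make this concrete (a minimal sketch, assuming only the values used in this recipe): the uniform prior is a Beta(1, 1) distribution, and the posterior after observing $h$ heads out of $n$ flips is a Beta(h + 1, n - h + 1) distribution, whose density matches the posterior() function defined earlier:

import numpy as np
import scipy.stats as st

n, h = 100, 61
q = np.linspace(0., 1., 1000)
# Posterior from the closed-form expression used in this recipe.
d = (n + 1) * st.binom(n, q).pmf(h)
# The same posterior, expressed as a Beta(h + 1, n - h + 1) density (conjugacy).
d_beta = st.beta(h + 1, n - h + 1).pdf(q)
print(np.allclose(d, d_beta))  # expected: True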
We chose a uniform distribution as the prior distribution for the unknown parameter $q$. This flat prior ($P(q)=1$ for all $q$) conveys no particular prior information; other prior distributions could be used to encode actual prior knowledge about the coin.
- The Fitting a Bayesian model by sampling from a posterior distribution with a Markov chain Monte Carlo method recipe