From fe05330f9cb48503b7e2d2d8bfff2c88fb63cabe Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Victor=20F=C3=A9rat?=
Date: Wed, 31 Jan 2024 15:06:01 +0100
Subject: [PATCH] Update 10_entropy.py

---
 tutorials/segmentation/10_entropy.py | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tutorials/segmentation/10_entropy.py b/tutorials/segmentation/10_entropy.py
index 4fa0d9d9..1b7d834b 100644
--- a/tutorials/segmentation/10_entropy.py
+++ b/tutorials/segmentation/10_entropy.py
@@ -102,8 +102,8 @@
 # Shannon entropy
 # ---------------
 # The Shannon entropy \ :footcite:t:`shannon1948mathematical` of the microstate sequence describes how flat the microstate class distribution is. The two extremes are:
-# 1. A flat distribution. In this example, the maximum entropy would be observed if each microstate class (A, B, C, D, F) had probability $p=1/5$. The resulting Shannon entropy would be h=log(5)=2.32 bits.
-# 2. A peaked distribution. If any microstate class occurs with probability $p=1$, and all other classes with probability $p=0$, the resulting Shannon entropy would achieve its minimum value of h=0 bits.
+# #. A flat distribution. In this example, the maximum entropy would be observed if each microstate class (A, B, C, D, F) had probability :math:`p=1/5`. The resulting Shannon entropy would be :math:`h=\log_2(5)=2.32` bits.
+# #. A peaked distribution. If any microstate class occurs with probability :math:`p=1`, and all other classes with probability :math:`p=0`, the resulting Shannon entropy would achieve its minimum value of :math:`h=0` bits.
 #
 # In the example below, we observe that smoothing leads to a slight entropy reduction.
 
@@ -145,10 +145,10 @@
 
 #%% [markdown]
 # We can now test how microstate sequence (Kolmogorov) complexity changes with pre-processing:
-# 1. no smoothing, full microstate sequence (duplicates not removed)
-# 2. smoothing, full microstate sequence (duplicates not removed)
-# 3. no smoothing, microstate jump sequence (duplicates removed)
-# 4. smoothing, microstate jump sequence (duplicates removed)
+# #. no smoothing, full microstate sequence (duplicates not removed)
+# #. smoothing, full microstate sequence (duplicates not removed)
+# #. no smoothing, microstate jump sequence (duplicates removed)
+# #. smoothing, microstate jump sequence (duplicates removed)
 #
 # Smoothing makes microstate sequences more predictable (less complex), removing duplicates makes sequences less predictable (more complex).
 
@@ -168,7 +168,7 @@
 # Autoinformation function
 # ------------------------
 # The autoinformation function (AIF) is the information-theoretic analogy to the autocorrelation function (ACF) for numerical time series.
-# The autoinformation coefficient at time lag $k$ is the information shared between microstate labels $k$ time samples apart. Mathematically, it is computed as the mutual information between the microstate label $X_t$ at time $t$, and the label $X_{t+k}$ at $t+k$, averaged across the whole sequence: $H(X_{t+k}) - H(X_{t+k} \vert X_{t})$.
+# The autoinformation coefficient at time lag :math:`k` is the information shared between microstate labels :math:`k` time samples apart. Mathematically, it is computed as the mutual information between the microstate label :math:`X_t` at time :math:`t`, and the label :math:`X_{t+k}` at :math:`t+k`, averaged across the whole sequence: :math:`H(X_{t+k}) - H(X_{t+k} \vert X_{t})`.
 #
 # Below, we compare the AIF of microstate sequences with and without smoothing.
 # Smoothing increases overall temporal dependencies and removes microstate oscillations (AIF peaks at 50, 100, 150 ms) that are visible in the minimally pre-processed sequence.
@@ -199,8 +199,8 @@
 # Partial autoinformation
 # -----------------------
 # Partial autoinformation (PAI) describes the dependence between microstate sequence labels $k$ samples apart, removing the influence of all intermediate labels. The autoinformation function does not account for the effect of intermediate time steps.
-# PAI is computationally more expensive and it is recommended to start with a low number of lags (e.g. 5).
-# PAI coefficients can identify (first-order) Markov processes as their PAI coefficients are zero for lags $k \ge 2$.
+# PAI is computationally more expensive and it is recommended to start with a low number of lags (e.g. ``5``).
+# PAI coefficients can identify (first-order) Markov processes as their PAI coefficients are zero for lags :math:`k \ge 2`.
 #
 # Below, we compare the PAI coefficients of microstate sequences with and without smoothing.
 # It is observed that smoothing shifts temporal dependencies from towards the first time lag, i.e. renders the sequences more Markovian.
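
As a rough illustration of the quantities discussed in the edited comments, the Shannon entropy and the autoinformation coefficient can be written out with plain NumPy. This is a minimal sketch on a toy label sequence, not part of the patch and not the API of the library this tutorial belongs to; the helper names shannon_entropy and autoinformation are hypothetical and only restate the formulas h = -sum_i p_i log2 p_i and H(X_{t+k}) - H(X_{t+k} | X_t).

import numpy as np


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the label distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def autoinformation(labels, k):
    """AIF at lag k: I(X_t; X_{t+k}) = H(X_{t+k}) - H(X_{t+k} | X_t), in bits."""
    x, y = labels[:-k], labels[k:]
    # H(X) + H(Y) - H(X, Y) is an equivalent form of the conditional-entropy expression.
    joint = shannon_entropy([f"{a},{b}" for a, b in zip(x, y)])
    return shannon_entropy(x) + shannon_entropy(y) - joint


rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=10_000)  # toy 5-class label sequence
print(shannon_entropy(labels))            # ~2.32 bits (flat 5-class distribution)
print(autoinformation(labels, k=10))      # ~0 bits (no temporal dependence in i.i.d. labels)

For an i.i.d. uniform sequence the entropy approaches log2(5) = 2.32 bits and the autoinformation stays near zero; on a real microstate sequence, smoothing would raise the short-lag AIF, as the tutorial text describes.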