diff --git a/sheet11/sheet11.ipynb b/sheet11/sheet11.ipynb
index 351fe98..877bdaa 100644
--- a/sheet11/sheet11.ipynb
+++ b/sheet11/sheet11.ipynb
@@ -540,7 +540,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.5.1"
+   "version": "3.5.2"
   }
  },
  "nbformat": 4,
diff --git a/sheet11/sheet11solutions.ipynb b/sheet11/sheet11solutions.ipynb
index be132dc..b476182 100644
--- a/sheet11/sheet11solutions.ipynb
+++ b/sheet11/sheet11solutions.ipynb
@@ -91,7 +91,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The hypothesis space for Candidate-Elimination spreads between the most general and most specific hypotheses. The other hypotheses are made up by conjunction of features which biases the learner and makes it impossible to find a disjunctive solution."
+    "The hypothesis space for Candidate-Elimination ranges from the most general to the most specific hypothesis. Every hypothesis in between is a conjunction of feature constraints, which biases the learner and makes it impossible to represent a disjunctive solution.\n",
+    "\n",
+    "The version space, on the other hand, is a subset of the hypothesis space: the set of all hypotheses consistent with the training examples seen so far, i.e. all hypotheses between (and including) the general and the specific boundary.\n",
+    "\n"
    ]
   },
   {
@@ -389,9 +392,9 @@
     "Which of the following formulae describes the backpropagation of the error through hidden layers in a Multilayer Perceptron?\n",
     "Assume they are calculated for each $k=L_H \\dots 1$ and $i=1\\dots N(k)$.\n",
     "\n",
-    "1. $\\delta_i(k) = f^\\prime(o_i(k-1)) \\sum\\limits_{j=1}^{N(k+1)} w_{ji}(k+1, k)o_i(k)$\n",
-    "2. $\\delta_i(k) = f^\\prime(o_i(k-1)) \\sum\\limits_{j=1}^{N(k+1)} w_{ji}(k+1, k)\\delta_i(k+1)$\n",
-    "3. $\\delta_i(k) = f^\\prime(o_i(k-1)) \\sum\\limits_{j=1}^{N(k+1)} w_{ji}(k, k-1)\\delta_i(k+1)$"
+    "1. $\\delta_i(k) = f^\\prime(o_i(k)) \\sum\\limits_{j=1}^{N(k+1)} w_{ji}(k+1, k)o_i(k)$\n",
+    "2. $\\delta_i(k) = f^\\prime(o_i(k)) \\sum\\limits_{j=1}^{N(k+1)} w_{ji}(k+1, k)\\delta_j(k+1)$\n",
+    "3. $\\delta_i(k) = f^\\prime(o_i(k)) \\sum\\limits_{j=1}^{N(k+1)} w_{ji}(k, k-1)\\delta_j(k+1)$"
    ]
   },
   {
@@ -584,7 +587,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The (first-order) Markov assumption means that state $s_{t+1}$ only depends on its predecessor state $s_t$ and the action $a_t$ performed then, i.e.: $s_{t+1} = \\delta(s_t, a_t)$. This allows to specify a $Q$-function of the form $Q(s_t,a_t)$, instead of $Q(s_0,a_0,\\ldots,s_t,a_t)$. The Markov assumption does not hold in situations where, e.g. the state does contain full information."
+    "The (first-order) Markov assumption means that state $s_{t+1}$ depends only on its predecessor state $s_t$ and the action $a_t$ performed in it, i.e.: $s_{t+1} = \\delta(s_t, a_t)$. This makes it possible to specify a $Q$-function of the form $Q(s_t,a_t)$ instead of $Q(s_0,a_0,\\ldots,s_t,a_t)$. The Markov assumption does not hold in situations where determining the next state requires more information than the preceding state $s_t$ and action $a_t$ provide. For example, in sentence parsing with each word treated as a state, the interpretation of a word can depend on words much earlier in the sentence, so the assumption does not hold."
    ]
   },
   {
@@ -693,7 +696,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.5.1"
+   "version": "3.5.2"
   }
  },
  "nbformat": 4,