draft: Add further clarification for introductory notebooks (#3)
* Add more explicit titles and give further clarification on model parameters
* Add more references to APIs
* Use integer observation counts for clarity
matthewfeickert authored Apr 7, 2021
1 parent 8857f4d commit ef56a8f
Showing 5 changed files with 37 additions and 28 deletions.
47 changes: 28 additions & 19 deletions book/HelloWorld.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# My First Likelihood\n",
"# Introduction to HistFactory Models\n",
"\n",
"🎶 I'm the very Model of a simple HEP-like measurement... 🎶\n",
"\n",
@@ -30,7 +30,7 @@
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pyhf\n",
"from pyhf.contrib.viz import brazil # not imported by default!"
"from pyhf.contrib.viz import brazil"
]
},
{
@@ -49,7 +49,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"What did we just make? This returns a [`pyhf.Model`](https://pyhf.readthedocs.io/en/v0.6.1/_generated/pyhf.pdf.Model.html#pyhf.pdf.Model) object. Let's check out the specification."
"What did we just make? This returns a [`pyhf.pdf.Model`](https://pyhf.readthedocs.io/en/v0.6.1/_generated/pyhf.pdf.Model.html#pyhf.pdf.Model) object. Let's check out the specification."
]
},
{
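The model-construction cell falls outside this diff, but for context, here is a minimal sketch of how the notebook presumably builds this two-bin model with the `pyhf.simplemodels` API (`hepdata_like` is the v0.6.1 name for this helper; the values are taken from the numbers used throughout the notebook):

```python
import pyhf

# Two-bin model: signal [5, 10] on background [50, 60],
# with 10% and 20% absolute background uncertainties [5, 12]
model = pyhf.simplemodels.hepdata_like(
    signal_data=[5.0, 10.0],
    bkg_data=[50.0, 60.0],
    bkg_uncerts=[5.0, 12.0],
)
print(model.spec)  # the specification discussed above
```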
@@ -153,7 +153,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at the first row here which is our uncorrelated shape modifier. This is a multiplicative modifier denoted by $\\kappa$ per-bin (denoted by $\\gamma_b$). Notice that the input for the constraint term requires $\\sigma_b$ which is the relative uncertainty of that modifier. This is Poisson-constrained by $\\sigma_b^{-2}$. Let's quickly verify by hand to convince ourselves of what's going on here:"
"Let's look at the first row here which is our uncorrelated shape modifier. This is a multiplicative modifier denoted by $\\kappa$ per-bin (denoted by $\\gamma_b$). Notice that the input for the constraint term requires $\\sigma_b$ which is the relative uncertainty of that modifier. This is Poisson-constrained by $\\sigma_b^{-2}$. Let's quickly calculate \"by hand\" the auxiliary data to convince ourselves of what's going on here (remembering that the background uncertainties were 10% and 20% of the observed background counts):"
]
},
{
@@ -165,6 +165,13 @@
"(np.array([5.0, 12.0]) / np.array([50.0, 60.0])) ** -2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"which is what we see from the `pyhf.pdf.Model` API"
]
},
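The elided code cell that follows presumably inspects the auxiliary data along these lines (a sketch; the printed values follow from the arithmetic above):

```python
# Auxiliary data for the Poisson-constrained shape modifier:
# sigma_b^-2 per bin -> (5/50)^-2 = 100, (12/60)^-2 = 25
print(model.config.auxdata)  # [100.0, 25.0]
```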
{
"cell_type": "code",
"execution_count": null,
@@ -223,7 +230,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This returns the data for the entire likelihood for the 2 bin model, the main model as well as the constraint (or auxiliary) model. We can also drop the auxdata to get the actual data."
"This returns the expected data given the model parameters for the entire likelihood for the 2 bin model, the main model as well as the constraint (or auxiliary) model. We can also drop the auxdata to get the actual data."
]
},
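As a hedged sketch of the API being described (the exact cell is elided here; `include_auxdata` is the keyword in the v0.6.1 `Model.expected_data` signature):

```python
init_pars = model.config.suggested_init()  # [mu, gamma_1, gamma_2] = [1.0, 1.0, 1.0]

# Main-model bin counts plus the auxiliary (constraint) data
print(model.expected_data(init_pars))

# Drop the auxdata to get just the main-model ("actual") data
print(model.expected_data(init_pars, include_auxdata=False))
```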
{
@@ -475,10 +482,10 @@
"source": [
"## Simple Inference\n",
"\n",
"The core of statistical analysis is the statistical model. For inference, it's viewed as a function of the parameters with the data fixed.\n",
"The core of statistical analysis is the statistical model. For inference, it's viewed as a function of the model parameters conditioned on the fixed observations.\n",
"\n",
"$$\n",
"\\log L(\\theta | x) = \\log p(x | \\theta)\n",
"\\log L(\\theta | x) \\propto \\log p(x | \\theta)\n",
"$$\n",
"\n",
"The value of the likelihood is a float. Let's try it for both the background-only model as well as the signal+background model."
@@ -490,7 +497,7 @@
"metadata": {},
"outputs": [],
"source": [
"observations = [52.5, 65.0] + model.config.auxdata # this is a common pattern!\n",
"observations = [53.0, 65.0] + model.config.auxdata # this is a common pattern!\n",
"\n",
"model.logpdf(pars=bkg_pars, data=observations)"
]
@@ -510,7 +517,7 @@
"source": [
"We're not performing inference just yet. We're simply computing the 'logpdf' of the model specified by the parameters $\\theta$ against the provided data. To perform a fit, we use the [inference API](https://pyhf.readthedocs.io/en/v0.6.1/api.html#inference) via `pyhf.infer`.\n",
"\n",
"To fit a model to data, we usually want to find the $\\hat{\\theta}$ which refers to the \"Maximum Likelihood Estimate\". This is often referred to mathematically by\n",
"When fitting a model to data, we usually want to find the $\\hat{\\theta}$ which refers to the \"Maximum Likelihood Estimate\" of the model parameters. This is often referred to mathematically by\n",
"\n",
"$$\n",
"\\hat{\\theta}_\\text{MLE} = \\text{argmax}_\\theta L(\\theta | x)\n",
@@ -537,8 +544,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"So what can we say? With nominal signal `[5, 10]` and nominal background = `[50, 60]`, an observed count of `[52.5, 65]` suggests best fit values:\n",
"* $\\hat{\\mu} \\approx 0.5$,\n",
"So what can we say? With nominal signal `[5, 10]` and nominal background = `[50, 60]` model components, an observed count of `[53, 65]` suggests best fit values:\n",
"* $\\hat{\\mu} \\approx 0.54$,\n",
"* $\\hat{\\gamma} \\approx [1,1]$."
]
},
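A hedged sketch of the fit that produces these numbers (the actual cell is elided; `pyhf.infer.mle.fit` is the v0.6.1 API):

```python
best_fit_pars = pyhf.infer.mle.fit(observations, model)
print(f"muhat = {best_fit_pars[model.config.poi_index]:.2f}")  # ~0.54
print(f"gammahat = {best_fit_pars[1:]}")  # ~[1.0, 1.0]
```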
@@ -597,10 +604,10 @@
"* $\\hat{\\hat{\\theta}}$ is the best fitted value of the nuisance parameters (for fixed POIs)\n",
"* $\\hat{\\psi}$ and $\\hat{\\theta}$ are the best fitted values in a global fit\n",
"\n",
"So let's run a hypothesis test for\n",
"So let's run a limit setting (exclusion) hypothesis test for\n",
"\n",
"* null hypothesis ($\\mu = 1$) — \"SUSY is real\"\n",
"* alternate hypothesis ($\\mu = 0$) — \"Standard Model explains it all\""
"* null hypothesis ($\\mu = 1$) — \"BSM physics process exists\"\n",
"* alternate hypothesis ($\\mu = 0$) — \"Standard Model only physics\""
]
},
{
@@ -611,7 +618,7 @@
"source": [
"CLs_obs, CLs_exp = pyhf.infer.hypotest(\n",
" 1.0, # null hypothesis\n",
" [52.5, 65.0] + model.config.auxdata,\n",
" [53.0, 65.0] + model.config.auxdata,\n",
" model,\n",
" test_stat=\"q\",\n",
" return_expected_set=True,\n",
@@ -652,7 +659,7 @@
"source": [
"CLs_obs, CLs_exp = pyhf.infer.hypotest(\n",
" 1.0, # null hypothesis\n",
" [52.5, 65.0] + model.config.auxdata,\n",
" [53.0, 65.0] + model.config.auxdata,\n",
" model,\n",
" test_stat=\"qtilde\",\n",
" return_expected_set=True,\n",
@@ -691,7 +698,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can plot the standard \"Brazil band\" of the observed and expected $\\text{CL}_\\text{s}$ values using the `pyhf.contrib` module (which needs `pyhf[contrib]`):"
"We can plot the standard \"Brazil band\" of the observed and expected $\\text{CL}_\\text{s}$ values using the `pyhf.contrib` module (which needs `pyhf[contrib]`):\n",
"\n",
"The horiztonal red line indicates the test size ($\\alpha=0.05$), whose intersection with the $\\text{CL}_\\text{s}$ lines visually represents the $(1-\\alpha)\\%$ CL limit on the POI."
]
},
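A sketch of the elided plotting cell, assuming a scan over POI values has been collected as `poi_values` and `results` (as in the "by hand" cell below); note the v0.6.1 `brazil.plot_results` signature takes the axis as its first argument:

```python
fig, ax = plt.subplots()
ax.set_xlabel(r"$\mu$ (POI)")
ax.set_ylabel(r"$\mathrm{CL}_{s}$")
brazil.plot_results(ax, poi_values, results, test_size=0.05)
```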
{
@@ -711,7 +720,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that if you wnated to do all of this \"by hand\" you still could pretty easily with the `pyhf` APIs"
"Note that if you wanted to do all of this \"by hand\" you still could pretty easily. The `pyhf.infer.intervals.upperlimit` API just makes it easier."
]
},
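For reference, the one-call version via the inference API (a sketch; `level=0.05` gives the 95% CL limit):

```python
poi_values = np.linspace(0.1, 5, 50)
obs_limit, exp_limits = pyhf.infer.intervals.upperlimit(
    observations, model, poi_values, level=0.05
)
print(f"Upper limit (obs): mu = {obs_limit:.4f}")
```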
{
@@ -732,7 +741,7 @@
" for poi_value in poi_values\n",
"]\n",
"\n",
"# Calculate upper limit\n",
"# Calculate upper limit through interpolation\n",
"observed = np.asarray([h[0] for h in results]).ravel()\n",
"expected = np.asarray([h[1][2] for h in results]).ravel()\n",
"print(f\"Upper limit (obs): μ = {np.interp(0.05, observed[::-1], poi_values[::-1]):.4f}\")\n",
2 changes: 1 addition & 1 deletion book/SerializationAndPatching.ipynb
@@ -30,7 +30,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As of this tutorial, ATLAS has [published 5 full likelihoods to HEPData](https://pyhf.readthedocs.io/en/v0.6.1/citations.html#published-likelihoods)\n",
"As of this tutorial, ATLAS has [published 7 full likelihoods to HEPData](https://pyhf.readthedocs.io/en/v0.6.1/citations.html#published-likelihoods)\n",
"\n",
"<p align=\"center\">\n",
"<a href=\"https://www.hepdata.net/record/ins1755298?version=3\"><img src=\"https://raw.githubusercontent.com/matthewfeickert/talk-SciPy-2020/e0c509cd0dfef98f5876071edd4c60aff9199a1b/figures/HEPData_likelihoods.png\"></a>\n",
2 changes: 1 addition & 1 deletion book/SimpleWorkspace.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Workspace World\n",
"# Introduction to Workspaces\n",
"\n",
"Similarly to the previous chapter, we're going to go up \"one level\" from models to workspaces."
]
12 changes: 6 additions & 6 deletions book/Toys.ipynb
@@ -6,7 +6,7 @@
"source": [
"# Playing with Toys\n",
"\n",
" As of `v0.6.1`, `pyhf` now supports toys! A lot of kinks have been discovered and worked out and we're grateful to our ATLAS colleagues for beta-testing this in the meantime. We don't believe that there may not be any more bugs, but we feel confident that we can release the current implementation."
"As of `v0.6.0`, `pyhf` now supports toys! A lot of kinks have been discovered and worked out and we're grateful to our ATLAS colleagues for beta-testing this in the meantime. We don't believe that there may not be any more bugs, but we feel confident that we can release the current implementation."
]
},
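The toy-based machinery is switched on through `hypotest`'s `calctype` argument; a minimal sketch (the keyword arguments in the elided portions of the cells below presumably include this):

```python
CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [53.0, 65.0] + model.config.auxdata,
    model,
    test_stat="qtilde",
    return_expected_set=True,
    calctype="toybased",  # pseudo-experiments instead of asymptotics
    ntoys=1000,  # number of toys (chosen for this sketch)
)
```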
{
@@ -47,7 +47,7 @@
"source": [
"CLs_obs, CLs_exp = pyhf.infer.hypotest(\n",
" 1.0, # null hypothesis\n",
" [52.5, 65.0] + model.config.auxdata,\n",
" [53.0, 65.0] + model.config.auxdata,\n",
" model,\n",
" test_stat=\"qtilde\",\n",
" return_expected_set=True,\n",
@@ -72,7 +72,7 @@
"source": [
"CLs_obs, CLs_exp = pyhf.infer.hypotest(\n",
" 1.0, # null hypothesis\n",
" [52.5, 65.0] + model.config.auxdata,\n",
" [53.0, 65.0] + model.config.auxdata,\n",
" model,\n",
" test_stat=\"qtilde\",\n",
" return_expected_set=True,\n",
@@ -126,7 +126,7 @@
"source": [
"CLs_obs, CLs_exp = pyhf.infer.hypotest(\n",
" 1.0, # null hypothesis\n",
" [5.25, 6.5] + model.config.auxdata,\n",
" [5.0, 7.0] + model.config.auxdata,\n",
" model,\n",
" test_stat=\"qtilde\",\n",
" return_expected_set=True,\n",
@@ -152,7 +152,7 @@
"source": [
"CLs_obs, CLs_exp = pyhf.infer.hypotest(\n",
" 1.0, # null hypothesis\n",
" [5.25, 6.5] + model.config.auxdata,\n",
" [5.0, 7.0] + model.config.auxdata,\n",
" model,\n",
" test_stat=\"qtilde\",\n",
" return_expected_set=True,\n",
@@ -188,7 +188,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
"version": "3.8.7"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion book/data/2-bin_1-channel.json
@@ -14,7 +14,7 @@
}
],
"observations": [
{ "name": "singlechannel", "data": [52.5, 65.0] }
{ "name": "singlechannel", "data": [53.0, 65.0] }
],
"measurements": [
{ "name": "Measurement", "config": {"poi": "mu", "parameters": []} }
