draft: Add further clarification for introductory notebooks (#3)
* Add more explicit titles and give further clarification on model parameters
* Add more references to APIs
* Use integer observation counts for clarity
matthewfeickert authored Apr 7, 2021
1 parent 8857f4d commit ef56a8f
Showing 5 changed files with 37 additions and 28 deletions.
47 changes: 28 additions & 19 deletions book/HelloWorld.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# My First Likelihood\n",
"# Introduction to HistFactory Models\n",
"\n",
"🎶 I'm the very Model of a simple HEP-like measurement... 🎶\n",
"\n",
@@ -30,7 +30,7 @@
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pyhf\n",
"from pyhf.contrib.viz import brazil # not imported by default!"
"from pyhf.contrib.viz import brazil"
]
},
{
@@ -49,7 +49,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"What did we just make? This returns a [`pyhf.Model`](https://pyhf.readthedocs.io/en/v0.6.1/_generated/pyhf.pdf.Model.html#pyhf.pdf.Model) object. Let's check out the specification."
"What did we just make? This returns a [`pyhf.pdf.Model`](https://pyhf.readthedocs.io/en/v0.6.1/_generated/pyhf.pdf.Model.html#pyhf.pdf.Model) object. Let's check out the specification."
]
},
{
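The model-construction cell falls outside this diff, but for context, here is a minimal sketch of how the notebook presumably builds this two-bin model with the `pyhf.simplemodels` API (`hepdata_like` is the v0.6.1 name for this helper; the values are taken from the numbers used throughout the notebook):

```python
import pyhf

# Two-bin model: signal [5, 10] on background [50, 60],
# with 10% and 20% absolute background uncertainties [5, 12]
model = pyhf.simplemodels.hepdata_like(
    signal_data=[5.0, 10.0],
    bkg_data=[50.0, 60.0],
    bkg_uncerts=[5.0, 12.0],
)
print(model.spec)  # the specification discussed above
```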
@@ -153,7 +153,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at the first row here which is our uncorrelated shape modifier. This is a multiplicative modifier denoted by $\\kappa$ per-bin (denoted by $\\gamma_b$). Notice that the input for the constraint term requires $\\sigma_b$ which is the relative uncertainty of that modifier. This is Poisson-constrained by $\\sigma_b^{-2}$. Let's quickly verify by hand to convince ourselves of what's going on here:"
"Let's look at the first row here which is our uncorrelated shape modifier. This is a multiplicative modifier denoted by $\\kappa$ per-bin (denoted by $\\gamma_b$). Notice that the input for the constraint term requires $\\sigma_b$ which is the relative uncertainty of that modifier. This is Poisson-constrained by $\\sigma_b^{-2}$. Let's quickly calculate \"by hand\" the auxiliary data to convince ourselves of what's going on here (remembering that the background uncertainties were 10% and 20% of the observed background counts):"
]
},
{
@@ -165,6 +165,13 @@
"(np.array([5.0, 12.0]) / np.array([50.0, 60.0])) ** -2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"which is what we see from the `pyhf.pdf.Model` API"
]
},
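The elided code cell that follows presumably inspects the auxiliary data along these lines (a sketch; the printed values follow from the arithmetic above):

```python
# Auxiliary data for the Poisson-constrained shape modifier:
# sigma_b^-2 per bin -> (5/50)^-2 = 100, (12/60)^-2 = 25
print(model.config.auxdata)  # [100.0, 25.0]
```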
{
"cell_type": "code",
"execution_count": null,
@@ -223,7 +230,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This returns the data for the entire likelihood for the 2 bin model, the main model as well as the constraint (or auxiliary) model. We can also drop the auxdata to get the actual data."
"This returns the expected data given the model parameters for the entire likelihood for the 2 bin model, the main model as well as the constraint (or auxiliary) model. We can also drop the auxdata to get the actual data."
]
},
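As a hedged sketch of the API being described (the exact cell is elided here; `include_auxdata` is the keyword in the v0.6.1 `Model.expected_data` signature):

```python
init_pars = model.config.suggested_init()  # [mu, gamma_1, gamma_2] = [1.0, 1.0, 1.0]

# Main-model bin counts plus the auxiliary (constraint) data
print(model.expected_data(init_pars))

# Drop the auxdata to get just the main-model ("actual") data
print(model.expected_data(init_pars, include_auxdata=False))
```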
{
@@ -475,10 +482,10 @@
"source": [
"## Simple Inference\n",
"\n",
"The core of statistical analysis is the statistical model. For inference, it's viewed as a function of the parameters with the data fixed.\n",
"The core of statistical analysis is the statistical model. For inference, it's viewed as a function of the model parameters conditioned on the fixed observations.\n",
"\n",
"$$\n",
"\\log L(\\theta | x) = \\log p(x | \\theta)\n",
"\\log L(\\theta | x) \\propto \\log p(x | \\theta)\n",
"$$\n",
"\n",
"The value of the likelihood is a float. Let's try it for both the background-only model as well as the signal+background model."
@@ -490,7 +497,7 @@
"metadata": {},
"outputs": [],
"source": [
"observations = [52.5, 65.0] + model.config.auxdata # this is a common pattern!\n",
"observations = [53.0, 65.0] + model.config.auxdata # this is a common pattern!\n",
"\n",
"model.logpdf(pars=bkg_pars, data=observations)"
]
@@ -510,7 +517,7 @@
"source": [
"We're not performing inference just yet. We're simply computing the 'logpdf' of the model specified by the parameters $\\theta$ against the provided data. To perform a fit, we use the [inference API](https://pyhf.readthedocs.io/en/v0.6.1/api.html#inference) via `pyhf.infer`.\n",
"\n",
"To fit a model to data, we usually want to find the $\\hat{\\theta}$ which refers to the \"Maximum Likelihood Estimate\". This is often referred to mathematically by\n",
"When fitting a model to data, we usually want to find the $\\hat{\\theta}$ which refers to the \"Maximum Likelihood Estimate\" of the model parameters. This is often referred to mathematically by\n",
"\n",
"$$\n",
"\\hat{\\theta}_\\text{MLE} = \\text{argmax}_\\theta L(\\theta | x)\n",
@@ -537,8 +544,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"So what can we say? With nominal signal `[5, 10]` and nominal background = `[50, 60]`, an observed count of `[52.5, 65]` suggests best fit values:\n",
"* $\\hat{\\mu} \\approx 0.5$,\n",
"So what can we say? With nominal signal `[5, 10]` and nominal background = `[50, 60]` model components, an observed count of `[53, 65]` suggests best fit values:\n",
"* $\\hat{\\mu} \\approx 0.54$,\n",
"* $\\hat{\\gamma} \\approx [1,1]$."
]
},
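A hedged sketch of the fit that produces these numbers (the actual cell is elided; `pyhf.infer.mle.fit` is the v0.6.1 API):

```python
best_fit_pars = pyhf.infer.mle.fit(observations, model)
print(f"muhat = {best_fit_pars[model.config.poi_index]:.2f}")  # ~0.54
print(f"gammahat = {best_fit_pars[1:]}")  # ~[1.0, 1.0]
```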
@@ -597,10 +604,10 @@
"* $\\hat{\\hat{\\theta}}$ is the best fitted value of the nuisance parameters (for fixed POIs)\n",
"* $\\hat{\\psi}$ and $\\hat{\\theta}$ are the best fitted values in a global fit\n",
"\n",
"So let's run a hypothesis test for\n",
"So let's run a limit setting (exclusion) hypothesis test for\n",
"\n",
"* null hypothesis ($\\mu = 1$) — \"SUSY is real\"\n",
"* alternate hypothesis ($\\mu = 0$) — \"Standard Model explains it all\""
"* null hypothesis ($\\mu = 1$) — \"BSM physics process exists\"\n",
"* alternate hypothesis ($\\mu = 0$) — \"Standard Model only physics\""
]
},
{
@@ -611,7 +618,7 @@
"source": [
"CLs_obs, CLs_exp = pyhf.infer.hypotest(\n",
" 1.0, # null hypothesis\n",
" [52.5, 65.0] + model.config.auxdata,\n",
" [53.0, 65.0] + model.config.auxdata,\n",
" model,\n",
" test_stat=\"q\",\n",
" return_expected_set=True,\n",
@@ -652,7 +659,7 @@
"source": [
"CLs_obs, CLs_exp = pyhf.infer.hypotest(\n",
" 1.0, # null hypothesis\n",
" [52.5, 65.0] + model.config.auxdata,\n",
" [53.0, 65.0] + model.config.auxdata,\n",
" model,\n",
" test_stat=\"qtilde\",\n",
" return_expected_set=True,\n",
@@ -691,7 +698,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can plot the standard \"Brazil band\" of the observed and expected $\\text{CL}_\\text{s}$ values using the `pyhf.contrib` module (which needs `pyhf[contrib]`):"
"We can plot the standard \"Brazil band\" of the observed and expected $\\text{CL}_\\text{s}$ values using the `pyhf.contrib` module (which needs `pyhf[contrib]`):\n",
"\n",
"The horiztonal red line indicates the test size ($\\alpha=0.05$), whose intersection with the $\\text{CL}_\\text{s}$ lines visually represents the $(1-\\alpha)\\%$ CL limit on the POI."
]
},
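A sketch of the elided plotting cell, assuming a scan over POI values has been collected as `poi_values` and `results` (as in the "by hand" cell below); note the v0.6.1 `brazil.plot_results` signature takes the axis as its first argument:

```python
fig, ax = plt.subplots()
ax.set_xlabel(r"$\mu$ (POI)")
ax.set_ylabel(r"$\mathrm{CL}_{s}$")
brazil.plot_results(ax, poi_values, results, test_size=0.05)
```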
{
@@ -711,7 +720,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that if you wnated to do all of this \"by hand\" you still could pretty easily with the `pyhf` APIs"
"Note that if you wanted to do all of this \"by hand\" you still could pretty easily. The `pyhf.infer.intervals.upperlimit` API just makes it easier."
]
},
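For reference, the one-call version via the inference API (a sketch; `level=0.05` gives the 95% CL limit):

```python
poi_values = np.linspace(0.1, 5, 50)
obs_limit, exp_limits = pyhf.infer.intervals.upperlimit(
    observations, model, poi_values, level=0.05
)
print(f"Upper limit (obs): mu = {obs_limit:.4f}")
```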
{
@@ -732,7 +741,7 @@
" for poi_value in poi_values\n",
"]\n",
"\n",
"# Calculate upper limit\n",
"# Calculate upper limit through interpolation\n",
"observed = np.asarray([h[0] for h in results]).ravel()\n",
"expected = np.asarray([h[1][2] for h in results]).ravel()\n",
"print(f\"Upper limit (obs): μ = {np.interp(0.05, observed[::-1], poi_values[::-1]):.4f}\")\n",
2 changes: 1 addition & 1 deletion book/SerializationAndPatching.ipynb
@@ -30,7 +30,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As of this tutorial, ATLAS has [published 5 full likelihoods to HEPData](https://pyhf.readthedocs.io/en/v0.6.1/citations.html#published-likelihoods)\n",
"As of this tutorial, ATLAS has [published 7 full likelihoods to HEPData](https://pyhf.readthedocs.io/en/v0.6.1/citations.html#published-likelihoods)\n",
"\n",
"<p align=\"center\">\n",
"<a href=\"https://www.hepdata.net/record/ins1755298?version=3\"><img src=\"https://raw.githubusercontent.com/matthewfeickert/talk-SciPy-2020/e0c509cd0dfef98f5876071edd4c60aff9199a1b/figures/HEPData_likelihoods.png\"></a>\n",
2 changes: 1 addition & 1 deletion book/SimpleWorkspace.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Workspace World\n",
"# Introduction to Workspaces\n",
"\n",
"Similarly to the previous chapter, we're going to go up \"one level\" from models to workspaces."
]
12 changes: 6 additions & 6 deletions book/Toys.ipynb
@@ -6,7 +6,7 @@
"source": [
"# Playing with Toys\n",
"\n",
" As of `v0.6.1`, `pyhf` now supports toys! A lot of kinks have been discovered and worked out and we're grateful to our ATLAS colleagues for beta-testing this in the meantime. We don't believe that there may not be any more bugs, but we feel confident that we can release the current implementation."
"As of `v0.6.0`, `pyhf` now supports toys! A lot of kinks have been discovered and worked out and we're grateful to our ATLAS colleagues for beta-testing this in the meantime. We don't believe that there may not be any more bugs, but we feel confident that we can release the current implementation."
]
},
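The toy-based machinery is switched on through `hypotest`'s `calctype` argument; a minimal sketch (the keyword arguments in the elided portions of the cells below presumably include this):

```python
CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [53.0, 65.0] + model.config.auxdata,
    model,
    test_stat="qtilde",
    return_expected_set=True,
    calctype="toybased",  # pseudo-experiments instead of asymptotics
    ntoys=1000,  # number of toys (chosen for this sketch)
)
```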
{
@@ -47,7 +47,7 @@
"source": [
"CLs_obs, CLs_exp = pyhf.infer.hypotest(\n",
" 1.0, # null hypothesis\n",
" [52.5, 65.0] + model.config.auxdata,\n",
" [53.0, 65.0] + model.config.auxdata,\n",
" model,\n",
" test_stat=\"qtilde\",\n",
" return_expected_set=True,\n",
@@ -72,7 +72,7 @@
"source": [
"CLs_obs, CLs_exp = pyhf.infer.hypotest(\n",
" 1.0, # null hypothesis\n",
" [52.5, 65.0] + model.config.auxdata,\n",
" [53.0, 65.0] + model.config.auxdata,\n",
" model,\n",
" test_stat=\"qtilde\",\n",
" return_expected_set=True,\n",
@@ -126,7 +126,7 @@
"source": [
"CLs_obs, CLs_exp = pyhf.infer.hypotest(\n",
" 1.0, # null hypothesis\n",
" [5.25, 6.5] + model.config.auxdata,\n",
" [5.0, 7.0] + model.config.auxdata,\n",
" model,\n",
" test_stat=\"qtilde\",\n",
" return_expected_set=True,\n",
@@ -152,7 +152,7 @@
"source": [
"CLs_obs, CLs_exp = pyhf.infer.hypotest(\n",
" 1.0, # null hypothesis\n",
" [5.25, 6.5] + model.config.auxdata,\n",
" [5.0, 7.0] + model.config.auxdata,\n",
" model,\n",
" test_stat=\"qtilde\",\n",
" return_expected_set=True,\n",
@@ -188,7 +188,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
"version": "3.8.7"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion book/data/2-bin_1-channel.json
@@ -14,7 +14,7 @@
}
],
"observations": [
{ "name": "singlechannel", "data": [52.5, 65.0] }
{ "name": "singlechannel", "data": [53.0, 65.0] }
],
"measurements": [
{ "name": "Measurement", "config": {"poi": "mu", "parameters": []} }
