
Translate analysis examples into new specifications, provide feedback, iterating as necessary #10

BenGalewsky opened this issue Sep 16, 2019 · 7 comments


BenGalewsky commented Sep 16, 2019

We take the analysis examples documented in #3 and begin re-implementing them in our new specifications, aiming to identify limitations and issues throughout this process and iterating on them as needed. New implementations are guided by the requirements of the user stories developed in #1.

Assumptions

Acceptance criteria

  • Point to implementations (e.g. a GitHub repo) and feedback (e.g. talks, issues, etc.)
@BenGalewsky BenGalewsky added the AS Analysis Systems label Sep 16, 2019
@BenGalewsky BenGalewsky added this to the Y2Q1 milestone Sep 16, 2019
cranmer commented Oct 30, 2019

In terms of which analysis we use for the template fit, we could also use ATLAS multi-b which satisfies the reinterpretation example.


alexander-held commented Nov 26, 2019

Template based fit

The broad scope is to efficiently produce template histograms, post-process them and build a workspace. The workspace is then used for inference, and interfaces to visualization tools exist to implement the user stories described in #1. Instead of monolithic frameworks, we envision a modular approach with well-defined interfaces.
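To make the modular picture concrete, the pipeline can be thought of as three components with narrow interfaces: template production, workspace construction, and inference. The toy sketch below illustrates this shape only; all function names and the mini-workspace format are invented for illustration and are not any real framework's API.

```python
# Toy sketch of the modular template-fit pipeline: three components with
# well-defined interfaces. Everything here is invented for illustration.


def produce_templates(events, bin_edges):
    """Histogram a list of event values into fixed bins (template production)."""
    counts = [0] * (len(bin_edges) - 1)
    for x in events:
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= x < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


def build_workspace(signal, background):
    """Bundle templates into a minimal workspace-like specification."""
    return {
        "channels": [
            {
                "name": "signal_region",
                "samples": [
                    {"name": "signal", "data": signal, "normfactor": "mu"},
                    {"name": "background", "data": background},
                ],
            }
        ]
    }


def fit_mu(workspace, observed):
    """Scan the signal strength mu for the best match to observed yields
    (a crude least-squares stand-in for a real likelihood fit)."""
    samples = workspace["channels"][0]["samples"]
    sig = next(s["data"] for s in samples if s["name"] == "signal")
    bkg = next(s["data"] for s in samples if s["name"] == "background")
    best_mu, best_loss = None, float("inf")
    for step in range(0, 301):
        mu = step / 100.0  # scan mu in [0, 3] in steps of 0.01
        loss = sum((mu * s + b - d) ** 2 for s, b, d in zip(sig, bkg, observed))
        if loss < best_loss:
            best_mu, best_loss = mu, loss
    return best_mu
```

Each stage only consumes the previous stage's output, so any one component (e.g. the histogram producer) can be swapped out without touching the others.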

Production of template distributions

The FAST framework seems to be a good fit for the production of input template histograms. It is a declarative framework, which can provide the power of coffea without the user having to write dedicated code. I was in contact with a developer at CHEP2019, and they have already used FAST to produce inputs to the CMS Combine framework for statistical analysis. Two main points were identified that require further investigation:

  • How to avoid duplicate information? The configuration files for FAST and those used to build a workspace from the produced templates share a significant amount of information. One possible solution is to design a configuration specification that extends the FAST format.
  • How to best extend the declarative approach with user-defined functions? Given a well-defined interface, allowing users to plug in custom functionality would significantly simplify non-standard workflows. It is difficult to foresee what kind of functionality might be needed; the ATLAS CAF framework seems to be a good example of achieving this.
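One possible shape for the user-defined-function extension is a registry: users register named callables, and the declarative configuration refers to them by name. The sketch below is a minimal illustration of that pattern; the registry, decorator, and config keys are all invented here and are not how FAST itself solves this.

```python
# Hypothetical sketch: a registry that lets a declarative config dispatch to
# user-defined Python functions by name. All names here are invented.
USER_FUNCTIONS = {}


def register(name):
    """Decorator registering a user-defined selection under a config-visible name."""
    def wrap(func):
        USER_FUNCTIONS[name] = func
        return func
    return wrap


@register("dilepton_mass_window")
def dilepton_mass_window(event):
    # A custom selection that would be awkward to express purely declaratively.
    return abs(event["mll"] - 91.0) < 10.0


def apply_selection(events, config):
    """Interpret a declarative config entry, dispatching to user code by name."""
    cut = USER_FUNCTIONS[config["selection"]]
    return [e for e in events if cut(e)]


events = [{"mll": 90.5}, {"mll": 120.0}, {"mll": 85.0}]
selected = apply_selection(events, {"selection": "dilepton_mass_window"})
```

The declarative file stays declarative (it only names a function), while arbitrary non-standard logic lives in ordinary user code behind a fixed interface.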

The likelihood function

pyhf is the natural choice here; it has already been shown to reproduce ROOT-based results in ATL-PHYS-PUB-2019-029 (see also the CHEP2019 talk on pyhf from @matthewfeickert).
One point of feedback concerns support for expressing normalization factors via parameter expressions, a feature available in RooFit. An example RooFit workspace using this feature was built from inputs in alexander-held/template_fit_workflows, to iterate on how to approach this with pyhf. More workflow examples are detailed in a comment below.
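To illustrate the feature in question, the pure-Python sketch below builds a binned Poisson likelihood in which a sample's normalization is an *expression* of parameters (here mu_eff = mu * k) rather than a single free normalization factor. The model, parameter names, and yields are invented; this is not the pyhf or RooFit API.

```python
# Toy binned Poisson likelihood with a normalization given by a parameter
# expression (mu_eff = mu * k). All numbers and names are illustrative.
import math


def poisson_nll(observed, expected):
    """Negative log-likelihood of a product of per-bin Poisson terms
    (constant log(n!) terms dropped)."""
    return sum(e - n * math.log(e) for n, e in zip(observed, expected))


def expected_yields(params, signal, background):
    # The signal normalization is a product of two parameters instead of a
    # single free normfactor -- the RooFit-style expression being discussed.
    mu_eff = params["mu"] * params["k"]
    return [mu_eff * s + b for s, b in zip(signal, background)]


signal, background = [10.0, 5.0], [50.0, 60.0]
observed = [60, 65]

# The likelihood is best (lowest NLL) where the expression reproduces the data.
nll_at_one = poisson_nll(observed, expected_yields({"mu": 1.0, "k": 1.0}, signal, background))
nll_off = poisson_nll(observed, expected_yields({"mu": 2.0, "k": 1.0}, signal, background))
```

Since only mu_eff enters the yields, mu and k are individually degenerate here; in practice one of them would be constrained elsewhere (e.g. k by a control region), which is exactly why expressing the product explicitly is useful.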


matthewfeickert commented Nov 27, 2019

> In terms of which analysis we use for the template fit, we could also use ATLAS multi-b which satisfies the reinterpretation example.

As this is something that would be useful to have in addition to @alexander-held's thesis analysis (described above), it would be good to work with @kratsg on how to move forward for the multi-b. In addition to ongoing chats, @matthewfeickert and @kratsg will both be at the US ATLAS Hadronic Final State Forum 2019, which offers a natural place to work on this.

@matthewfeickert

Related to the visualization work and user stories that @alexander-held has been doing, there is also ongoing work in pyhf to add example plotting code and also establish (something along the lines of) pyhf.contrib.plotting for things like pull plots and ranking plots.
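For context, the quantity behind a pull plot is simple: for each nuisance parameter, (fitted value − prefit value) / prefit uncertainty. The sketch below computes it with made-up numbers; it is not the (in-progress) pyhf.contrib.plotting API.

```python
# Sketch of the pull computation behind a pull plot. Inputs are invented.


def pulls(fit_results, prefit):
    """Return {parameter: pull}, where pull = (fitted - prefit) / prefit error."""
    return {
        name: (fitted - prefit[name][0]) / prefit[name][1]
        for name, fitted in fit_results.items()
    }


# prefit: {name: (central value, uncertainty)}; fitted: {name: best-fit value}
prefit = {"jes": (0.0, 1.0), "lumi": (1.0, 0.02)}
fitted = {"jes": 0.5, "lumi": 1.01}
result = pulls(fitted, prefit)
```

A plotting layer would then draw these values with error bars against a ±1 band; ranking plots additionally order parameters by their impact on the parameter of interest.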

@alexander-held

Template based fit

The repository alexander-held/template_fit_workflows illustrates three different approaches to the template fit workflow:

  1. "traditional approach" fully based on TRExFitter for template histogram production, workspace production and inference steering,
  2. parsing the xml workspace provided by TRExFitter with pyhf for subsequent inference within pyhf,
  3. fully python-based approach, building template histograms with FAST-HEP, constructing a workspace from them and inference with pyhf

The three approaches yield consistent results, and the repository contains an example of each. This small implemented example is an important step towards more complex models. It led to the identification of several points that are now being followed up on via discussions in the IRIS-HEP Slack, the FAST-HEP gitter channel, and GitHub issues. Relevant issues include requests for an extended example of parsing XML workspaces with pyhf, support for parsing normalization factors from XML, and pruning of nuisance parameters for model statistical uncertainties. For FAST-HEP, they include wildcard support for trees in ntuples and support for a different way to write information to YAML files.
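The consistency claim above amounts to a tolerance check across implementations. A minimal sketch of such a check is below; the fit values are invented placeholders, not the repository's actual numbers.

```python
# Hypothetical consistency check across fit implementations: every
# (best-fit value, uncertainty) pair must agree with a reference within a
# relative tolerance. The results shown are invented placeholders.


def consistent(results, rtol=1e-3):
    """True if all implementations agree with the first entry within rtol."""
    ref_val, ref_err = next(iter(results.values()))
    return all(
        abs(v - ref_val) <= rtol * abs(ref_val)
        and abs(e - ref_err) <= rtol * abs(ref_err)
        for v, e in results.values()
    )


results = {
    "TRExFitter": (1.000, 0.120),
    "pyhf_from_xml": (1.0002, 0.1199),
    "fast_plus_pyhf": (0.9998, 0.1201),
}
```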

Another point identified by this investigation is that closely related configuration information is spread across multiple places; it would be easier to specify it in one central place. More specifically, the FAST-HEP configuration requires information about which types of template histograms are needed and how to build them, while the subsequent workspace construction needs information about the relations between those histograms. One of the next steps is the adoption of a central configuration file, possibly similar to this TRExFitter example. Another similar example is @lukasheinrich's YAML-based pyhf input.
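The "one central place" idea can be sketched as a single configuration from which both views are derived: the histogram-production task list and the workspace skeleton. All keys and helper names below are invented for illustration; the real central format is still to be designed.

```python
# Hedged illustration: one central config dict, two derived views, so region
# and sample definitions are written exactly once. All names are invented.
CONFIG = {
    "regions": [
        {"name": "signal_region", "variable": "mbb", "binning": [0, 100, 200]},
    ],
    "samples": [
        {"name": "signal", "normfactor": "mu"},
        {"name": "ttbar"},
    ],
}


def histogram_tasks(config):
    """Derive the histogram-production task list (the FAST-HEP-side view)."""
    return [
        (region["name"], sample["name"], region["variable"], region["binning"])
        for region in config["regions"]
        for sample in config["samples"]
    ]


def workspace_skeleton(config):
    """Derive the workspace channel structure (the pyhf-side view)."""
    return {
        "channels": [
            {
                "name": region["name"],
                "samples": [
                    {
                        "name": sample["name"],
                        "modifiers": (
                            [{"name": sample["normfactor"], "type": "normfactor"}]
                            if "normfactor" in sample
                            else []
                        ),
                    }
                    for sample in config["samples"]
                ],
            }
            for region in config["regions"]
        ]
    }
```

Because both views are generated from the same dict, a renamed sample or changed binning cannot silently drift between the histogram-production and workspace-construction steps.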


cranmer commented Feb 16, 2020

CMS Higgs demo done with Kubernetes
https://github.com/lukasheinrich/higgs-demo

@gordonwatts gordonwatts removed their assignment Sep 1, 2022