configurationdatabase

afestant edited this page Aug 30, 2019 · 6 revisions

Intro

In this tutorial you will learn the meaning of the most important parameters of the MLHEP package.

LcpK0spp:
  mass: 2.2864
  sel_reco_unp: "pt_cand>1"
  sel_reco_singletrac_unp : null
  sel_gen_unp: "pt_cand>1 and abs(z_vtx_gen)<10"
  sel_cen_unp: null
  #sel_good_evt_unp: "is_ev_rej == 0"
  sel_good_evt_unp: null
  sel_reco_skim: ["pt_prong0>0.25 and pt_prong1>0.30 and pt_prong2>0.30 and cos_p_K0s>0.999 and abs(nsigTPC_Pr_0)<3.0",
                  "pt_prong0>0.25 and pt_prong1>0.30 and pt_prong2>0.30 and cos_p_K0s>0.999 and abs(nsigTPC_Pr_0)<3.0",
                  "pt_prong0>0.25 and pt_prong1>0.30 and pt_prong2>0.30 and cos_p_K0s>0.999 and abs(nsigTPC_Pr_0)<3.0",
                  "pt_prong0>0.25 and pt_prong1>0.30 and pt_prong2>0.30 and cos_p_K0s>0.999 and abs(nsigTPC_Pr_0)<3.0",
                  "pt_prong0>0.25 and pt_prong1>0.30 and pt_prong2>0.30 and cos_p_K0s>0.999 and abs(nsigTPC_Pr_0)<3.0"]
  sel_gen_skim: [null,null,null,null,null]
  sel_skim_binmin: [1,2,4,8,12] #list of nbins
  sel_skim_binmax: [2,4,8,12,24] #list of nbins
  var_binning: pt_cand
  dofullevtmerge: false

The first part of the database includes the parameters needed to perform the conversion and skimming step. In particular:

  • sel_reco_unp: selection applied to the reco candidates at the conversion stage
  • sel_reco_singletrac_unp: optional single-track selection applied at the unpacking level. In most cases this selection is already applied when the TTree is created
  • sel_gen_unp: selection applied to the gen candidates at the conversion stage
  • sel_cen_unp: centrality selection applied at the conversion stage
  • sel_reco_skim: selection applied to the reco candidates at the skimming stage (one entry per bin)
  • sel_gen_skim: selection applied to the gen candidates at the skimming stage (one entry per bin)
  • sel_skim_binmin, sel_skim_binmax: lower and upper edges of the bins used to split the converted dataframes into skimmed dataframes. At the skimming stage the pandas dataframe is split into several sub-dataframes according to the value of a given variable, typically pT if you are performing an analysis vs pT.
  • var_binning: name of the variable used for splitting the datasets (e.g. pt_cand or multiplicity)
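
The conversion and skimming selections above are plain pandas query strings, so the two stages can be sketched with DataFrame.query. This is only an illustration on a toy dataframe, not the package's own code: the half-open bin convention [binmin, binmax) and the shortened skimming selection are assumptions made for the example.

```python
import pandas as pd

# Values copied from the configuration above.
sel_reco_unp = "pt_cand>1"
sel_skim_binmin = [1, 2, 4, 8, 12]
sel_skim_binmax = [2, 4, 8, 12, 24]
var_binning = "pt_cand"
# Shortened, hypothetical skimming selection (the real one has more terms).
sel_reco_skim = ["pt_prong0>0.25"] * len(sel_skim_binmin)

# Toy candidates; the column names mimic the tree variables.
df = pd.DataFrame({
    "pt_cand": [0.5, 1.5, 1.8, 3.0, 5.0, 9.0, 20.0, 30.0],
    "pt_prong0": [0.4, 0.3, 0.2, 0.4, 0.5, 0.6, 0.7, 0.8],
})

# Conversion stage: apply the unpacking selection.
df_unp = df.query(sel_reco_unp)

# Skimming stage: one sub-dataframe per bin of var_binning.
# Assumption: bins are half-open, [binmin, binmax).
skimmed = {}
for binmin, binmax, sel in zip(sel_skim_binmin, sel_skim_binmax, sel_reco_skim):
    sel_bin = f"{var_binning} >= {binmin} and {var_binning} < {binmax}"
    skimmed[(binmin, binmax)] = df_unp.query(sel_bin).query(sel)

for edges, sub in skimmed.items():
    print(edges, len(sub))
```

Candidates outside every bin (here pt_cand = 30) end up in none of the skimmed dataframes.
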
  bitmap_sel:
    use: True
    var_name: cand_type
    var_isstd: isstd
    var_ismcsignal: ismcsignal
    var_ismcprompt: ismcprompt
    var_ismcfd: ismcfd
    var_ismcbkg: ismcbkg
    isstd : [[0],[]]
    ismcsignal: [[1],[5]]
    ismcprompt: [[1,3],[5]]
    ismcfd: [[1,4],[5]]
    ismcbkg: [[2],[]]

Selections based on the bitmap cand_type, which is assigned to each candidate when the trees are filled on the Grid.

  • var_isstd: isstd: candidates selected by the standard analysis cuts (subsample of the candidates stored in the tree)
  • var_ismcsignal: ismcsignal: MC true signal candidates
  • var_ismcprompt: ismcprompt: MC true prompt signal candidates
  • var_ismcfd: ismcfd: MC true feed-down signal candidates
  • var_ismcbkg: ismcbkg: MC background candidates
  • isstd : [[0],[]]: candidates selected by the standard analysis cuts (subsample of the candidates stored in the tree). The selection is performed by checking individual bits of "cand_type"
  • ismcsignal: [[1],[5]]: MC true signal candidates. The selection is performed by checking individual bits of "cand_type"
  • ismcprompt: [[1,3],[5]]: MC true prompt signal candidates. The selection is performed by checking individual bits of "cand_type"
  • ismcfd: [[1,4],[5]]: MC true feed-down signal candidates. The selection is performed by checking individual bits of "cand_type"
  • ismcbkg: [[2],[]]: MC background candidates. The selection is performed by checking individual bits of "cand_type"
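
A minimal sketch of how such a bit check could work is given below. It assumes that the first inner list holds the bits that must be set and the second the bits that must be unset; the page does not state this explicitly, so treat it as an interpretation. The helper names passes_bitmap and decode are hypothetical.

```python
# Bit definitions copied from the bitmap_sel block above.
bitmap_sel = {
    "isstd": [[0], []],
    "ismcsignal": [[1], [5]],
    "ismcprompt": [[1, 3], [5]],
    "ismcfd": [[1, 4], [5]],
    "ismcbkg": [[2], []],
}


def passes_bitmap(cand_type, bits_on, bits_off):
    """True if every bit in bits_on is set and every bit in bits_off is unset.

    Assumption: [bits_on, bits_off] is the meaning of the two inner lists.
    """
    on_ok = all((cand_type >> bit) & 1 for bit in bits_on)
    off_ok = all(not (cand_type >> bit) & 1 for bit in bits_off)
    return on_ok and off_ok


def decode(cand_type):
    """Evaluate all configured flags for one cand_type value."""
    return {name: passes_bitmap(cand_type, on, off)
            for name, (on, off) in bitmap_sel.items()}


# Example: bits 0, 1 and 3 set.
flags = decode(0b001011)
print(flags)
```

With bits 0, 1 and 3 set, the candidate is flagged as isstd, ismcsignal and ismcprompt, but not as ismcfd or ismcbkg.
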
  variables:
    var_all: [cos_t_star, dca_K0s, signd0, imp_par_K0s, d_len_K0s, armenteros_K0s, ctau_K0s,
              cos_p_K0s, pt_prong0, pt_prong1, pt_prong2, imp_par_prong0, imp_par_prong1, imp_par_prong2,
              inv_mass, pt_cand, phi_cand, eta_cand, inv_mass_K0s, pt_K0s, cand_type, y_cand,
              run_number, ev_id, nsigTPC_Pr_0, nsigTOF_Pr_0,
              spdhits_prong0, spdhits_prong1, spdhits_prong2,
              pt_jet, eta_jet, phi_jet, delta_eta_jet, delta_phi_jet, delta_r_jet,
              pt_gen_jet, eta_gen_jet, phi_gen_jet, delta_eta_gen_jet, delta_phi_gen_jet, delta_r_gen_jet, pt_gen_cand]
    var_evt:
      data: [centrality, z_vtx_reco, n_vtx_contributors, n_tracks, is_ev_rej, run_number,
              ev_id, n_tracklets,V0Amult, trigger_hasbit_INT7, trigger_hasbit_HighMultSPD,
              trigger_hasbit_HighMultV0, trigger_hasclass_INT7, trigger_hasclass_HighMultSPD,
              trigger_hasclass_HighMultV0, n_tracklets_corr, v0m, v0m_eq, v0m_corr, v0m_eq_corr]
      mc: [z_vtx_gen, centrality, z_vtx_reco, n_vtx_contributors, n_tracks, is_ev_rej, run_number,
              ev_id, n_tracklets, V0Amult, trigger_hasbit_INT7, trigger_hasbit_HighMultSPD,
              trigger_hasbit_HighMultV0, trigger_hasclass_INT7, trigger_hasclass_HighMultSPD,
              trigger_hasclass_HighMultV0, n_tracklets_corr, v0m, v0m_eq, v0m_corr, v0m_eq_corr,
              mult_gen, mult_gen_v0a, mult_gen_v0c]
    var_gen: [y_cand, pt_cand, eta_cand, phi_cand, cand_type, pt_jet, eta_jet, phi_jet, delta_eta_jet, delta_phi_jet, delta_r_jet, run_number, ev_id]
    var_evt_match: [run_number, ev_id]
    var_training: [cos_t_star, signd0, dca_K0s, imp_par_K0s, d_len_K0s, armenteros_K0s, ctau_K0s, cos_p_K0s,  
                   imp_par_prong0, imp_par_prong1, imp_par_prong2, inv_mass_K0s, nsigTOF_Pr_0, nsigTPC_Pr_0] 
    var_boundaries: [cos_t_star, pt_cand]
    var_correlation:
      - [cos_t_star]
      - [pt_cand]
    var_signal: signal
    var_inv_mass: inv_mass
    var_cuts: 
        - [pt_prong0, lt, null]
        - [pt_prong1, lt, null]
        - [pt_prong2, lt, null]
        - [inv_mass_K0s, absst, 0.4977]
        - [nsigTPC_Pr_0, absst, 0.]
        - [nsigTOF_Pr_0, absst, 0.]
        - [imp_par_prong0, absst, 0.]
        - [cos_p_K0s, lt, null]
        - [armenteros_K0s, st, null]
        - [imp_par_K0s, absst, 0.]
        - [dca_K0s, absst, 0.]
        - [signd0, lt, null]
        - [cos_t_star, st, null]

In this block you define the variables stored in the dataframes and used in the following steps:

  • var_all: list of variables you want to extract from the ROOT TTree and include in the Pandas dataframe of the reco candidates
  • var_gen: list of variables you want to extract from the ROOT TTree and include in the Pandas dataframe of the gen candidates
  • var_evt: list of variables you want to extract from the ROOT TTree and include in the Pandas dataframe of the event
  • var_evt_match: variables used to match the candidates to the events they come from
  • var_training: list of training variables. ML optimization will consider this list of variables
  • var_boundaries: list of variables for decision boundary studies
  • var_correlation: list of variables for correlation studies
  • var_signal: name of the signal variable (here signal)
  • var_inv_mass: name of the invariant-mass variable (here inv_mass)
  • var_cuts: list of variables for the standard rectangular-cut analysis. Each entry gives the variable name, the cut type and the cut value.
    plot_options:
      prob_cut_scan:
        cos_t_star:
          xlim: 
            - -1
            - 1
        pt_K0s:
          xlim: 
            - 0
            - 1
        pt_prong0:
          xlim: 
            - 0
            - 1
        pt_prong1:
          xlim: 
            - 0
            - 1
        pt_prong2:
          xlim: 
            - 0
            - 1
        nsigTOF_Pr_0:
          xlim: 
            - -4
            - 4
        armenteros_K0s:
          xlim: 
            - 0
            - 2
        signd0:
          xlim: 
            - 0
            - 0.3
        nsigTPC_Pr_0:
          xlim: 
            - -4
            - 4
      eff_cut_scan:
        cos_t_star:
          xlim: 
            - -1
            - 1
        pt_K0s:
          xlim: 
            - 0
            - 1
        pt_prong0:
          xlim: 
            - 0
            - 1
        pt_prong1:
          xlim: 
            - 0
            - 1
        pt_prong2:
          xlim: 
            - 0
            - 1
        inv_mass_K0s:
          xlim: 
            - 0.
            - 0.04
        armenteros_K0s:
          xlim: 
            - 0
            - 2
        signd0:
          xlim: 
            - 0
            - 0.3
        nsigTOF_Pr_0:
          xlim: 
            - 0
            - 1000
        nsigTPC_Pr_0:
          xlim: 
            - 0
            - 4
        imp_par_prong0:
          xlim: 
            - 0
            - 0.3
        imp_par_K0s:
          xlim: 
            - 0
            - 1.
        dca_K0s:
          xlim: 
            - 0
            - 1.

A few plotting options (axis ranges) to properly display the distributions of the relevant variables during the ML probability/efficiency cut scan.

  files_names:
    namefile_unmerged_tree: AnalysisResults.root
    namefile_reco: AnalysisResultsReco.pkl.lz4
    namefile_evt: AnalysisResultsEvt.pkl.lz4
    namefile_evtvalroot: AnalysisResultsROOTEvtVal.root
    namefile_evtorig: AnalysisResultsEvtOrig.pkl.lz4
    namefile_gen: AnalysisResultsGen.pkl.lz4
    namefile_reco_applieddata: AnalysisResultsRecoAppliedData.pkl.lz4
    namefile_reco_appliedmc: AnalysisResultsRecoAppliedMC.pkl.lz4
    treeoriginreco: 'PWGHF_TreeCreator/tree_Lc2V0bachelor'
    treeorigingen: 'PWGHF_TreeCreator/tree_Lc2V0bachelor_gen'
    treeoriginevt: 'PWGHF_TreeCreator/tree_event_char'
    namefile_reco_ml_applied: AnalysisResultsRecoML.pkl.lz4
    treeoutput: "Lctree"
    histofilename: "masshisto.root"
    efffilename: "effhisto.root"
    crossfilename: "cross_section_tot.root"

Names of the input and output files and of the trees read from and written to them.

  multi:
    data:
      nperiods: 4
      nprocessesparallel: 20
      maxfiles : [-1,-1,-1,-1] #list of periods
      chunksizeunp : [100,100,100,100] #list of periods
      chunksizeskim: [100,100,100,100] #list of periods
      fracmerge : [0.05,0.05,0.05,0.05] #list of periods
      seedmerge: [12,12,12,12] #list of periods
      period: [LHC16pp,LHC16pp,LHC17pp,LHC18pp] #list of periods
                           
      unmerged_tree_dir: [/data/TTree/D0DsLckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_data/136_20190811-0107/merged,
                          /data/TTree/D0DsLckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_data/137_20190811-0108/merged,
                          /data/TTree/D0DsLckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2017_data/138_20190811-0108/merged,
                          /data/TTree/D0DsLckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2018_data/139_20190811-0108/merged] #list of periods
      pkl: [/data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_data/136_20190811-0107/pkl,
            /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_data/137_20190811-0108/pkl,
            /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2017_data/138_20190811-0108/pkl,
            /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2018_data/139_20190811-0108/pkl] #list of periods
      pkl_skimmed: [/data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_data/136_20190811-0107/pklsk,
                    /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_data/137_20190811-0108/pklsk, 
                    /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2017_data/138_20190811-0108/pklsk,
                    /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2018_data/139_20190811-0108/pklsk] #list of periods
      pkl_skimmed_merge_for_ml: [/data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_data/136_20190811-0107/pklskml,
                                 /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_data/137_20190811-0108/pklskml,
                                 /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2017_data/138_20190811-0108/pklskml,
                                 /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2018_data/139_20190811-0108/pklskml] #list of periods
      pkl_skimmed_merge_for_ml_all: /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_data_mltot
      pkl_evtcounter_all: /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_data_evttot
    mc:
      nperiods: 4
      nprocessesparallel: 20
      maxfiles : [-1,-1,-1,-1] #list of periods
      chunksizeunp : [100,100,100,100] #list of periods
      chunksizeskim: [1000,1000,1000,1000] #list of periods
      fracmerge : [1.0,1.0,1.0,1.0] #list of periods
      seedmerge: [12,12,12,12] #list of periods
      period: [LHC16pp,LHC16pp,LHC17pp,LHC18pp] #list of periods
      unmerged_tree_dir: [/data/TTree/D0DsLckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_mc_prodD2H/129_20190811-0106/merged,
                          /data/TTree/D0DsLckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_mc_prodLcpK0s/131_20190811-0106/merged,
                          /data/TTree/D0DsLckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2017_mc_prodD2H/134_20190811-0107/merged,
                          /data/TTree/D0DsLckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2018_mc_prodD2H/135_20190811-0107/merged] #list of periods
      pkl: [/data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_mc_prodD2H/129_20190811-0106/pkl,
            /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_mc_prodLcpK0s/131_20190811-0106/pkl,
            /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2017_mc_prodD2H/134_20190811-0107/pkl,
            /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2018_mc_prodD2H/135_20190811-0107/pkl] #list of periods
      pkl_skimmed: [/data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_mc_prodD2H/129_20190811-0106/pklsk,
                    /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_mc_prodLcpK0s/131_20190811-0106/pklsk,
                    /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2017_mc_prodD2H/134_20190811-0107/pklsk,
                    /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2018_mc_prodD2H/135_20190811-0107/pklsk] #list of periods
      pkl_skimmed_merge_for_ml: [/data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_mc_prodD2H/129_20190811-0106/pklskml,
                                 /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_mc_prodLcpK0s/131_20190811-0106/pklskml,
                                 /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2017_mc_prodD2H/134_20190811-0107/pklskml,
                                 /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2018_mc_prodD2H/135_20190811-0107/pklskml] #list of periods
      pkl_skimmed_merge_for_ml_all: /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pklskMLallperiods
      pkl_evtcounter_all: /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pklevtppallperiod

Processing details.

  • nperiods: number of periods to be analysed (e.g. LHC16a, b, c, ...)
  • nprocessesparallel: max number of parallel processes
  • maxfiles: max number of files to be processed (-1: all files)
  • chunksizeunp: max number of files per process at the unpacking step
  • chunksizeskim: max number of files per process at the skimming step
  • fracmerge: fraction of the files to be merged per period
  • seedmerge: seed for the random merging of files from different periods
  • period: list of period names
  • unmerged_tree_dir: directories where the unmerged trees are stored
  • pkl: directories where to store the converted (unmerged) trees
  • pkl_skimmed: directories where to store skimmed dataframes
  • pkl_skimmed_merge_for_ml: directories where to store the partially merged skimmed dataframes for ML training/testing
  • pkl_skimmed_merge_for_ml_all: directories where to store merged dataframes after model application
  • pkl_evtcounter_all: directories where to store event dataframes
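
Since most entries of this block are lists with one element per period, a quick consistency check before running can be useful. The sketch below is an illustration, not part of the package: check_period_lists and pick_files_to_merge are hypothetical helpers, and using fracmerge and seedmerge for a seeded random sample of files is an assumption about how the merging step behaves.

```python
import random


def check_period_lists(block):
    """Return the keys whose list length differs from nperiods."""
    nperiods = block["nperiods"]
    return [key for key, value in block.items()
            if isinstance(value, list) and len(value) != nperiods]


def pick_files_to_merge(files, frac, seed):
    """Reproducibly pick a fraction of the files (assumed merging logic)."""
    rng = random.Random(seed)
    nfiles = int(round(frac * len(files)))
    return rng.sample(files, nfiles)


# Subset of the "data" block above, written as a Python dictionary.
data_block = {
    "nperiods": 4,
    "nprocessesparallel": 20,
    "maxfiles": [-1, -1, -1, -1],
    "chunksizeunp": [100, 100, 100, 100],
    "fracmerge": [0.05, 0.05, 0.05, 0.05],
    "seedmerge": [12, 12, 12, 12],
    "period": ["LHC16pp", "LHC16pp", "LHC17pp", "LHC18pp"],
}

print(check_period_lists(data_block))  # no inconsistent keys

# Toy file list for the first period.
files = [f"file_{i}.pkl.lz4" for i in range(100)]
chosen = pick_files_to_merge(files, data_block["fracmerge"][0],
                             data_block["seedmerge"][0])
print(len(chosen))
```

Fixing the seed makes the selection reproducible: the same files are picked every time the step is rerun.
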
  ml:
    nbkg: 500000
    nsig: 500000
    sampletagforsignal: 1
    sampletagforbkg: 0
    sel_sigml: ismcprompt == 1
    sel_bkgml: inv_mass<2.186 or inv_mass>2.386  
    nkfolds: 5
    rnd_shuffle: 12
    rnd_splt: 12
    test_frac: 0.2
    binmin: [1,2,4,8,12] #list of nbins
    binmax: [2,4,8,12,24] #list of nbins
    mltype: BinaryClassification
    ncorescrossval: 10 
    mlplot: /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/mlplot # to be removed
    mlout: /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/mlout # to be removed

    opt:
      filename_fonll: 'data/fonll/fo_pp_d0meson_5TeV_y0p5.csv' # file with FONLL predictions
      fonll_pred: 'max' # edge of the FONLL prediction
      FF: 0.1281 # fragmentation fraction
      sigma_MB: 57.8e-3  # Minimum Bias cross section (pp) 50.87e-3 [b], 1 for Pb-Pb
      Taa: 1 # 23260 [b^-1] in 0-10% Pb-Pb, 3917 [b^-1] in 30-50% Pb-Pb, 1 for pp
      BR: 1.09e-2 # branching ratio of the decay under study
      f_prompt: 0.9 # estimated fraction of prompt candidates
      bkg_data_fraction: 0.1 # fraction of real data used in the estimation
      num_steps: 111 # number of steps used in efficiency and signif. estimation
      save_fit: True # save bkg fits with the various cuts on ML output
      raahp: [1,1,1,1,1] #list of nbins
      presel_gen_eff: "abs(y_cand) < 0.5 and abs(z_vtx_gen) < 10"

Machine Learning optimization configuration block. It also includes the parameters of the significance optimization, which have to be set for the case under analysis.

  • nbkg: number of background candidates included in the training/validation/testing sample
  • nsig: number of signal candidates included in the training/validation/testing sample
  • sampletagforsignal: tag for signal sample
  • sampletagforbkg: tag for background sample
  • sel_sigml: signal candidates selections
  • sel_bkgml: background candidates selections (typically side-bands of the invariant mass distribution)
  • nkfolds: number of folds used in the k-fold cross validation
  • rnd_shuffle: random seed used to shuffle the sample
  • rnd_splt: random seed used to split the sample into training and testing sets
  • test_frac: fraction of candidates kept for testing
  • binmin: min pt values of the bins
  • binmax: max pt values of the bins
  • mltype: Machine Learning problem type
  • ncorescrossval: number of cores to be used for cross validation
  • mlplot: output directory - control plots
  • mlout: output directory - models
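
As an illustration of how these parameters fit together, the sketch below builds a small tagged sample with pandas. The toy dataframes are invented, and the shuffle-and-slice split is a simplified stand-in for whatever splitting routine the package actually uses.

```python
import pandas as pd

# Values copied from the ml block above.
sel_sigml = "ismcprompt == 1"
sel_bkgml = "inv_mass<2.186 or inv_mass>2.386"
sampletagforsignal, sampletagforbkg = 1, 0
test_frac, rnd_shuffle = 0.2, 12

# Toy MC and data candidates (invented values).
df_mc = pd.DataFrame({"inv_mass": [2.28, 2.29, 2.30, 2.27, 2.285],
                      "ismcprompt": [1, 1, 0, 1, 1]})
df_data = pd.DataFrame({"inv_mass": [2.10, 2.15, 2.29, 2.40, 2.45, 2.18],
                        "ismcprompt": [0] * 6})

# Signal from MC prompt candidates, background from the data side-bands.
df_sig = df_mc.query(sel_sigml).assign(signal=sampletagforsignal)
df_bkg = df_data.query(sel_bkgml).assign(signal=sampletagforbkg)

# Merge, shuffle reproducibly, then keep test_frac of the rows for testing.
df_ml = pd.concat([df_sig, df_bkg], ignore_index=True)
df_ml = df_ml.sample(frac=1.0, random_state=rnd_shuffle).reset_index(drop=True)
n_test = int(round(test_frac * len(df_ml)))
df_test, df_train = df_ml.iloc[:n_test], df_ml.iloc[n_test:]

print(len(df_sig), len(df_bkg), len(df_train), len(df_test))
```

The tag column is called signal here to match var_signal in the variables block.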

Significance optimization parameters

  • filename_fonll: file with FONLL predictions
  • fonll_pred: which FONLL curve to use (e.g. max)
  • FF: fragmentation fraction
  • sigma_MB: Minimum Bias cross section for pp, 1 for Pb-Pb
  • Taa: nuclear overlap function, 23260 [b^-1] in 0-10% Pb-Pb, 3917 [b^-1] in 30-50% Pb-Pb, 1 for pp
  • BR: branching ratio of the decay under study
  • f_prompt: estimated fraction of prompt candidates
  • bkg_data_fraction: fraction of real data used in the estimation
  • num_steps: number of steps used in efficiency and signif. estimation
  • save_fit: whether to save the bkg fits for the various cuts on the ML output
  • raahp: array of RAA hypotheses
  • presel_gen_eff: preselections for efficiency estimate
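
To show how these parameters could enter a significance estimate, here is a rough sketch. The yield formula is a common textbook-style expression and is an assumption, not taken from the package; the function names, the rapidity-width argument and the way Taa and RAA enter are hypothetical. All inputs must be given in consistent units.

```python
import math


def expected_signal(dsigma_dpt, delta_pt, delta_y, ff, br, eff_acc,
                    f_prompt, nevents, sigma_mb, taa=1.0, raa=1.0):
    """Expected raw yield (particles plus antiparticles) in one pT bin.

    Assumed formula:
    2 * dsigma/dpT * FF * Taa * RAA * dpT * dy * BR * (Acc x eff) * Nev
    / sigma_MB / f_prompt
    """
    prompt_yield = (2.0 * dsigma_dpt * ff * taa * raa * delta_pt * delta_y
                    * br * eff_acc * nevents / sigma_mb)
    return prompt_yield / f_prompt


def significance(signal, background):
    """Statistical significance S / sqrt(S + B)."""
    return signal / math.sqrt(signal + background)


# Toy numbers, only to exercise the functions.
sig = expected_signal(dsigma_dpt=1.0e-5, delta_pt=2.0, delta_y=1.0, ff=0.1281,
                      br=1.09e-2, eff_acc=0.05, f_prompt=0.9,
                      nevents=1.0e9, sigma_mb=57.8e-3)
print(sig, significance(sig, background=1.0e4))
print(significance(100.0, 300.0))  # 100 / sqrt(400) = 5.0
```

In a cut scan, both functions would be evaluated at each of the num_steps thresholds on the ML output, with the efficiency and the background changing from step to step.
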
  mlapplication:
    data:
      pkl_skimmed_dec: [/data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_data/136_20190811-01077/pklskdec, 
                        /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_data/137_20190811-0108/pklskdec,
                        /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2017_data/138_20190811-0108/pklskdec,
                        /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2018_data/139_20190811-0108/pklskdec] #list of periods
      pkl_skimmed_decmerged: [/data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_data/136_20190811-01077/pklskdecmerged,
                              /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_data/137_20190811-0108/pklskdecmerged,
                              /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2017_data/138_20190811-0108/pklskdecmerged,
                              /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2018_data/139_20190811-0108/pklskdecmerged] #list of periods
    mc:
      pkl_skimmed_dec: [/data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_mc_prodD2H/129_20190811-0106/pklskdec,
                        /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_mc_prodLcpK0s/131_20190811-0106/pklskdec,
                        /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2017_mc_prodD2H/134_20190811-0107/pklskdec,
                        /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2018_mc_prodD2H/135_20190811-0107/pklskdec] #list of periods
      pkl_skimmed_decmerged: [/data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_mc_prodD2H/129_20190811-0106/pklskdecmerged,
                              /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_mc_prodLcpK0s/131_20190811-0106/pklskdecmerged,
                              /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2017_mc_prodD2H/134_20190811-0107/pklskdecmerged,
                              /data/Derived/LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2018_mc_prodD2H/135_20190811-0107/pklskdecmerged] #list of periods
    modelname: xgboost
    modelsperptbin: [xgboost_classifierLcpK0spp_dfselection_pt_cand_1.0_2.0.sav,
                     xgboost_classifierLcpK0spp_dfselection_pt_cand_2.0_4.0.sav,
                     xgboost_classifierLcpK0spp_dfselection_pt_cand_4.0_8.0.sav,
                     xgboost_classifierLcpK0spp_dfselection_pt_cand_8.0_12.0.sav,
                     xgboost_classifierLcpK0spp_dfselection_pt_cand_12.0_24.0.sav] 
    probcutpresel: 
      data: [0.3,0.3,0.3,0.3,0.3] #list of nbins
      mc: [0.3,0.3,0.3,0.3,0.3] #list of nbins
    probcutoptimal: [0.4,0.4,0.4,0.3,0.3] #list of nbins

Block that configures the application of the ML model.

  • pkl_skimmed_dec, pkl_skimmed_decmerged: set directories where to store data and Monte Carlo files after model application
  • modelname: name of the chosen model
  • modelsperptbin: model files
  • probcutpresel: loose probability cut used to select the candidates to be stored in the files
  • probcutoptimal: optimal probability cut used to get invariant mass distributions and efficiencies
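
The model file names listed in modelsperptbin follow the pT bin edges, and the probability cuts are given per bin. The sketch below reproduces the names from the bin edges and applies the per-bin optimal cut. The naming pattern is inferred from the names listed above, and model_filename and passes_optimal_cut are hypothetical helpers.

```python
# Values copied from the ml and mlapplication blocks above.
binmin = [1, 2, 4, 8, 12]
binmax = [2, 4, 8, 12, 24]
probcutoptimal = [0.4, 0.4, 0.4, 0.3, 0.3]


def model_filename(modelname, case, var, lo, hi):
    """Build a model file name (pattern inferred from the listed names)."""
    return f"{modelname}_classifier{case}_dfselection_{var}_{lo:.1f}_{hi:.1f}.sav"


models = [model_filename("xgboost", "LcpK0spp", "pt_cand", lo, hi)
          for lo, hi in zip(binmin, binmax)]


def passes_optimal_cut(pt, prob):
    """Apply the probability cut of the pT bin that contains pt.

    Assumption: bins are half-open, [binmin, binmax); candidates outside
    all bins are rejected.
    """
    for lo, hi, cut in zip(binmin, binmax, probcutoptimal):
        if lo <= pt < hi:
            return prob > cut
    return False


print(models[0])
print(passes_optimal_cut(3.0, 0.35), passes_optimal_cut(9.0, 0.35))
```

The same probability of 0.35 fails the cut in the 2-4 bin (threshold 0.4) but passes it in the 8-12 bin (threshold 0.3).
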
  analysis:
    MBvspt:
      plotbin: [1,1,1,0]
      usesinglebineff: 0
      sel_binmin2:  [0,0,30,60] #list of var2 splittng nbins
      sel_binmax2: [9999,30,60,100] #list of var2 splitting nbins
      var_binning2: n_tracklets_corr
      sel_an_binmin: [1,2,3,4,5,6,8,12] #list of pt nbins
      sel_an_binmax: [2,3,4,5,6,8,12,24] #list of pt nbins
      binning_matching: [0,1,1,2,2,2,3,4] #list of pt nbins
      presel_gen_eff: "abs(y_cand) < 0.5 and abs(z_vtx_gen) < 10"
      evtsel: is_ev_rej==0
      triggersel: 
        data: "trigger_hasclass_INT7==1 and trigger_hasbit_INT7==1"
        mc: null
      data: 
        results: [LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_data/136_20190811-01077/resultsMBvspt, 
                  LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_data/137_20190811-0108/resultsMBvspt,
                  LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2017_data/138_20190811-0108/resultsMBvspt,
                  LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2018_data/139_20190811-0108/resultsMBvspt] #list of periods
        resultsallp: LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_data/resultsMBvspt
      mc:
        results: [LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_mc_prodD2H/129_20190811-0106/resultsMBvspt,
                  LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2016_mc_prodLcpK0s/131_20190811-0106/resultsMBvspt,
                  LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2017_mc_prodD2H/134_20190811-0107/resultsMBvspt,
                  LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_2018_mc_prodD2H/135_20190811-0107/resultsMBvspt] #list of periods
        resultsallp: LckINT7HighMultwithJets/vAN-20190810_ROOT6-1/pp_mc/prodD2H/resultsMBvspt
      mass_fit_lim: [2.14, 2.48] # region for the fit of the invariant mass distribution [GeV/c^2]
      bin_width: 0.001 # bin width of the invariant mass histogram
      usefit: true
      sgnfunc: [kGaus,kGaus,kGaus,kGaus,kGaus,kGaus,kGaus,kGaus]
      bkgfunc: [Pol2,Pol2,Pol2,Pol2,Pol2,Pol2,Pol2,Pol2]
      masspeak: 2.2864
      massmin: [2.14,2.14,2.14,2.14,2.14,2.14,2.14,2.14]
      massmax: [2.436,2.436,2.436,2.436,2.436,2.436,2.436,2.436]
      rebin: [6,6,6,6,6,6,6,6]
      includesecpeak: [0,0,0,0,0,0,0,0]
      masssecpeak: 2.2864
      FixedMean: true
      SetFixGaussianSigma: true
      SetInitialGaussianMean: true
      dolikelihood: true
      sigmaarray: [0.0078, 0.0078, 0.0082, 0.0091, 0.0097, 0.0109, 0.0117, 0.0156]
      FixedSigma: true  
      fitcase: Lc
      latexnamemeson: "L_{c}^{K0s}"
      latexbin2var: "n_{trkl}"
      nevents: 1700000000.
      dodoublecross: false

Block that configures the final analysis. There can be different types of analysis, e.g. MBjetvspt, SPDvspt, ...

  • plotbin: flags setting which bins have to be plotted, e.g. [1,1,1,0]
  • usesinglebineff:
  • sel_binmin2, sel_binmax2: min and max values of the bins of an additional variable besides pT, e.g. multiplicity
  • var_binning2: additional variable to be considered for a double-differential analysis
  • sel_an_binmin, sel_an_binmax: lists of min and max values of the analysis bins of the first variable (pT)
  • binning_matching: for each analysis bin of the first variable, index of the skimming/ML bin that contains it
  • presel_gen_eff: preselections for efficiency estimate
  • evtsel: event selection
  • triggersel: trigger selection for data and MC
  • results, resultsallp: output directories
  • mass_fit_lim: region for the fit of the invariant mass distribution [GeV/c^2]
  • bin_width: bin width of the invariant mass histogram
  • usefit: whether to perform the fit or not
  • sgnfunc: signal fit function
  • bkgfunc: background fit function
  • masspeak: PDG mass
  • massmin,massmax: invariant mass range of the fit
  • rebin: histogram rebinning
  • includesecpeak: include fit of a second peak, e.g. D+ in case of Ds->KKpi
  • masssecpeak: PDG mass of the second peak
  • FixedMean: whether to fix the mean or not
  • SetFixGaussianSigma: whether to fix sigma or not
  • SetInitialGaussianMean: set initial mean values
  • dolikelihood: whether to use the likelihood fit option
  • sigmaarray: array of sigma values
  • FixedSigma: whether to fix sigma or not
  • fitcase: fit case
  • latexnamemeson: Latex format name of the particle under study
  • latexbin2var: Latex format name of the second variable
  • nevents: number of events
  • dodoublecross
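
The relation between the analysis bins and the coarser skimming/ML bins can be checked with a few lines of Python. The sketch below recomputes binning_matching from the bin edges given in this page; match_bins is a hypothetical helper, not a function of the package.

```python
# Skimming/ML bins and analysis bins copied from the configuration above.
sel_skim_binmin = [1, 2, 4, 8, 12]
sel_skim_binmax = [2, 4, 8, 12, 24]
sel_an_binmin = [1, 2, 3, 4, 5, 6, 8, 12]
sel_an_binmax = [2, 3, 4, 5, 6, 8, 12, 24]


def match_bins(an_min, an_max, skim_min, skim_max):
    """For each analysis bin, find the index of the skim bin containing it.

    Raises StopIteration if an analysis bin is not fully contained in
    any skim bin.
    """
    matching = []
    for lo, hi in zip(an_min, an_max):
        index = next(i for i, (smin, smax)
                     in enumerate(zip(skim_min, skim_max))
                     if smin <= lo and hi <= smax)
        matching.append(index)
    return matching


binning_matching = match_bins(sel_an_binmin, sel_an_binmax,
                              sel_skim_binmin, sel_skim_binmax)
print(binning_matching)  # [0, 1, 1, 2, 2, 2, 3, 4]
```

The result agrees with the binning_matching list in the MBvspt block, so each analysis bin uses the model and skimmed dataframe of the wider bin that contains it.
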
  systematics:
    probvariation:
      prob_range: [0.5,0.6,0.7]

Block to configure the variation of the ML probability cut for the estimate of the systematic uncertainty.

  validation:
    data:
      dir: [dataval_16, dataval_16, dataval_17, dataval_18] #list of periods
      dirmerged: datavaltot
    mc:
      dir: [mcval_16, mcval_16, mcval_17, mcval_18] #list of periods
      dirmerged: mcvaltot

Directories where to store the information on the number of events.