1
RooStatsCms: a tool for analyses modelling, combination and statistical studies
D. Piparo, G. Schott, G. Quast
Institut für Experimentelle Kernphysik, Universität Karlsruhe
2
Outline
  • The need for a tool
  • RooStatsCms (RSC)
  • A RooFit interlude
  • The three parts
  • Modelling
  • The datacard
  • Inspect your model
  • Statistical studies and limits
  • Profile Likelihood
  • Hypothesis separation and modified frequentist
    approach
  • Exclusion
  • Plotting classes

19.11.08
3
The need for a tool
  • No pre-existing structured statistics software
    framework in CMS: G. Quast, G. Schott and D. Piparo
    developed RooStatsCms
  • Needs:
  • Reliable implementation of multiple statistical
    methods
  • Combine analyses
  • Stronger limits on quantities like Higgs
    production cross section, mass ...
  • Do not replace existing analyses but complement
    their results
  • Easy user interface
  • Satisfactory documentation (no black boxes)
  • Examples and tutorials

4
RooStatsCms
  • Originally conceived for the CMS Higgs Working
    Group as a CMS (EKP) exclusive product
  • Based on RooFit (part of the ROOT distribution)
  • Three parts:
  • Modelling and combination
  • Statistical methods
  • Advanced graphics routines
  • Comes with CINT dictionaries (for macros and
    interactive ROOT sessions)
  • Available to CMS and EKP at
    www-ekp.physik.uni-karlsruhe.de/RooStatsCms
  • Visit our wiki for username and password
  • Statistical methods and graphics routines are public:
    www-ekp.physik.uni-karlsruhe.de/RooStatsKarlsruhe
  • Big documentation effort:
  • 1. RSC website and Doxygen for every class, method
    and member
  • 2. Wiki pages with links to RSC presentations (15)
    and workshop:
  • https://twiki.cern.ch/twiki/bin/view/CMS/HiggsWGRooStatsCms
  • http://www-ekp.physik.uni-karlsruhe.de/twiki/bin/view/EkpCms/RooStatsCms
  • 3. An internal CMS note in preparation

5
RooStatsCms - structure 1/2
  • Class design-wise structure:
  • Already 33 classes!
  • All of them inherit from TObject: persistency and
    reflection
  • Moreover:
  • Programs to compile
  • Macros for the interpreter
  • Various utilities in the Rsc namespace (TH1F
    median, ...)

6
RooStatsCms - structure 2/2
Directory-wise structure:
Directory   Description
doc         Links to the documentation
bin         Executables, built by the "make exe" command (see progs dir)
interface   Header files
lib         The library libRooStatsCms.so, built by the "make" command
macros      The macros for CINT
progs       C++ programs to be compiled and linked against the library
scripts     Utility scripts: Python card maker, doxy, environment
src         The sources
test        The directory for the tests
  • Structure à la CMSSW: ready to compile in the
    CMS framework with a newer RooFit

7
RooFit interlude: overture
  • Toolkit for data modeling
  • Model the distribution of an observable x in terms of
  • a parameter of interest p
  • other parameters q that describe detector effects
    (resolution, efficiency)
  • Probability density function (pdf) F(x; p, q)
  • normalized over the range of the observable x w.r.t.
    the parameters p and q
  • RooFit provides the functionality for
  • building these probability density functions
  • scalable to complex models
  • maximum likelihood fitting (binned and unbinned)
  • visualization of the pdf
  • toy MC generation

8
RooFit interlude: functionality
  • Package originally developed for BaBar analysis
    (by W. Verkerke and D. Kirkby)
  • Actively maintained by W. Verkerke in view of LHC
    analysis
  • Web site: http://roofit.sourceforge.net
  • Much of the material shown is taken from Wouter's
    presentations
  • See the 200 slides presented at the French statistics
    school (http://sos.in2p3.fr)
  • Users Manual on the ROOT site:
    ftp://root.cern.ch/root/doc/RooFit_Users_Manual_2.91-33.pdf

9
RooFit interlude: design
  • Mathematical entities are represented as C++
    objects

10
RooFit interlude: an example
11
RSC: a solid tool
  • RSC is in production phase
  • Around since the beginning of 2008
  • Workshop at CERN in June
  • Approved results: http://cms-physics.web.cern.ch/cms-physics/public/HIG-08-008-pas.pdf
  • Results coming soon: HIG-008-06 (HWW)
  • CMS statistics committee blessed the tool
    (internal note in preparation)
  • Grégory in permanent contact with them
  • Interest from other working groups
  • Negotiations for integration in the CMS software
    framework (CMSSW)
  • Base of a common tool with ATLAS
  • Work in progress: first commits in ROOT are
    taking place
  • New manpower: Mario Pelliccioni (formerly BaBar)
    from Università di Torino
  • Made in EKP (Quast, Schott, Piparo)
  • Personal assistance on the 8th floor!

12
RSC: is it hard to try?
  • Straightforward to get started on ekpcms3:

wget -O RooStatsCms.tar.gz "http://cmssw.cvs.cern.ch/cgi-bin/cmssw.cgi/CMSSW/HiggsAnalysis/RooStatsCms.tar.gz?view=tar&pathrev=V00-04-00"
tar -zxf RooStatsCms.tar.gz
cd RooStatsCms
source /home/piparo/set_root_RSC_environment.sh
source scripts/RSCenv.sh
make
make exe
cd macros/examples/
root profilelikelihood_htt.cxx
root qqhtt_-2lnQ_distributions.cxx

See also www-ekp.physik.uni-karlsruhe.de/RooStatsCms
for detailed instructions
13
RSC in one slide
A priori, I frequently believe I am in between
...
Statisticians
... RooStatsCms tries to put you somehow in
between...
14
The Three Parts
  • Analyses modeling and combination
  • Statistical Methods and limits
  • Graphics routines

15
Analyses modeling and combination
  • Modeling based on the datacard concept
  • Build a complete combined analysis model from
    ASCII datacards
  • Background and signal components of each
    analysis
  • Shapes from parametrisations or histograms
  • Constraints and their correlations
  • Basic syntax: include, if, ...
  • Two lines of C++ to produce the RooFit pdf
  • Datacard advantages:
  • Automatic bookkeeping of what is done
  • Factorise the model from the C++ code
  • Easy to share

ASCII card: 2 analyses
RscCombinedModel mymodel("hzz4l");
RooAbsPdf* sb_pdf = mymodel.getPdf();
16
RSC Modelling 2/2
  • Yields can be expressed as products of different
    terms, for example:
  • Branching ratios
  • Efficiencies
  • Cross section
  • Luminosity
  • Systematics on each term can be included
  • Relate terms from one analysis to the other with
    correlations

Yield = BR × ε × σ_H × Lumi
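
The product structure of a yield can be sketched in a few lines of Python. This is an illustration only, not RSC code: the factor names and numbers are hypothetical, and the quadrature sum assumes small, uncorrelated relative uncertainties on the factors.

```python
import math

def yield_from_factors(factors):
    """Multiply the central values of all yield factors."""
    return math.prod(value for value, _ in factors.values())

def relative_uncertainty(factors):
    """First-order relative uncertainty on the product,
    assuming the factors are uncorrelated (sum in quadrature)."""
    return math.sqrt(sum(rel ** 2 for _, rel in factors.values()))

# Hypothetical numbers: (central value, relative uncertainty)
factors = {
    "BR": (0.08, 0.02),
    "efficiency": (0.25, 0.05),
    "cross_section_fb": (4000.0, 0.10),
    "lumi_fb_inv": (1.0, 0.03),
}

expected_yield = yield_from_factors(factors)
rel_unc = relative_uncertainty(factors)
print(f"yield = {expected_yield:.2f} +/- {expected_yield * rel_unc:.2f}")
```

Sharing one factor (for example the luminosity) between two analyses is what makes their yields fully correlated through that term.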
17
An example datacard: counting

// The combined model
// Here we specify the names of the models
// built down in the card that we want
// to be combined
include HZZ_4mu.rsc
include HZZ_4e.rsc
include HZZ_2mu2e.rsc

[hzz4l]
    model = combined
    components = hzz_4mu, hzz_4e, hzz_2mu2e

// H -> ZZ -> 4mu
[hzz_4mu]
    variables = x
    x = 0 L(0 - 1)
[hzz_4mu_sig]
    hzz_4mu_sig_yield = 62.78 L(0 - 200)
[hzz_4mu_sig_x]
    model = yieldonly
[hzz_4mu_bkg]
    yield_factors_number = 2
    yield_factor_1 = scale
    scale = 1 L(0 - 3)
    scale_constraint = Gaussian,1,0.041
    yield_factor_2 = bkg_4mu
    bkg_4mu = 19.93 C
[hzz_4mu_bkg_x]
    model = yieldonly

Slide annotations:
  • The variable
  • Comment
  • Basic syntax
  • Signal component description: yield, model
  • The combined model
  • Background component description: yield made of
    different terms
  • Constraints syntax: <type>,par1,par2
See RscBaseModel and RscCombinedModel
documentation for a complete description
18
An example datacard: shapes

[hgg_cat0]
    variables = mh
    mh = 115 L(90 - 180) // GeV/c2
[hgg_cat0_sig]
    yield_factors_number = 3
    yield_factor_1 = lumi
    lumi = 1 C
    yield_factor_2 = n_events_hgg_115_cat0_sig
    n_events_hgg_cat0_sig = 3.9577
    yield_factor_3 = scale_sig
    scale_sig = 1 L(0 - 5)
[hgg_cat0_sig_mh]
    model = fourGaussians
    hgg_115_cat0_sig_mh_mean1 = 114.654 +/- 0.107106 C
    hgg_115_cat0_sig_mh_mean2 = 115.146 +/- 2.37687 C
    hgg_115_cat0_sig_mh_mean3 = 114.12 +/- 0.581539 C
    hgg_115_cat0_sig_mh_mean4 = 109.979 +/- 11.036 C
    hgg_115_cat0_sig_mh_sigma1 = 0.6075 +/- 0.0888951 C
    hgg_115_cat0_sig_mh_sigma2 = 0.601995 +/- 129.141 C
    hgg_115_cat0_sig_mh_sigma3 = 2.1119 +/- 0.526549 C
    hgg_115_cat0_sig_mh_sigma4 = 8.16619 +/- 7.75118 C
    hgg_115_cat0_sig_mh_frac1 = 0.999893 +/- 0.500053 C
    hgg_115_cat0_sig_mh_frac2 = 0.762761 +/- 0.0870296 C
    hgg_115_cat0_sig_mh_frac3 = 0.98815 +/- 0.0207781 C
[hgg_cat0_bkg]
    number_components = 2
    yield_factors_number = 3
    yield_factor_1 = lumi
    lumi = 1 C
    yield_factor_2 = n_events_hgg_115_cat0_bkg
    n_events_hgg_cat0_bkg = 988.389
    yield_factor_3 = scale_bkg
    scale_bkg = 1 L(0 - 5)
[hgg_cat0_bkg1]
    qqhtt_bkg1_yield = 1 C
[hgg_cat0_bkg2]
    qqhtt_bkg2_yield = 1.35 C
[hgg_cat0_bkg1_mh]
    model = doubleGaussian
    hgg_cat0_bkg_mh_mean1 = 52.3484 +/- 14.1593 C
    hgg_cat0_bkg_mh_mean2 = 158.962 +/- 3.21153 C
    hgg_cat0_bkg_mh_sigma1 = 27.1791 +/- 2.37455 C
    hgg_cat0_bkg_mh_sigma2 = 74.9328 +/- 70.6298 C
    hgg_cat0_bkg_mh_frac = 0.924937 +/- 0.0347411 C
[hgg_cat0_bkg2_mh]
    model = histo
    hgg_cat0_bkg2_mh_fileName = htt_inputs.root
    hgg_cat0_bkg2_mh_name = background

Slide annotations:
  • Multiple components
  • Histogram and parametric models mixed
  • Comment

// The combined model of HZZ and Hgg
include hzz_combined.rsc
include hgg_12_categories.rsc

[hgg_hzz_combined]
    model = combined
    components = hzz, hgg_cat0, hgg_cat1,..., hgg_cat11

  1. Combination of combined models
  2. Counting combined with shape analyses

19
A combination
  • Combination of CMS H → γγ and H → ZZ (3 modes), 30 fb^-1
  • Perform a simultaneous analysis of Higgs
    channels
  • For each analysis, each data sample is fitted
    simultaneously with its own signal and
    background model
  • Combination of number counting and distribution
    based analyses
  • Significance: sqrt(2 ln Q)
  • Various analyses
  • Comparison between PTDR and RSC

20
More on constraints
  • Same name, same pointer principle (100%
    correlation)
  • Same name in the card → same object in the model
  • Common luminosity, cross sections
  • Partial correlation among Gaussian constraints:
    constraints block

[combined_120_constraints_block_1]
    correlation_variable1 = hww_mm_120_bkg_yield
    correlation_variable2 = hww_ee_120_bkg_yield
    correlation_variable3 = hww_em_120_bkg_yield
    correlation_value1 = 0.80 C
    correlation_value2 = 0.72 C
    correlation_value3 = 0.15 C
[combined_120_constraints_block_2]
    ............

Slide annotations:
  • Correlated variables
  • Correlation coefficients
  • As many blocks as needed!
21
Analyses model structure
22
Inspect your model
  • Two programs to use:
  • Model Diagram: creates a simple graph of the
    combined model
  • model_diagram.exe <cardname> <modelname>
  • Model Html: creates a website to browse
    your combined model
  • model_html.exe <cardname> <modelname>

23
The Three Parts
  • Analyses modeling and combination
  • Statistical Methods and limits
  • Graphics routines

24
Profile Likelihood - 1/2
25
Profile Likelihood 2/2
  • Intersection with horizontal lines gives upper
    limits / two-sided intervals
  • W.J. Metzger, Statistical Methods in Data
    Analysis, Katholieke Universiteit Nijmegen,
    2002.
  • Systematics taken into account with penalty terms
    in the likelihoods (profiling)

Plot annotations:
  • Likelihood scan: likelihood maximised for each point
  • Horizontal cuts
  • Interpolated scan minimum
  • Parameter value at minimum: 7.16 +8.1 -5.37
  • Minuit uses the same technique to obtain the errors
    on the fitted parameters
  • Significance estimator: S = sqrt(2 ln(Lsb/Lb))
  • → if the scanned parameter is N signal, the scan
    value at 0 is directly related to S!

See PLCalcuator, PLResults, PLPlot documentation
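
As a rough numerical illustration of the significance estimator sqrt(2 ln(Lsb/Lb)), the sketch below evaluates it for a single Poisson counting experiment. It does not use the RSC classes: the signal is fixed at its expected value instead of being fitted, no systematics are included, and the function name is hypothetical. The input numbers are the 115 GeV yields of the benchmark analysis shown later in the talk.

```python
import math

def counting_significance(n_obs, s, b):
    """sqrt(2 ln(L_sb / L_b)) for one Poisson count n_obs,
    with expected signal s and expected background b."""
    # ln L_sb - ln L_b = n_obs * ln((s + b) / b) - s for Poisson likelihoods
    two_ln_q = 2.0 * (n_obs * math.log(1.0 + s / b) - s)
    return math.sqrt(two_ln_q) if two_ln_q > 0.0 else 0.0

s, b = 1.6, 45.2
# Assume the observation equals the s+b expectation
significance = counting_significance(s + b, s, b)
print(f"S = {significance:.3f}, s/sqrt(b) = {s / math.sqrt(b):.3f}")
```

For a small signal on a large background the result is close to the familiar s/sqrt(b).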
26
Systematics - 1/2
27
Systematics - 2/2
28
A PL prototype study
  • A prototype study: distribution of upper limits
    using PL, and a coverage study
  • Many pseudo-experiments performed for each mass
    hypothesis
  • Distribution of upper limits obtained
  • Coverage: fraction of experiments in which the
    upper limit is indeed greater than the nominal
    value of the parameter
  • Easy to do: store PLResults objects in a TTree
    and loop on it
  • Overcoverage for low yields
  • Well known feature of the method (Cramér-Fréchet
    bound)
  • Calibrate the likelihood
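
A coverage study of this kind can be sketched with a toy counting experiment. The code below is not the RSC implementation: it assumes a single Poisson count with known background, takes the 95% upper limit where the negative log-likelihood rises by 2.71/2 above its minimum (with the signal constrained to be non-negative), and counts how often that limit exceeds the true signal. All function names are hypothetical.

```python
import math
import random

def nll(n, s, b):
    """Poisson negative log-likelihood for count n with mean s + b (constants dropped)."""
    mu = s + b
    return mu - n * math.log(mu)

def upper_limit(n, b, delta=2.71 / 2.0, s_max=200.0):
    """Signal value above the best fit where the NLL has risen by delta."""
    s_hat = max(n - b, 0.0)              # best fit, constrained to s >= 0
    target = nll(n, s_hat, b) + delta
    lo, hi = s_hat, s_max
    for _ in range(60):                  # bisection: the NLL rises monotonically above s_hat
        mid = 0.5 * (lo + hi)
        if nll(n, mid, b) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def poisson(rng, mean):
    """Knuth's Poisson sampler, adequate for means of a few tens."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def coverage(s_true, b, n_toys=500, seed=1):
    """Fraction of pseudo-experiments whose upper limit exceeds s_true."""
    rng = random.Random(seed)
    covered = sum(upper_limit(poisson(rng, s_true + b), b) > s_true
                  for _ in range(n_toys))
    return covered / n_toys

print(coverage(s_true=1.6, b=45.2))
```

With a signal this small compared to the background, the constrained fit makes the limit exceed the true value in essentially every toy, which is the overcoverage mentioned on the slide.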

29
Separation of Hypotheses
  • Analysis of search results can be formulated as
    separation of hypotheses
  • Identify the observable which comprises the result
  • Specify a test statistic
  • Define rules for discovery and exclusion
  • Use the likelihood ratio Q = Lsb/Lb, assuming the
    signal+background (sb) and the background-only
    (b) hypotheses, as test statistic
  • Consider p-values (also called CLSB, 1-CLB)
    of the -2lnQ distributions obtained from sb and b
    samples
  • See
  • progs/m2lnq_creator.cpp
  • qqhtt_-2lnQ_distributions.cxx in macros/examples/

Bayesian pseudo-integration of systematics: for
every toy MC experiment, before the generation
of the toy dataset, the parameters affected by
systematics are properly fluctuated once.
Plot annotations: 1-CLb, CLsb
Distributions built with toy MC
experiments (LimitCalculator, aka HybridCalculator,
class)
30
Modified frequentist method: significance
  • CLB: background CL, a measure of the
    compatibility of the experiment with the B-only
    hypothesis
  • 1 - CLB: probability for a B-only experiment to
    give a more S+B-like likelihood ratio than the
    observed one
  • Correspondence between CLB and the resulting
    significance (Gaussian approximation):
  • number of standard deviations of an (assumed)
    Gaussian distribution of the background
  • Take CLB assuming the expected sb yield (i.e.
    median -2lnQ of the sb distribution)
  • CLSB: measure of the compatibility of the
    experiment with the S+B hypothesis
  • If CLSB is small (< 5%) the S+B hypothesis can
    be excluded at more than 95% CL, but this does not
    mean that the signal hypothesis is excluded at
    that level

Modified frequentist approach: take CLS, the
signal significance, to be CLS = CLSB / CLB
(heavily used by LEP, HERA and TEVATRON
experiments)
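
The toy Monte Carlo construction of CLSB, CLB and CLS can be sketched for a single counting experiment as follows. This is a minimal illustration under stated assumptions, not the LimitCalculator code: it has no systematics, uses -2lnQ = 2(s - n ln(1 + s/b)) as test statistic, and all function names are hypothetical.

```python
import math
import random

def m2lnq(n, s, b):
    """-2 ln Q with Q = L_sb / L_b for a Poisson count n."""
    return 2.0 * (s - n * math.log(1.0 + s / b))

def poisson(rng, mean):
    """Knuth's Poisson sampler, adequate for means of a few tens."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def cls_from_toys(n_obs, s, b, n_toys=5000, seed=42):
    """Return (CLsb, CLb, CLs) estimated from toy experiments."""
    rng = random.Random(seed)
    q_obs = m2lnq(n_obs, s, b)
    q_sb = [m2lnq(poisson(rng, s + b), s, b) for _ in range(n_toys)]
    q_b = [m2lnq(poisson(rng, b), s, b) for _ in range(n_toys)]
    # Fraction of toys at least as background-like as the observation
    cl_sb = sum(q >= q_obs for q in q_sb) / n_toys
    cl_b = sum(q >= q_obs for q in q_b) / n_toys
    cl_s = cl_sb / cl_b if cl_b > 0.0 else float("nan")
    return cl_sb, cl_b, cl_s

cl_sb, cl_b, cl_s = cls_from_toys(n_obs=45, s=1.6, b=45.2)
print(f"CLsb = {cl_sb:.3f}, CLb = {cl_b:.3f}, CLs = {cl_s:.3f}")
```

Dividing by CLB protects against excluding a signal to which the experiment has little sensitivity: when the sb and b distributions overlap strongly, CLSB and CLB are similar and CLS stays close to 1.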
31
The benchmark analysis: H → ττ
  • Used as benchmark for the tool
  • Results approved by the CMS collaboration
  • Vector boson fusion H → ττ at 1 fb^-1
  • Small signal on a significant background
  • No discovery expected with this luminosity
  • Four mass hypotheses:
  • 115, 125, 135, 145 GeV

Mass (GeV)   N Sig (12% sys)   N Bkg (30% sys)
115          1.6               45.2
125          1.4               45.2
135          1.1               45.2
145          0.6               45.2
32
H → ττ: significance
  • Significance calculated for the H → ττ analysis
    using CLb
  • In this case the significance does not tell us much.
  • The question becomes:
  • Which production cross section can I exclude
    with the data I have?

CMS Week
33
Modified frequentist method: exclusion
  • Assume to observe the expected background (i.e. the
    median of the background distribution) and no
    signal
  • Amplify the SM production cross section by the
    factor necessary to obtain CLs = 0.05
  • → 95% exclusion

80 h on one CPU
ExclusionBandPlot class
  • Bands
  • Assume to observe Nb + n sqrt(Nb), where
    n = 2, 1, -1, -2 for the -2, -1, +1, +2 sigma band
    borders respectively
  • Systematics taken into account in the distributions
    of -2lnQ (marginalisation)
34
How do I find the right ratio?
  • RSC provides help
  • RatioFinder
  • RatioFinderResults
  • RatioFinderPlot
  • Just compile and launch the job(s)!

CLs = 0.05
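
The search for the cross-section ratio that gives CLs = 0.05 can be illustrated with a bisection on a single counting experiment. This sketch is not the RatioFinder class: it replaces the toy MC distributions by exact Poisson sums, ignores systematics, and uses hypothetical function names. The observed count of 45 is an assumption standing in for the median of the background-only distribution.

```python
import math

def poisson_cdf(n, mean):
    """P(N <= n) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n + 1):
        term *= mean / k
        total += term
    return total

def cls(n_obs, s, b):
    """CLs = CLsb / CLb for a counting experiment, where fewer
    events means a more background-like outcome."""
    return poisson_cdf(n_obs, s + b) / poisson_cdf(n_obs, b)

def find_ratio(n_obs, s_nominal, b, target=0.05, r_max=100.0, tol=1e-6):
    """Scale factor r on the nominal signal such that CLs(r * s) = target."""
    lo, hi = 0.0, r_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # CLs decreases as the signal grows, so move towards larger r
        # while the hypothesis is not yet excluded
        if cls(n_obs, mid * s_nominal, b) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

ratio = find_ratio(n_obs=45, s_nominal=1.6, b=45.2)
print(f"ratio to SM excluded at 95% CL: {ratio:.2f}")
```

Repeating the search with the observed count shifted by one or two sqrt(Nb) gives the borders of the exclusion bands.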
35
Another representation of the information
  • Use the distributions of the test statistic.
  • See at a glance how the hypotheses are separated.
  • For each mH: projection of the -2lnQ distribution in
    the B-only hypothesis.

36
Statistical methods: class structures
  • Organisation of the classes of statistical
    methods

Constraints (mother: NLLPenalty)
  • Constraint
  • ConstrBlock2
  • ConstrBlock3
  • ConstrBlockArray
Statistical methods (mother: StatisticalMethod)
  • LimitCalculator (aka HybridCalculator)
  • PLScan
  • FCCalculator
Statistical methods results (mother: StatisticalResult)
  • LimitResults (aka HybridResults)
  • PLScanResults
  • FCResults
  • Sum the results: batch/GRID job submission made
    easier
Statistical plots (mother: StatisticalPlot)
  • LimitPlot (aka HybridPlot)
  • PLScanPlot (add also FC curves)
  • LEPBandPlot
  • ExclusionBandPlot
37
The Three Parts
  • Analyses modeling and combination
  • Statistical Methods and limits
  • Graphics routines

38
Plots collection
39
Troubleshooting
  • Q: I want to start now. Where do I find the
    examples?
  • A: In the macros directory you find the macros for
    the interpreter, while in the progs directory you find
    the programs to compile with the make exe command.
  • Q: I think I do not know how to write a datacard.
    What can I do?
  • A: In the macros directory you find some
    datacards for inspiration. Moreover, check
    the scripts in the scripts directory. You have
    create_card_skeleton.py to query for
    templated card components and TDR_HZZ_card_maker.py
    to create the CMS PTDR H→ZZ→4l cards.
  • Q: I compiled RSC but ROOT does not see the
    dynamic library libRooStatsCms.so. What do I do?
  • A: Add the /RooStatsCms/lib directory to your
    LD_LIBRARY_PATH environment variable. In the scripts
    directory you have the RSCenv.sh script to set up
    your environment. Then in the interpreter use the
    command gSystem->Load("libRooStatsCms.so").
  • Q: Still... I cannot get it to work!
  • A: Come down to the eighth floor for support!

40
Conclusions
  • Intuitive model factory
  • Build the analysis model from an ASCII
    configuration file, the datacard
  • The datacard also describes nuisance parameters (and
    correlations)
  • Building of a combined model for a combined
    analysis
  • Implementation of nuisance parameters and
    correlations
  • Can be marginalised or profiled
  • Statistical methods:
  • LimitCalculator (CLb, CLsb, CLs): complete
  • PLScan (profile likelihood): complete
  • FCCalculator (fully frequentist
    approach): validation to complete
  • Bayesian approach and Markov chains: being
    investigated
  • Strong implementation, tested and used by CMS
    analyses
  • Batch friendly: decomposition in sub-jobs,
    results stored in ROOT files
  • Results can be merged and exploited by the results
    classes
  • Plots easily obtainable in a presentation-ready
    form
