1
TerraFerMA: A Suite of Multivariate Analysis Tools
  • Sherry Towers
  • SUNY-SB
  • smjt@fnal.gov

TerraFerMA is now ROOT-dependent only
(i.e., it is CLHEP-free): www-d0.fnal.gov/smjt/multiv.html
2
  • TerraFerMA: Fermilab Multivariate Analysis (aka FerMA)
  • TerraFerMA is, foremost, a convenient interface to various disparate multivariate analysis packages (e.g., MLPfit, Jetnet, PDE/GEM, Fisher discriminant, binned likelihood, etc.)
  • The user first fills signal and background (and data) Samples, which are then used as input to TerraFerMA methods. A Sample consists of variables filled for many different events.

3
  • Using a multivariate package chosen by the user (i.e., NNs, PDEs, Fisher discriminants, binned likelihood, etc.), TerraFerMA methods yield the relative probability that a data event is signal or background (see the formula below).
  • TerraFerMA also includes useful statistical tools (means, RMSs, and correlations between the variables in a Sample), and a method to detect outliers.
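As a concrete definition (the standard construction, not spelled out on the slide), the relative probability returned for an event with variables $\vec{x}$ is

$$ D(\vec{x}) = \frac{p_S(\vec{x})}{p_S(\vec{x}) + p_B(\vec{x})} $$

where $p_S$ and $p_B$ are the signal and background densities estimated by the chosen package; $D$ near 1 flags signal-like events, $D$ near 0 background-like ones.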

4
  • TerraFerMA makes it trivial to compare the performance of different multivariate techniques (i.e., it is simple to switch between, say, a NN and a PDE, because in TerraFerMA both use the same interface)
  • TerraFerMA makes it easy to reduce the number of discriminators used in an analysis (optional TerraFerMA methods sort variables to determine which have the best signal/background discrimination power)
  • The TerraFerMA web page includes full documentation/descriptions

5
  • The TerraFerMA TFermaFactory class takes as input a signal and a background TFermaSample, then calculates discriminators based on these samples using the multivariate method of the user's choice.
  • TFermaFactory includes a method called GetTestEfficiencies() that returns signal efficiency vs background efficiency for various operating points, along with the cut on the discriminator for each operating point (illustrated by the toy sketch below).
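A minimal self-contained toy (plain C++, not the TerraFerMA API; the Gaussian discriminator shapes, sample sizes, and cut grid are invented for illustration) showing the kind of signal-efficiency vs background-efficiency table GetTestEfficiencies() is described as returning:

    // Toy sketch, NOT the TerraFerMA API: scan cuts on a 1-D discriminator
    // and print signal vs background efficiency at each operating point.
    #include <algorithm>
    #include <cstdio>
    #include <random>
    #include <vector>

    int main() {
      std::mt19937 rng(42);
      std::normal_distribution<double> sig(1.0, 1.0), bkg(0.0, 1.0);
      std::vector<double> s(10000), b(10000);
      for (double& x : s) x = sig(rng);  // toy signal discriminator values
      for (double& x : b) x = bkg(rng);  // toy background discriminator values
      std::sort(s.begin(), s.end());
      std::sort(b.begin(), b.end());
      for (double cut = -1.0; cut <= 3.0; cut += 0.5) {
        // efficiency = fraction of events passing (above) the cut
        double effS = double(s.end() - std::lower_bound(s.begin(), s.end(), cut)) / s.size();
        double effB = double(b.end() - std::lower_bound(b.begin(), b.end(), cut)) / b.size();
        std::printf("cut %+.1f  signal eff %.3f  bkgnd eff %.3f\n", cut, effS, effB);
      }
      return 0;
    }

Each cut value is one operating point; scanning the cut traces out the signal-vs-background efficiency curve.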

6
Where to find TerraFerMA
  • TerraFerMA documentation
  • www-d0.fnal.gov/smjt/ferma.ps
  • TerraFerMA user's guide
  • www-d0.fnal.gov/smjt/guide.ps
  • TerraFerMA package
  • /ferma.tar.gz
  • (includes example programs)

7
Future Plans (maybe)
  • Add the capability to handle Ensembles (i.e., if an analysis has more than one source of background, it would be useful to treat the various background Samples together as an Ensemble).
  • Users really want the convenience of this.
  • But it is by no means trivial.

8

9
Future Plans (maybe)
  • Add a Support Vector Machine interface.
  • Need to find a half-decent SVM package (preferably in C). It also needs to be general enough that it can be used out of the box without fine-tuning.

10
Most powerful...
  • Analytic/binned likelihood
  • Neural Networks
  • Support Vector Machines
  • Kernel Estimation

11
Future Plans (maybe)
  • Add a genetic algorithm for sorting discriminators.

12
Future Plans (maybe)
  • Add distribution comparison tests such as Kolmogorov-Smirnov, Anderson-Darling, etc.

13
Making the most of your data: tools, techniques,
and strategies (and potential pitfalls!)
  • Sherry Towers
  • State University of New York
  • at Stony Brook

14
  • Data analysis for the modern age

The cost and complexity of HEP data behoove us to
milk the data for all it's worth!
15
  • But how can we get the most out of our data?
  • Use of more sophisticated data analysis
    techniques may help (multivariate methods)
  • Strive to achieve an excellent understanding of the data, and of our modelling of it
  • Make innovative use of already familiar tools and methods
  • Reduction of the number of variables can help too!


16

Tools and Techniques
17
Simple techniques
  • Ignore all correlations between discriminators.
  • Examples: simple techniques based on square cuts, or likelihood techniques that obtain a multi-D likelihood from the product of 1-D likelihoods (written out below).
  • Advantages: fast, easy to understand. Easy to tell if modelling of data is sound.
  • Disadvantage: useful discriminating info may be lost if correlations are ignored.
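Written out (a standard formula consistent with the slide's description): such techniques approximate the multi-D density by

$$ p(\vec{x}) \approx \prod_{k=1}^{n} p_k(x_k) $$

which is exact only if the variables are mutually independent; any discriminating power carried by their correlations is lost.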

18
More powerful...
  • More complicated techniques take into account simple (linear) correlations between discriminators:
  • Fisher discriminant
  • H-matrix
  • Principal component analysis
  • Independent component analysis
  • and many, many more!
  • Advantages: fast, more powerful.
  • Disadvantages: can be a bit harder to understand, and systematics can be harder to assess. Harder to tell if modelling of data is sound.

19
  • Fisher discriminant (2D examples)

Finds axes in parameter space such that the projections of signal and background onto the axes have maximally separated means.
Can often work well.
But sometimes fails.
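For reference (the standard construction, not shown on the slide): with class means $\vec{\mu}_S$, $\vec{\mu}_B$ and covariance matrices $\Sigma_S$, $\Sigma_B$, the Fisher axis is

$$ \vec{w} \propto (\Sigma_S + \Sigma_B)^{-1}\,(\vec{\mu}_S - \vec{\mu}_B) $$

and events are projected onto $t = \vec{w}\cdot\vec{x}$. Because only means and covariances enter, the method fails when signal and background have nearly equal means but different shapes.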
20
Analytic/binned likelihood
  • Advantages:
  • Easy to understand
  • Can take into account all correlations
  • Disadvantages:
  • Don't always have an analytic PDF!
  • Determining the binning for a large number of dimensions in a binned likelihood is a pain (but possible)

21
Neural Networks
  • The most commonly used in HEP:
  • Jetnet (since the early 1990s)
  • MLPfit (since the late 1990s)
  • Stuttgart (very recent)
  • Advantages
  • Fast, (relatively) easy to use
  • Can take into account complex non-linear
    correlations
  • (get to disadvantages in a minute)

22

How does a NN work?

A NN is formed from inter-connected neurons.
  • Each neuron is effectively capable of making a cut in parameter space (see the formula below)
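Concretely (the standard sigmoid neuron, assumed here since the slide does not write it out): each neuron computes

$$ y = g\Big(\sum_i w_i x_i + \theta\Big), \qquad g(a) = \frac{1}{1 + e^{-a}} $$

so the surface $\sum_i w_i x_i + \theta = 0$ acts as a smoothed cut in parameter space; combining layers of such cuts lets the network carve out non-linear decision regions.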

23

24

25

26
Possible NN pitfalls
  • An easy-to-use black box!
  • The architecture of a NN is arbitrary (if you make a mistake, you may end up being susceptible to statistical fluctuations in the training data)
  • Difficult to determine if modelling of data is sound
  • Very easy to use many, many dimensions
  • (www-d0.fnal.gov/smjt/durham/reduc.ps)

27
The curse of too many variables: a simple example
  • Signal: 5D Gaussian, μ = (1,0,0,0,0), σ = (1,1,1,1,1)
  • Bkgnd: 5D Gaussian, μ = (0,0,0,0,0), σ = (1,1,1,1,1)
  • The only difference between signal and background is in the first dimension; the other four dimensions are useless discriminators.
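Why the extra dimensions can only hurt: for these two Gaussians the exact likelihood ratio is

$$ \frac{p_S(\vec{x})}{p_B(\vec{x})} = \exp\!\big(x_1 - \tfrac{1}{2}\big) $$

which depends on $x_1$ alone, so the other four variables carry no information at all; any method trained on a finite sample can only pick up statistical noise from them.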

28
The curse of too many variables: a simple example

29
The curse of too many variables: a simple example

Statistical fluctuations in the useless dimensions tend to wash out the discrimination in the useful dimension.
30
Advantages of variable reduction: a real-world example
  • A Tevatron Run I analysis used a 7-variable NN to discriminate between signal and background.
  • Were all 7 needed?
  • Ran the signal and background n-tuples through the TerraFerMA interface to the sorting method.

31
A real-world example

32
Support Vector Machines
  • The new kid on the block.
  • Similar to NNs in many respects, but first map the parameter space onto a higher-dimensional space, then set up the neuron architecture to optimally carve up the new parameter space (see the decision function below).
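For reference (the standard SVM decision function, not written on the slide): after the implicit map $\phi$ to the higher-dimensional space, classification uses

$$ f(\vec{x}) = \mathrm{sign}\!\Big(\sum_i \alpha_i y_i K(\vec{x}_i, \vec{x}) + b\Big), \qquad K(\vec{x}_i, \vec{x}) = \phi(\vec{x}_i)\cdot\phi(\vec{x}) $$

where only the support vectors have $\alpha_i \neq 0$, and the kernel $K$ avoids ever computing $\phi$ explicitly.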

33
Kernel Estimators
  • So far, under-appreciated and under-used in HEP! (but widely used elsewhere)
  • Gaussian Expansion Method (GEM)
  • Probability Density Estimation (PDE)
  • (www-d0.fnal.gov/smjt/durham/pde.ps)
  • Advantages:
  • Can take into account complex non-linear correlations
  • Relatively easy to understand (not a black box)
  • Completely different from Neural Networks (makes them an excellent alternative)
  • (get to disadvantages in a minute)

34
  • To estimate a PDF, PDEs use the concept that any n-dimensional continuous function can be modelled by a sum of some n-D kernel function.
  • Gaussian kernels are a good choice for particle physics.
  • So, a PDF can be estimated by a sum of multi-dimensional Gaussians centered about MC-generated points (see below).
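In formula form (standard kernel density estimation with Gaussian kernels, matching the slide's description): with $N$ MC points $\vec{x}_i$ and per-variable bandwidths $h_k$,

$$ \hat{p}(\vec{x}) = \frac{1}{N} \sum_{i=1}^{N} \prod_{k=1}^{n} \frac{1}{\sqrt{2\pi}\,h_k} \exp\!\left(-\frac{(x_k - x_{ik})^2}{2h_k^2}\right) $$

i.e., one multi-dimensional Gaussian centered on each generated point. Note the sum over all $N$ points at every evaluation, which is what makes these methods slow (next slide).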

35
  • Primary disadvantage:
  • Unlike NNs, PDE algorithms do not save weights; PDE methods are thus inherently slower to use than NNs.

36
TerraFerMA: a universal interface to many multivariate methods
  • TerraFerMA, released in 2002, interfaces to MLPfit, Jetnet, kernel methods, the Fisher discriminant, etc., and also includes a variable-sorting method. The user can quickly and easily sort potential discriminators.
  • http://www-d0.fnal.gov/smjt/multiv.html

37

Case Studies
38
Case 1: MAGIC (Major Atmospheric Gamma Imaging Cherenkov) telescope data

Due to begin data taking, Canary Islands, Aug 2003.
39

[Figure: Cherenkov light from a gamma-ray shower (or hadronic-shower background) arriving at the telescope]

10 discriminators in total, based on the shape, size, brightness, and orientation of the ellipse.
40

41

"Methods for multidimensional event classification: a case study"
R. Bock, A. Chilingarian, M. Gaug, F. Hakl, T. Hengstebeck, M. Jirina, J. Klaschka, E. Kotrc, P. Savicky, S. Towers, A. Vaiciulis, W. Wittek (submitted to NIM, Feb 2003)

Examined which multivariate methods appeared to afford the best discrimination between signal and background: Neural Networks, kernel PDE, Support Vector Machine, Fisher discriminant, simultaneous 1-D binned likelihood fit (and others!)

42

Results

Conclusion: (carefully chosen) sophisticated multivariate methods will likely help.

43
Case 2: Standard Model Higgs searches at the Tevatron

Direct searches (combined LEP data): MH > 114.4 GeV (95% CL)
Fits to precision EW data: MH < 193 GeV (95% CL)
44
Discovery Thresholds

(Based on MC studies performed in 1999 at the SUSY/Higgs Workshop (SHW).) In Feb 2003 a Tevatron Higgs sensitivity working group was formed to revisit the SHW analyses: can the Tevatron Higgs sensitivity be significantly improved?
45
Sherry's physics interests at Dzero

Co-convener, Dzero ZH working group. Convener, Dzero Higgs b-id working group (NB: H(bb) is dominant for MH < 130 GeV). Working on search strategies in the ZH(μμbb) mode. Can the 1999 SHW b-tagging performance be easily improved upon? Can the 1999 SHW analysis methods be easily improved upon?

46
Heavy-quark jet tagging

b-hadrons are long-lived. Several b-tagging strategies make use of the presence of displaced tracks or secondary vertices in the b-jet.

47

Requiring two b-tags in any H(bb) analysis implies that swimming along the mis-tag/tag efficiency curve, to a point where more light-jet background is allowed in, may dramatically improve the significance of the analysis.

[Figure: significance of the ZH(llbb) search (per fb-1) vs the light-quark mis-tag rate]

The improvement is equivalent to almost doubling the data set!
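A rough counting argument (an assumption of this write-up, not from the slide; it takes the background to be dominated by doubly mis-tagged light-jet events): with two b-tags required, $S \propto \varepsilon_b^2$ and $B \propto \varepsilon_q^2$, so

$$ \frac{S}{\sqrt{B}} \propto \frac{\varepsilon_b^2}{\varepsilon_q} $$

and loosening the tag pays off wherever $\varepsilon_b^2$ rises faster than $\varepsilon_q$ along the curve.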

48

How about multivariate methods? The SHW analyses looked at the use of NNs (based on kinematical and topological info in the event) as a Higgs search strategy.

The NN yields an improvement equivalent to a factor of two more data relative to the traditional analysis.
But a shape analysis of the NN discriminants, instead of making just square cuts on the discriminants, yields an additional factor of 30!
49

Summary
50
  • Use of multivariate techniques is now widespread in HEP
  • We have a lot of different tools to choose from (from simple to complex)
  • Complex tools are occasionally necessary, but they have their pitfalls:
  • Assessment of data modelling is harder
  • Assessment of systematics is harder
  • Architecture is sometimes arbitrary
  • Time needed to get to publication
  • It is always better to start with a simple tool, and work your way up to more complex tools after showing they are actually needed!

51
  • Careful examination of the discriminators used in a multivariate analysis is always a good idea
  • Reducing the number of variables can simplify an analysis considerably, and can even increase discrimination power
  • And exploring simple changes, or using familiar techniques in more clever ways, can sometimes dramatically improve analyses!