1
Statistics for Experimental HEP
Kajari Mazumdar, TIFR, Mumbai
2
Why do we do experiments?
→ To study some phenomenon X, which could be:
  • Whether a particular species of particle exists or not.
  • If it does exist, what are its properties: mass, lifetime, charge, magnetic moment, ...
  • If of finite lifetime, which particles does it decay to? What are the branching fractions?
  • Do the distributions of variables/parameters (energy, directions of daughter particles) agree with theoretical predictions?
  • To probe which processes can occur, and with what probability (in a given experimental situation, depending on the initial particles and collision energy, of course).

3
Template of an experiment
  • Arrange for instances of X.
  • Record events that might be X.
  • Reconstruct the measurable quantities of visible particles.
  • Select events that could be X by applying cuts.
  • Histogram distributions of interesting variables → compare with theoretical prediction(s; there may be several in the market).
  • Confrontation of theory with experiment. Ask:
  • Is there any evidence for X, or is the null hypothesis refuted?
  • Given X, what are the parameters involved in the model?
  • Are the results from the experiment compatible with the predictions of X?

4
Data analysis in particle physics
  • Observe events of a certain type, which have various uncertainties, quantified in terms of probability:
  • the theory is not deterministic,
  • measurements have various random errors,
  • there are other limitations, like cost, time, ...
  • Measure characteristics of each event (particle momenta, number of muons, energy of jets, ...).
  • Theories (e.g. the SM) predict distributions of these properties up to free parameters, e.g., α, G_F, M_Z, α_s, m_H, ...
  • Some tasks of data analysis:
  • Estimate (measure) the parameters.
  • Quantify the uncertainty of the parameter estimates.
  • Test the extent to which the predictions of a theory are in agreement with the data (→ presence of New Physics?).

5
Everything is a counting experiment in HEP, within certain approximations
  • To measure a branching ratio or cross-section, we count the number of events produced/observed (accounting for the inefficiency of observation).
  • To measure the mass of a particle, we use a histogram, where the entry in each bin is a random (Poisson) process.
  • This unpredictability is inherent in nature, driven by quantum mechanics → everything becomes probabilistic.
  • When we want to infer something about the probabilistic processes that produced the data, we need Statistics.

6
What happens if there's nothing?
  • In the final analysis, we may make approximations, take a pragmatic approach, or do things according to convention.
  • We need a good understanding of the foundational aspects of statistics.
  • Even if your analysis finds no events, this is still useful information about the way the universe is built.
  • We want to say more than "We looked for X, we didn't see it."
  • We need statistics (which can't prove anything).
  • We show that "X probably has a mass greater than ..." OR "a coupling smaller than ..."

7
Statistical tools mostly used in particle physics
1. Monte Carlo simulation.
2. Likelihood methods to estimate parameters.
3. Fitting of data.
4. Goodness of fit.
5. Toy Monte Carlo to achieve a given level of confidence given the data (Neyman construction).
8
Monte Carlo simulation
  • Theoretically, the distributions, perhaps with a few unknown parameters, are beautiful and simple (this is not only a dogma, but a reality):
  • angular distributions may be flat, or a function of a few trigonometric variables, like sin θ, cos θ;
  • masses may be described by a Cauchy (Breit-Wigner) function;
  • time distributions may be exponential, or exponential modulated by a sinusoidal oscillation (neutrino oscillation).
  • But they are modified by various complicated effects: higher-order perturbation effects may be one of the theoretical reasons, along with the detector effects (reconstruction and measurement processes).
  • A Monte Carlo simulation starts with a simple distribution and puts it through repeated randomization to take into account the various unavoidable complications, finally producing histograms (see the sketch below).
  • This is both computer and man-power intensive.
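As an illustration, here is a minimal toy Monte Carlo sketch in Python (not from the slides; the lifetime, resolution and binning are invented for illustration): an exponential decay-time distribution is smeared by a Gaussian detector resolution before being histogrammed.

import numpy as np

rng = np.random.default_rng(seed=1)

tau = 1.5          # assumed true lifetime (arbitrary units)
sigma_det = 0.3    # assumed Gaussian detector resolution
n_events = 100_000

t_true = rng.exponential(tau, n_events)                  # simple underlying distribution
t_meas = t_true + rng.normal(0.0, sigma_det, n_events)   # detector smearing

# Histogram the smeared times, as one would for comparison with data.
counts, edges = np.histogram(t_meas, bins=50, range=(-1.0, 10.0))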

9
Mathematical definition
Concept of Probability
A random variable is a numerical characteristic assigned to an element of the sample space; it can be discrete or continuous.
Probability of a discrete variable x: P(x) = lim_{N→∞} N(x)/N,
e.g., coins, dice, cards, ...
For continuous x → Probability Density Function (pdf): P(x to x+dx) = P(x) dx,
e.g., parton (quark, gluon) density functions of the proton.
  • P(A) is a number obeying the Kolmogorov axioms.

Problem with the mathematical definition: no information is conveyed by P(A)!
All A's are considered equally likely. P(A) depends on A and the ensemble.
In particle physics, A1, A2, ... are outcomes of a repeatable experiment, say, decays.
In the frequentist interpretation, P(Higgs boson exists) is either 0 OR 1, but we don't know which one is correct!
10
Treatment of probability in a subjective way
  • In particle physics the frequency interpretation is often the most useful, but subjective probability can provide a more natural treatment of non-repeatable phenomena, e.g., systematic uncertainties, the probability that the Higgs boson exists, ...
  • → P(A) is the degree of belief that A is true.
  • Conditional probability → probability of A, given B: P(A|B) = P(A∩B)/P(B),
  • and similarly, probability of B given A: P(B|A) = P(A∩B)/P(A).
  • But P(A∩B) = P(B∩A), so P(A|B) = P(B|A) P(A) / P(B) (Bayes' theorem).
  • If A, B are independent → P(A∩B) = P(A) P(B).

Bayesian interpretation
In Bayesian probability, one assumes in advance a probability that the Higgs boson exists and then interprets the data, taking into account all possibilities which could produce such data.
11
Frequentist Use of Bayes Theorem
  • Example: Particle Identification. Particle types: e, μ, π, K, p.
  • Detector signals: DCH, RICH, TOF, TRD (different subdetectors through which the particles pass, leaving similar signatures).
  • The probability of a signal in the DCH being due to an e is determined by the probability of an e leaving a detectable signal in the DCH, the probability of an electron being produced in the reaction, and also the total probability of the DCH registering signals due to the different particles (see the sketch below).
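A hedged numerical sketch of this use of Bayes' theorem (the efficiencies and production fractions below are invented for illustration):

# P(e | DCH signal) via Bayes' theorem with assumed numbers.
p_signal_given = {"e": 0.90, "mu": 0.05, "pi": 0.10}  # P(DCH signal | particle type)
p_prior        = {"e": 0.10, "mu": 0.30, "pi": 0.60}  # assumed production fractions

# Total probability of the DCH registering a signal (law of total probability).
p_signal = sum(p_signal_given[h] * p_prior[h] for h in p_prior)

# Posterior probability that a registered signal was caused by an electron.
p_e_given_signal = p_signal_given["e"] * p_prior["e"] / p_signal
print(f"P(e | DCH signal) = {p_e_given_signal:.3f}")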

12
Continuous variable
Probability to find x within x to x+dx: P = f(x) dx, e.g., a parton density function.
f(x) is NOT a probability!
x must be found somewhere: ∫ f(x) dx = 1.
Cumulative distribution, the probability to have an outcome less than or equal to x:
F(x) = ∫_{−∞}^{x} f(x') dx'.
13
Expectation values
Consider a continuous r.v. x with pdf f(x).
Define the expectation (mean) value as E[x] = ∫ x f(x) dx; notation (often) μ; the centre of gravity of the pdf.
For a function y(x) with pdf g(y): E[y] = ∫ y(x) f(x) dx = ∫ y g(y) dy (equivalent).
Variance: V[x] = E[(x − μ)²] = E[x²] − μ².
Standard deviation: σ = √V[x], the width of the pdf, same units as x.
Define the covariance cov[x, y] as cov[x, y] = E[(x − μ_x)(y − μ_y)] = E[xy] − μ_x μ_y.
Correlation coefficient: ρ_xy = cov[x, y] / (σ_x σ_y).
For x, y independent → cov[x, y] = 0. The reverse is not true!
14
Correlation
15
Statistics
  • Population: includes all objects of interest → large.
  • Parameters (mean, standard deviation, etc.) are associated with a population (μ, σ).
  • Sample: only a portion of the population → convenient, but comes with a cost. A statistic is associated with a sample, providing a characteristic or measure obtained from the sample.
  • We compute statistics to estimate parameters.
  • Variables can be discrete, like the number of events, or continuous, like the mass of the Higgs boson.
  • Mean → sum of all values / number of values.
  • Median → mid-point of the data after being ranked in ascending order → there are as many numbers above the median as below.
  • Mode → the most frequent number in the distribution.

16
Basic Data description
  • Weighted mean: x̄ = Σ w_i x_i / Σ w_i, with weights w_i = 1/σ_i²,
  • e.g., measurement of tracks using multiple hits.
  • Sample variance (an unbiased estimator of the population variance σ²):
    s² = Σ (x_i − x̄)² / (N − 1).
  • The N − 1 divisor takes care of the fact that the sample mean is determined from the same set of observations x_i. A short sketch follows.
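A minimal sketch of both estimators (the measurements and errors are invented; inverse-variance weights are the standard choice when combining measurements with known Gaussian errors):

import numpy as np

x = np.array([1.2, 1.5, 1.1, 1.4])       # hypothetical measurements
sigma = np.array([0.1, 0.2, 0.1, 0.3])   # their assumed errors

w = 1.0 / sigma**2
x_wmean = np.sum(w * x) / np.sum(w)      # weighted mean
sigma_wmean = np.sqrt(1.0 / np.sum(w))   # its uncertainty

# Unbiased sample variance: the (N - 1) divisor compensates for using
# the sample mean in place of the (unknown) population mean.
s2 = np.sum((x - x.mean())**2) / (len(x) - 1)   # same as np.var(x, ddof=1)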

17
Distribution/pdf    Example use in HEP
Binomial            Branching ratio
Multinomial         Histogram with fixed N
Poisson             Number of events found
Uniform             Monte Carlo method
Exponential         Decay time
Gaussian            Measurement error
Chi-square          Goodness-of-fit
Cauchy              Mass of resonance
Landau              Ionization energy loss
18
The Binomial
  • A random process with exactly 2 possible outcomes, order not important → Bernoulli process.
  • Individual success probability p, total n trials and r successes:
    P(r; n, p) = [n! / (r! (n − r)!)] p^r (1 − p)^(n − r),
    r = 0, 1, 2, ..., n; q = 1 − p; 0 ≤ p ≤ 1.
    Mean: np; variance: np(1 − p).
E.g., 1. efficiency/acceptance calculations;
2. observe n W decays, out of which r are of type W → μν, with p the branching ratio (a numerical sketch follows below).
The multinomial distribution has m possible outcomes. For the i-th outcome p_i is the success rate and everything else is a failure; n_i is the random variable here, binomially distributed. E.g., in a histogram with m bins and N total entries, the content of each bin is a random binomial variable.
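A hedged sketch of example 2 (n, r and the branching ratio are invented numbers, roughly of the right size for W → μν):

from scipy.stats import binom

n, p = 60, 0.11                  # assumed trials and branching ratio
print(binom.pmf(7, n, p))        # P(exactly r = 7 W -> mu nu decays)
print(n * p, n * p * (1 - p))    # mean np and variance np(1 - p)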
19
Poisson
  • Events in a continuum → number of events in data.
  • Mean rate ν in a time (or space) interval.
  • Probability of finding exactly n events in a given interval:
    P(n; ν) = ν^n e^(−ν) / n!,
    n = 0, 1, 2, ...; mean ν, variance ν.
E.g., 1. cosmic muons reaching the lab; 2. Geiger counter clicks; 3. the number of scattering events n with cross-section σ, for a given luminosity ∫L dt, with ν = σ ∫L dt.
Exponential
f(t; τ) = (1/τ) e^(−t/τ); t continuous, mean τ, variance τ².
E.g., the proper decay time t of an unstable particle with lifetime τ. (A numerical sketch follows.)
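A minimal sketch of both distributions (all numbers are invented; the lifetime is merely muon-like):

from scipy.stats import expon, poisson

nu = 4.2                            # assumed mean event count, nu = sigma * int(L dt)
print(poisson.pmf(0, nu))           # probability of observing zero events
print(poisson.pmf(6, nu))           # probability of exactly six

tau = 2.2e-6                        # assumed lifetime in seconds
print(expon.cdf(1.0e-6, scale=tau)) # P(t <= 1 microsecond) for the decay time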
20
Two Poissons
  • 2 Poisson sources, means λ1 and λ2.
  • Combine the samples:
  • e.g. leptonic and hadronic decays of the W,
  • forward and backward muon pairs,
  • tracks that trigger and tracks that don't.
  • What you get is a convolution:
  • P(r) = Σ_{r'} P(r'; λ1) P(r − r'; λ2).
  • It turns out this is also a Poisson, with mean λ1 + λ2!
  • Avoids a lot of worry! (A quick numerical check follows.)

Signal and background are each independent Poisson variables, and the total number of observed events is also Poisson distributed! In an actual experiment, the total number of observed events = expected signal + estimated background.
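A quick simulation check of this property (the two means are arbitrary illustrative values):

import numpy as np

rng = np.random.default_rng(seed=2)

lam1, lam2 = 3.0, 5.0                        # assumed means of the two sources
r = rng.poisson(lam1, 1_000_000) + rng.poisson(lam2, 1_000_000)

# The combined counts behave like a single Poisson with mean lam1 + lam2:
print(r.mean(), r.var())                     # both approach 8.0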
21
Gaussian
  • f = probability density for a continuous r.v. x, with mean μ and variance σ²:
    f(x; μ, σ²) = (1 / (σ√(2π))) exp(−(x − μ)² / (2σ²)),
    −∞ < x < ∞, −∞ < μ < ∞, σ > 0.
  • Maximum height: 1/(σ√(2π)).
  • The height is reduced by a factor of √e (to ≈ 61% of the maximum) at x = μ ± σ; the half width at half maximum is ≈ 1.18σ.
  • The probability for x to be within μ ± 0.6745σ is 50%.

Special case: the standard/normalised Gaussian:
if y is Gaussian with mean μ and width σ, then x = (y − μ)/σ follows φ(x) = (1/√(2π)) exp(−x²/2).
68.27% within 1σ; 95.45% within 2σ; 99.73% within 3σ.
90% within 1.645σ; 95% within 1.960σ; 99% within 2.576σ; 99.9% within 3.290σ.
These numbers apply to Gaussians and only Gaussians! (See the check below.)
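The quoted coverages follow directly from the Gaussian CDF, as this check shows:

from scipy.stats import norm

for k in (1.0, 1.645, 1.960, 2.576, 3.0):
    coverage = norm.cdf(k) - norm.cdf(-k)       # P(|x - mu| < k * sigma)
    print(f"within {k} sigma: {100 * coverage:.2f}%")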
22
Central Limit Theorem: why is the Gaussian "normal"?
  • If a continuous random variable x is distributed according to any pdf with finite mean and variance, the sample mean of n observations of x will have a pdf which approaches a Gaussian for large n.
  • If x_i is a set of independent variables of mean μ and variance σ², then y = Σ x_i / N, for large N, tends to become Gaussian with mean μ and variance (1/N)σ². (A quick simulation follows the diagram below.)

Connection among the Gaussian, Binomial and Poisson distributions:
Binomial → (p → 0, N → ∞, Np = μ) → Poisson → (μ → ∞) → Gaussian
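A quick CLT demonstration (the uniform pdf and n = 50 are arbitrary choices): means of n uniform samples approach a Gaussian of width σ/√n even though the underlying pdf is flat.

import numpy as np

rng = np.random.default_rng(seed=3)

n = 50
means = rng.uniform(0.0, 1.0, size=(100_000, n)).mean(axis=1)
print(means.mean())                     # ~0.5, the mean of the uniform pdf
print(means.std(), np.sqrt(1/12/n))     # empirical vs predicted sigma/sqrt(n)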
23
  • For a large variety of situations, if the experiment is repeated many times, and if the value of the quantity is measured accurately, without any bias, the results are distributed according to a Gaussian.
  • Typically, we assume that the form of the experimental resolution is Gaussian, which quite often may not be the case!
  • → Artificial enhancement of the significance of observed deviations.
  • It is also important to estimate the magnitude of the error correctly →
  • if errors are under-estimated by 50%, a 4σ effect may actually be a 2σ one!

24
Multidimensional Gaussian
For a set of n Gaussian random variables, not necessarily independent, the joint pdf is a multivariate Gaussian:
f(x; μ, V) = (2π)^(−n/2) |V|^(−1/2) exp[−(1/2) (x − μ)^T V^(−1) (x − μ)].
V is the covariance matrix of the x's: symmetric, n×n, with V_ii = Var(x_i) and V_ij = <(x_i − μ_i)(x_j − μ_j)> = ρ_ij σ_i σ_j.
ρ_ij is the correlation coefficient for x_i and x_j, with ρ_ij² ≤ 1; ρ_ij = 0 for x_i, x_j to be independent of each other.
Correlation coefficient: ρ = cov(x, y) / (σ_x σ_y). (A sampling sketch follows.)
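A sketch of sampling a correlated 2-d Gaussian (the widths and ρ are invented):

import numpy as np

rng = np.random.default_rng(seed=4)

sx, sy, rho = 1.0, 2.0, 0.7
cov = np.array([[sx**2,         rho * sx * sy],
                [rho * sx * sy, sy**2        ]])   # symmetric covariance matrix

xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=100_000)

# Recover the correlation coefficient from the sample:
print(np.corrcoef(xy[:, 0], xy[:, 1])[0, 1])       # ~0.7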
25
(Figure: elliptical contours of fixed probability for a 2-d Gaussian, one panel with no correlation, ρ = 0, and one with correlations.)
26
(No Transcript)
27
  • More on correlation between variables:
  • The covariance matrix plays a very important role in the propagation of errors when changing variables from x to y (to first order only!).
  • Negative covariance → anti-correlation.
  • The semi-axes of the ellipse are given by the square roots of the eigenvalues of the error matrix.
  • The ellipse provides the likely range of the x, y values, and they lie in a region smaller than the rectangle defined by the maxima of the x, y values.
  • For the case of 2 variables, the point X lies outside the 1-s.d. ellipsoid with probability 61%.

28
Chi-squared
z = χ² = Σ_{i=1}^{n} (x_i − μ_i)² / σ_i², a continuous random variable.
Mean = n, variance = 2n.
  • z is a sum of squared discrepancies, each scaled by its expected error;
  • n = 1, 2, ... is the number of degrees of freedom; the x_i are independent Gaussians.
  • Used frequently to test goodness-of-fit.

The confidence level is obtained by integrating the tail of the f distribution (from χ² up to ∞): CL(χ²) = ∫_{χ²}^{∞} f(z; n) dz.
The cumulative distribution of χ² is useful in judging the consistency of data with a model. Since the mean is n, a reasonable experiment should get χ² ≈ n. Thus the reduced χ² (χ²/n) is a useful measure! (A short sketch follows.)
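A minimal sketch of this tail integral (the observed χ² and n are invented):

from scipy.stats import chi2

chi2_obs, ndf = 25.0, 20
print(chi2.sf(chi2_obs, ndf))   # survival function = integral from chi2_obs to infinity
print(chi2_obs / ndf)           # reduced chi2, ~1 for a reasonable experiment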
29
(No Transcript)
30
(No Transcript)
31
About Estimation
Probability calculus: Theory → Data. Given these distribution parameters, what can we say about the data?
Statistical inference: Data → Theory. Given this data, what can we say about the properties or parameters or correctness of the distribution functions?
Having estimated a parameter of the theory, we need to provide the error on the estimate as well.
32
What is an estimator?
  • An estimator is a procedure giving a value for a parameter or property of the distribution as a function of the actual data values.

A perfect estimator is consistent, unbiased and efficient. Often we have to deal with a less-than-perfect estimator!
Efficient → its variance attains the Minimum Variance Bound: V[â] ≥ 1 / E[−∂² ln L/∂a²] (for an unbiased estimator).
33
The Likelihood Function
  • Set of data: x1, x2, x3, ..., xN.
  • Each x may be multidimensional.
  • The probability depends on some parameter a; a may be multidimensional!
  • Total probability (density) → the Likelihood:
  • L(x1, x2, x3, ..., xN; a) = P(x1; a) P(x2; a) P(x3; a) ... P(xN; a).

Given data x1, x2, x3, ..., xN, estimate a by maximising L.
In practice, usually maximise ln L as it is easier to calculate and handle: just add up the ln P(xi). ML has lots of nice properties (e.g., it is consistent and efficient for large N). A minimal sketch follows.
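A minimal ML sketch for an exponential decay (the true decay constant and sample size are invented; for p(t; λ) = λ e^(−λt), maximising ln L = N ln λ − λ Σt gives the closed form λ̂ = 1/mean(t)):

import numpy as np

rng = np.random.default_rng(seed=5)

lam_true = 0.5
t = rng.exponential(1.0 / lam_true, 10_000)   # simulated decay times

lam_hat = 1.0 / t.mean()                      # the ML estimate
print(lam_hat)                                # ~0.5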
34
ML does not give goodness of fit!
  • ML will not complain if your assumed P(x; a) is rubbish.
  • The value of L tells you nothing.
  • Normalisation of L is important.
  • Quote only the upper limit from the analysis.

A fit of P(x) = a1 x + a0 will give a1 = 0, a constant P → L = a0^N, just like you get from fitting.
E.g., the lifetime distribution pdf p(t; λ) = λ e^(−λt), so L(λ; t) = λ e^(−λt) (single observed t); here both t and λ are continuous. The pdf maximises at t = 0, while L maximises at λ = 1/t. The functional forms of P(t) and L(λ) are different.
35
(Figure, repeating the example above: p(t; λ) = λ e^(−λt) plotted vs t at fixed λ, and L(λ; t) = λ e^(−λt) plotted vs λ at fixed t. The pdf peaks at t = 0, while L peaks at λ = 1/t; their functional forms differ.)
36
Least Squares
  • Measurements of y at various x, with errors σ and prediction f(x; a).
  • Probability: P(y) ∝ exp[−(y − f(x; a))² / (2σ²)].
  • ln L = −(1/2) Σ_i [(y_i − f(x_i; a)) / σ_i]² + const = −χ²/2 + const.
  • To maximise ln L, minimise χ².
  • Should get χ² ≈ 1 per data point.

N(degrees of freedom) = N(data points) − N(parameters).
This provides a goodness-of-agreement figure which allows for credibility.
So ML "proves" Least Squares. (A minimal fit sketch follows.)
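A minimal least-squares fit sketch (the straight-line model, truth values and errors are all invented):

import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(seed=6)

x = np.linspace(0.0, 10.0, 20)
sigma = np.full_like(x, 0.5)                  # assumed per-point errors
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)    # data generated from a known truth

def f(x, a0, a1):
    return a0 + a1 * x                        # prediction f(x; a)

popt, pcov = curve_fit(f, x, y, sigma=sigma, absolute_sigma=True)

chi2 = np.sum(((y - f(x, *popt)) / sigma)**2)
ndf = len(x) - len(popt)                      # N data points - N parameters
print(popt, chi2 / ndf)                       # fitted a0, a1; reduced chi2 ~ 1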
37
Chi Squared Results
  • A small χ² comes from:
  • overestimated errors,
  • good luck.
  • A large χ² comes from:
  • bad measurements,
  • bad theory,
  • underestimated errors,
  • bad luck.

Extended Maximum Likelihood: allow the normalisation of P(x; a) to float → predicts the numbers of events as well as their distributions. Need to modify L: an extra term stops the normalisation from shooting up to infinity.
38
Variance of estimator
  • One way to do this would be to simulate the entire experiment many times with a Monte Carlo program (using the ML estimate for the MC).
  • Log-likelihood method: expand ln L around the maximum,
    ln L(a) = ln L(â) + (1/2) (∂² ln L/∂a²)|_{a=â} (a − â)² + ...
  • The 2nd (linear) term is zero at the maximum.
  • To a good approximation → σ_â² = −1 / (∂² ln L/∂a²)|_{a=â}.
  • Since ln L(â ± σ_â) ≈ ln L(â) − 1/2,
  • basically, increase a until ln L decreases by 1/2. (A numerical sketch follows.)
  • For a least-squares estimator, the equivalent rule is Δχ² = 1.
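A numerical sketch of the Δln L = 1/2 rule for the exponential example above (data and seed are invented):

import numpy as np

rng = np.random.default_rng(seed=7)

t = rng.exponential(2.0, 5_000)                  # simulated data, lam_true = 0.5

def lnL(lam):
    return len(t) * np.log(lam) - lam * t.sum()  # exponential log-likelihood

lam_hat = 1.0 / t.mean()
lams = np.linspace(0.8 * lam_hat, 1.2 * lam_hat, 2001)
inside = lnL(lams) >= lnL(lam_hat) - 0.5         # the Delta lnL <= 1/2 band
print(lams[inside][0], lams[inside][-1])         # approximate 1-sigma interval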

39
Hypothesis Testing
  • Consider a set of measurements x = (x1, ..., xn) pertaining to a particular subset of events.
  • x_i may refer to the number of muons in the event, the transverse energy of the leading jet, the missing transverse energy, and so on.
  • f(x) refers to the n-dimensional joint pdf, which depends on the type of event actually produced.

For each reaction we consider, we will have a hypothesis for the pdf of x, e.g., f(x | H0), f(x | H1), and so on, where the Hi refer to the different possibilities. Say H0 corresponds to the Higgs boson, and H1, H2, ... → backgrounds.
Now, each event is a point in x-space, so we apply a set of criteria/cuts, called test statistics, and work out the pdfs such that the sample space is divided into 2 regions, where we accept or reject H0.
40
Level of Significance and Efficiency
Significance level α: the probability to reject H0 if it is true → error of the 1st kind.
β: the probability to accept H0 when H1 is true → error of the 2nd kind. Power of the test = 1 − β.
Signal efficiency: the probability to accept a signal event.
Background efficiency: the probability to accept a background event.
The purity of the selected sample depends on the prior probabilities as well as on the efficiencies. (A short numerical sketch follows.)
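A hedged sketch of these quantities for a simple cut t > t_cut on a test statistic assumed Gaussian under both hypotheses (all numbers invented; here H0 = signal):

from scipy.stats import norm

mu_sig, mu_bkg, width = 2.0, 0.0, 1.0
t_cut = 1.0

eff_sig = norm.sf(t_cut, loc=mu_sig, scale=width)  # P(accept | signal)
eff_bkg = norm.sf(t_cut, loc=mu_bkg, scale=width)  # P(accept | background)

alpha = 1.0 - eff_sig   # error of the 1st kind: reject H0 (signal) when it is true
beta = eff_bkg          # error of the 2nd kind: accept H0 when background is true
print(eff_sig, eff_bkg, alpha, 1.0 - beta)         # power of the test = 1 - beta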