Transcript and Presenter's Notes

Title: The Lepton+jets Channel


1
Measuring the Top Quark Mass with Neural Networks
Carlos Sanchez Ohio State University April 3,
2003 Seminar
  • Introduction
  • The Lepton+jets Channel
  • Mass Measurement
  • Templates
  • Standalone NN
  • Gluon Radiation
  • Run II

2
The Standard Model
  • Standard Model Particles
  • quarks (u, c, t, d, s, b)
  • leptons (e, μ, τ, νe, νμ, ντ)
  • gauge bosons (g, γ, W±, Z0)
  • Standard Model Interactions
  • strong: gluon (g) exchange
  • weak: W± and Z0 boson exchange
  • electromagnetic: photon (γ) exchange
  • Successes
  • predicts a wide range of phenomena
  • valid down to distances of ~10^-18 m
  • Unresolved issues
  • EW symmetry breaking (Higgs?)
  • fermion masses and mixing
  • gravitational interaction

3
Collider Detector at Fermilab
  • Fermi National Accelerator Laboratory (Fermilab)
  • Tevatron: protons (p) on antiprotons (p̄) at 1.8 TeV (Run I 1992-1996)
  • Two collision detectors CDF and DØ
  • Run II started March 2001 (by 2005 accumulate 20
    times more data)
  • The CDF detector
  • silicon layers
  • b-jet identification
  • central drift chamber
  • electromagnetic calorimeters
  • hadronic calorimeters
  • muon chambers

4
Top Production
  • Top quarks are produced
  • In pairs (top-antitop), via the strong interaction
  • Individually (single top), via the electroweak interaction
  • Huge amount of background
  • Top-pairs in Run1
  • 5 trillion collisions
  • 50 million events written to tape (40 terabytes
    of data)
  • 35 top-antitop events (in a sample of 76 events)

quark-antiquark annihilation (dominant at Tevatron)
gluon-gluon fusion (dominant at LHC)
5
Top Decay
  • The top quark decays into a W boson and a b quark with a branching ratio of nearly 100%.
  • The lifetime of the top is very small (~5 x 10^-25 sec).
  • The decays of the top quark are classified
    according to the W boson decays
  • Hadronic: both W bosons decay into a quark-antiquark pair.
  • Dilepton: both W bosons decay into a lepton-neutrino pair.
  • Lepton+jets: one W decays into a quark-antiquark pair while the other one decays into a lepton-neutrino pair.

6
Event Selection
  • All events in the mass analysis must pass the
    following cuts
  • an isolated lepton with PT > 20 GeV/c
  • missing ET > 20 GeV
  • at least three jets with ET > 15.0 GeV and |η| < 2.0
  • an additional jet with ET > 15.0 GeV and |η| < 2.0, or ET > 8.0 GeV and |η| < 2.4
  • events that fall within the Z mass window and dilepton events are removed
  • after the mass reconstruction is performed, events are required to pass a goodness-of-fit cut, χ2 < 10.0
  • We divide the top mass sample into four non-overlapping subsamples (taking advantage of their different S/B ratios)
  • SVX Double: events with two SVX tags
  • SVX Single: events with one and only one SVX tag
  • SLT: events with one or two SLT tags, but no SVX tags
  • No Tags: 4 or more jets with ET > 15.0 GeV and |η| < 2.0

7
Top Mass Sample
  • Run I events in the different mass subsamples
  • Background processes
  • W+multijet, non-W events, mistags, single top events, diboson events, and Drell-Yan.

8
Mass Reconstruction (1)
  • The top mass is calculated by reconstructing the
    4-momenta of the top decay particles.
  • The hypothesis is the Standard Model ttbar production process, p p̄ → t t̄ + X, followed by the decays t → W+ b and t̄ → W- b̄, with one W decaying to a quark-antiquark pair and the other to a lepton-neutrino pair.
The final assignment of the decay partons is
determined by the mass reconstruction algorithm
9
Mass Reconstruction (2)
  • There are many ways to combine the top decay
    products to form the mass of the top quark.
  • The number of combinations is reduced if we use
    b-tagging information.
  • We define a χ2 function based on a series of energy and kinematic constraints to calculate the top mass.
  • All possible combinations are used.
  • We choose the reconstructed mass (Mrec) that corresponds to the lowest χ2 (a toy sketch of this choice follows the list below).
  • We get the correct combination 50% of the time in the Double SVX subsample.
  • Incorrect combinations still have info.
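As a rough illustration of picking the lowest-χ2 jet assignment (the real analysis uses a full kinematic fitter with energy and kinematic constraints, minimized with MINUIT), here is a minimal Python sketch; the single W-mass constraint, the resolution σ_W, and the toy jet four-vectors are assumptions made only for this example.

```python
import itertools
import numpy as np

M_W = 80.4  # GeV/c^2, W mass used in the toy constraint

def inv_mass(p4s):
    """Invariant mass of a set of 4-vectors given as (E, px, py, pz)."""
    E, px, py, pz = np.sum(p4s, axis=0)
    return np.sqrt(max(E**2 - px**2 - py**2 - pz**2, 0.0))

def best_combination(jets, sigma_w=10.0):
    """Toy version of the combination choice: scan assignments of two
    light-quark jets and a b jet on the hadronic side, compute a
    one-constraint chi^2 against the W mass, and keep the assignment
    with the lowest chi^2.  Returns (chi2, reconstructed top mass)."""
    best = (np.inf, None)
    for q1, q2, b in itertools.permutations(range(len(jets)), 3):
        if q1 > q2:                  # the two W-daughter jets are interchangeable
            continue
        m_w = inv_mass(jets[[q1, q2]])
        m_top = inv_mass(jets[[q1, q2, b]])
        chi2 = ((m_w - M_W) / sigma_w) ** 2
        if chi2 < best[0]:
            best = (chi2, m_top)
    return best

# toy usage with four made-up jet 4-vectors (E, px, py, pz) in GeV
jets = np.array([[110.0,  60.0,  70.0,  40.0],
                 [ 80.0, -50.0,  40.0, -30.0],
                 [ 60.0,  20.0, -50.0,  20.0],
                 [ 50.0, -30.0, -30.0, -20.0]])
print(best_combination(jets))
```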

10
Gaussian + Gamma Templates
  • We have generated a set of ttbar Monte Carlo
    samples ranging from 120 to 230 GeV.
  • They tend to peak around their generated mass,
    and they have asymmetric tails.
  • We should be able to fit all the templates to a
    single function that only depends on the top mass
    Mtop.
  • Finite number of MC events.
  • Continuous form allows us to obtain the Mrec
    distribution for any given Mtop.
  • We fit the signal templates with a combination of a Gaussian and a Gamma function (a minimal fit sketch follows this list).
  • The background samples are fitted in a similar way, but the mass dependence is removed.
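A minimal sketch of fitting a single Mrec template with a Gaussian-plus-Gamma shape, using scipy in place of the actual fitting code; the particular parameterization (Gaussian fraction f, Gamma part starting at the 80 GeV/c2 template edge), the starting values, and the toy template are assumptions. In the real analysis the shape parameters are further made to depend on Mtop to obtain the continuous template family, which is not shown here.

```python
import numpy as np
from scipy.stats import gamma, norm
from scipy.optimize import curve_fit

M_MIN = 80.0   # lower edge of the Mrec templates (GeV/c^2), from the text

def signal_shape(m, f, mu, sigma, k, theta):
    """Gaussian-plus-Gamma shape in the reconstructed mass m:
    f is the Gaussian fraction; the Gamma part starts at M_MIN."""
    return (f * norm.pdf(m, loc=mu, scale=sigma)
            + (1.0 - f) * gamma.pdf(m - M_MIN, a=k, scale=theta))

# toy example: fit a normalized Mrec template (bin centers + contents)
centers = np.arange(82.5, 380.0, 5.0)
template = signal_shape(centers, 0.7, 175.0, 22.0, 2.0, 45.0)  # stand-in for MC
popt, pcov = curve_fit(signal_shape, centers, template,
                       p0=[0.6, 170.0, 25.0, 2.5, 40.0],
                       bounds=([0.0, 100.0, 5.0, 0.5, 5.0],
                               [1.0, 250.0, 60.0, 10.0, 200.0]))
print(popt)   # recovered shape parameters for this single template
```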

SVX Single distributions
11
Extracting Mtop
  • Compare the shape of the data Mrec distribution
    to Monte Carlo expectations
  • Continuous likelihood procedure is used to
    extract Mtop.
  • It uses the functional forms for signal and
    background.
  • In the fit, Mtop is the only free parameter and
    the background fractions are constrained to be
    within their expected values.
  • Median Mtop and error from 2000 pseudoexperiments
    is

Mtop = 175.1 ± 7.3 GeV/c2
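A hedged sketch of the continuous likelihood described above: an unbinned -log L built from a signal density p_sig(Mrec, Mtop) and a background density p_bkg(Mrec) (placeholders for the template parameterizations), with a Gaussian penalty constraining the background fraction near its expectation. The real fit minimizes over Mtop and the constrained background fraction together; only a simple Mtop scan is shown here.

```python
import numpy as np

def neg_log_likelihood(mtop, mrec, p_sig, p_bkg, b_frac, b_exp, b_sigma):
    """Unbinned -log L for a sample of reconstructed masses `mrec`.
    p_sig(m, mtop) and p_bkg(m) are normalized template densities;
    the background fraction gets a Gaussian constraint term."""
    dens = (1.0 - b_frac) * p_sig(mrec, mtop) + b_frac * p_bkg(mrec)
    penalty = 0.5 * ((b_frac - b_exp) / b_sigma) ** 2
    return -np.sum(np.log(dens)) + penalty

def scan_mtop(mrec, p_sig, p_bkg, b_frac, b_exp, b_sigma, grid):
    """Scan Mtop over a grid and return the value minimizing -log L."""
    nll = [neg_log_likelihood(m, mrec, p_sig, p_bkg, b_frac, b_exp, b_sigma)
           for m in grid]
    return grid[int(np.argmin(nll))]
```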
12
Improvements
  • The Gaussian + Gamma fit is motivated by the shape of the distributions.
  • Any function that can fit the templates can be
    used.
  • Neural Networks are able to approximate any
    function.
  • We do not need to make an a priori decision about the underlying function describing the distribution.
  • Not limited to 1D distributions.
  • Including more information to measure Mtop.
  • There are other kinematic variables, which have
    mass information.
  • A Neural Network provides a simple and elegant way of combining many variables into a single analysis.
  • Takes into account correlations between the
    different variables.
  • Classify events into top signal or background.

13
Introduction to ANN
  • ANN is a function of N variables
  • Useful graphical representation
  • All nodes above the input layer perform a simple
    calculation

  • Architecture: grid of nodes arranged in input, hidden, and output layers
  • Weights: connections between nodes
  • Activation function g(x): non-linear, e.g. g(x) = tanh(x)

[Diagram: feed-forward network with three inputs (Input1, Input2, Input3), one hidden layer, and a single output node; weights (w1, w3, ...) label the connections between nodes]
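A minimal sketch of the calculation each node performs, for a small feed-forward network like the one in the diagram; the 3-4-1 architecture, the random weights, and the linear output node are illustrative assumptions.

```python
import numpy as np

def nn_output(x, weights, biases):
    """Forward pass of a fully connected network: each node above the
    input layer computes g(w . inputs + bias) with g(x) = tanh(x);
    the final node here is left linear."""
    a = np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(W @ a + b)
    return weights[-1] @ a + biases[-1]

# a 3-4-1 network like the diagram: 3 inputs, 4 hidden nodes, 1 output
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
biases = [np.zeros(4), np.zeros(1)]
print(nn_output([0.1, -0.3, 0.7], weights, biases))
```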
14
ANN Learning Rule
  • A Neural Network has to be configured such that
    the application of a set of input values produces
    the desired output values.
  • Supervised learning uses learning samples to
    train the network to perform a given task.
  • Neural Network learning rule
  • Start with random weights for the connections.
  • Select an input vector from the learning samples.
  • Modify all the weights so that the Neural Network output is as close to the desired output as possible.
  • Weights are modified by minimizing an error
    function.
  • Back-propagation algorithm (sketched after this list).
  • Return to step 2.
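A minimal back-propagation sketch for a one-hidden-layer tanh network with a squared error function; unlike the per-example rule above, this version updates the weights with batch gradient descent, and the learning rate, network size, and iteration count are purely illustrative. The usage lines apply it to the AND problem discussed on the following slides.

```python
import numpy as np

def train(X, y, n_hidden=2, lr=0.5, n_iter=5000, seed=0):
    """Minimal back-propagation for a 1-hidden-layer tanh network with a
    linear output node, minimizing a squared-error function by (batch)
    gradient descent.  X has shape (n_examples, n_inputs)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(n_hidden, X.shape[1]))   # random starting weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(1, n_hidden))
    b2 = np.zeros(1)
    for _ in range(n_iter):
        h = np.tanh(X @ W1.T + b1)           # hidden-layer activations
        out = h @ W2.T + b2                  # network output
        err = out - y[:, None]               # dE/d(out) for E = 0.5 * sum(err^2)
        gW2 = err.T @ h / len(X)
        gb2 = err.mean(axis=0)
        dh = (err @ W2) * (1.0 - h ** 2)     # back-propagate through tanh
        gW1 = dh.T @ X / len(X)
        gb1 = dh.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1       # weight updates
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

# usage: the AND problem (2 inputs, 1 output)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])
W1, b1, W2, b2 = train(X, y)
print(np.tanh(X @ W1.T + b1) @ W2.T + b2)    # should approach the AND truth table 0, 0, 0, 1
```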

15
Simple Example: the AND problem
16
ANN Performance
  • Information
  • Use the best variables available to train the
    network.
  • Architecture
  • Input and output nodes are determined by the
    problem you want to solve.
  • The AND problem can be solved with 2 input nodes and 1 output node.
  • Hidden nodes
  • Try different number and choose architecture that
    results in the best performance.
  • Learning sample
  • Use a learning sample at least 10 times larger than the number of weights in the network.
  • Number of iterations
  • Make sure the network has been trained properly
  • χ2 method to decide when to stop the training (function approximation).
  • Look at testing sample (pattern classification).

17
ANN Template Fitter
  • We use the MLPfit package to fit the Mrec
    templates as a function of top mass, Mtop.
  • To solve this problem we need
  • 2 input nodes
  • One associated with Mrec
  • One associated with Mtop
  • 1 output node
  • the desired output value is set to the number of events in each bin.
  • The architecture chosen to fit the signal
    distributions is 2-4-4-1.
  • To fit the background we choose an architecture
    of 1-5-1.
  • No Mtop dependence

The Mrec distributions range from 80 GeV/c2 to
380 GeV/c2 and are divided into 5 GeV/c2 bins.
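MLPfit itself is an external package; the sketch below only shows how the learning sample for such a 2-input fit could be assembled, assuming mc_samples is a hypothetical mapping from each generated top mass to its array of reconstructed masses.

```python
import numpy as np

# Mrec binning from the text: 80-380 GeV/c^2 in 5 GeV/c^2 bins
edges = np.arange(80.0, 385.0, 5.0)
centers = 0.5 * (edges[:-1] + edges[1:])

def template_training_set(mc_samples):
    """Build the learning sample for a 2-input network f(Mrec, Mtop):
    mc_samples maps each generated top mass to its array of
    reconstructed masses (a hypothetical container for this sketch)."""
    X, y = [], []
    for mtop, mrec in mc_samples.items():
        contents, _ = np.histogram(mrec, bins=edges)
        for c, n in zip(centers, contents):
            X.append([c, mtop])     # input nodes: (Mrec bin center, Mtop)
            y.append(n)             # desired output: events in that bin
    return np.array(X), np.array(y)
```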
18
Template Comparison
  • We perform a χ2 comparison to make sure both fitting techniques are working correctly.
  • The χ2 values of the individual fits to each mass template are also lower for the NN fits than for the GG fits.

SVX Single distributions
19
Pseudoexperiments
  • Uses same continuous likelihood procedure
    described previously.
  • MC results from 2000 simulated experiments are
    shown below
  • Both fitting methods use exactly the same
    templates.
  • The NN fitting method gives a result which is 12% better than the GG fitting technique.
  • Applicable to any distribution.
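A hedged sketch of the pseudoexperiment procedure: draw datasets from a template and fit each one with a supplied fitting routine (fit_func is a placeholder for either the GG or NN method). The actual pseudoexperiments mix signal and background according to the Run I sample composition, which is not reproduced here.

```python
import numpy as np

def pseudoexperiments(values, probs, fit_func, n_events, n_exp=2000, seed=1):
    """Draw n_exp pseudo-datasets of n_events reconstructed masses from a
    template (bin centers `values` with probabilities `probs`), fit each
    one with `fit_func`, and return the median result and its spread.
    The spread is what compares the sensitivity of different methods."""
    rng = np.random.default_rng(seed)
    fitted = np.array([fit_func(rng.choice(values, size=n_events, p=probs))
                       for _ in range(n_exp)])
    return np.median(fitted), fitted.std()
```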

20
Mass Information
  • Other analyses have shown that other kinematic
    quantities have mass information.
  • A variable with good mass information will have a small RMS/slope (a sketch of this figure of merit follows the list).
  • Narrow distributions.
  • Large separation between means.
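A minimal sketch of the RMS/slope figure of merit mentioned above, assuming samples is a hypothetical mapping from each generated Mtop to an array of the kinematic variable in that Monte Carlo sample.

```python
import numpy as np

def rms_over_slope(mass_points, samples):
    """Figure of merit for a kinematic variable's mass information:
    slope is d<variable>/dMtop from a straight-line fit to the sample
    means; a narrow distribution (small RMS) and well-separated means
    (large slope) give a small value."""
    means = np.array([samples[m].mean() for m in mass_points])
    rms = np.mean([samples[m].std() for m in mass_points])
    slope, _ = np.polyfit(mass_points, means, deg=1)
    return rms / abs(slope)
```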

21
HT Mass Analysis
  • We can use a Neural Network to fit the HT
    distributions.
  • We ran 2000 pseudoexperiments and we find
  • How do we combine the Mrec and HT mass results?
  • Simple if the variables are uncorrelated.
  • Difficult to find a way to include correlations
    into the likelihood.
  • Combining the results a posteriori is not straightforward.
  • Look at other options.

SVX Single distributions
Mtop = 174.6 ± 11.3 GeV
22
Standalone Neural Network (1)
  • A Neural network provides a simple and elegant
    way to add more information.
  • New variables are added as input nodes.
  • Correlations between the variables are naturally
    taken into account.
  • We want to train a NN to classify events into the
    different top masses generated for this study as
    well as background.
  • We have generated 23 different top samples. Our NN has 24 outputs (23 signal + 1 background).
  • We train the NN using the BFGS method.
  • In training and testing, all 23 different top
    mass samples (signal) as well as background are
    used.
  • During the training, we set the output target
    value for each class to 1.0.
  • In our 24-dimensional output space Mtop 175
    GeV/c2 corresponds to (0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0). The
    background target value is given by (0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 1).
  • Output values can be interpreted as a-posteriori
    Bayes probabilities for each class.
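A small sketch of constructing the 24-dimensional target vectors; the 5 GeV/c2 spacing of the 23 mass points is an assumption made only because it gives exactly 23 samples between 120 and 230 GeV/c2.

```python
import numpy as np

mass_points = list(range(120, 235, 5))    # 23 generated top masses (assumed spacing)
n_classes = len(mass_points) + 1          # 23 signal classes + 1 background

def target_vector(label):
    """Desired 24-dimensional output: 1.0 in the position of the event's
    class (its generated Mtop, or background), 0.0 elsewhere."""
    t = np.zeros(n_classes)
    idx = n_classes - 1 if label == "background" else mass_points.index(label)
    t[idx] = 1.0
    return t

print(target_vector(175))           # the 1 sits in the 12th position, as on the slide
print(target_vector("background"))  # the 1 sits in the last (24th) position
```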

23
Standalone Neural Network (2)
  • We use the MLPfit package.
  • We have generated a Neural Network for each of
    the mass subsamples.
  • Training is stopped when the testing curve begins
    to increase.
  • NN begins to learn the specific features of the
    training sample.
  • The sum of the outputs should be equal to 1.

NN performance
24
Extracting Mtop
  • The NN output probabilities are used to construct
    a discrete likelihood to extract Mtop.
  • Each point in the likelihood is associated with a
    generated top mass.
  • Each point contains an admixture of signal and
    background.
  • To account for asymmetric errors we fit the
    log-likelihood with a 3rd degree polynomial.
  • Mtop is given by the minimum.
  • Mtop errors are given by the 0.5 unit increase in
    the log-likelihood.
  • Can we construct a continuous likelihood that
    uses this information?
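A minimal sketch of the extraction step: fit the discrete -log L points with a cubic polynomial and read off the minimum and the 0.5-unit crossings, assuming mass_points and neg_log_l hold the generated masses and the corresponding -log L values.

```python
import numpy as np

def extract_mtop(mass_points, neg_log_l):
    """Fit the discrete -log L points with a 3rd-degree polynomial, take
    Mtop at the minimum, and quote asymmetric errors from the masses at
    which -log L has risen by 0.5 units."""
    coeffs = np.polyfit(mass_points, neg_log_l, deg=3)
    grid = np.linspace(min(mass_points), max(mass_points), 2001)
    curve = np.polyval(coeffs, grid)
    i_min = int(np.argmin(curve))
    mtop, nll_min = grid[i_min], curve[i_min]
    inside = grid[curve <= nll_min + 0.5]      # interval where -log L < min + 0.5
    return mtop, mtop - inside.min(), inside.max() - mtop
```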

25
Pseudoexperiments
  • We ran 2000 pseudoexperiments with the Run I
    statistics.
  • The Standalone NN technique does 30% better than the GG-fitted template method.
  • The Standalone NN technique does 16% better than the NN-fitted template technique.

26
Systematic Errors
  • Jet energy scale: apply +1σ and -1σ shifts to the jet momenta in ttbar signal and background. The error is half the difference between the medians of the -1σ and +1σ distributions.
  • ISR/FSR: turn off ISR in PYTHIA. For FSR we choose only the events with 4 jets which are uniquely matched to partons. We add the errors to obtain the systematic uncertainty.
  • b-tagging: create two samples with only fake SLT tags or only real SLT tags. The error is half the difference.
  • PDF: change the PDF in PYTHIA to CTEQ3L.
  • MC generators: compare HERWIG samples to PYTHIA templates.

27
Gluon and Non-gluon Events
Non-gluon event: all 4 highest ET jets matched to MC partons.
Gluon event: at least one of the 4 highest ET jets not matched to a MC parton.
28
Gluon Effect on Top Mass
  • Use subsamples with the smallest amount of
    background.
  • From our Monte Carlo sample we see
  • 52.7 ± 0.66% of the events in the single SVX-tagged sample contain at least one gluon among their 4 highest ET jets.
  • 48.7 ± 1.1% of the events in the double SVX-tagged sample contain gluons.
  • We construct mass templates, which are divided
    into two sets
  • Gluon templates: only contain events with at least one gluon jet among their 4 highest ET jets.
  • Non-gluon templates: only contain events in which the 4 highest ET jets have been uniquely matched to the top decay partons.
  • There are features of the Run I mass analysis consistent with the data containing fewer gluon events than the Monte Carlo predicts.
  • Would a better knowledge of the gluon content in
    the mass sample lead to a better top mass
    measurement?

29
Templates
  • Gluon events will be mismeasured.
  • Gluon templates peak at a lower mass than the
    non-gluon distributions.
  • Non-gluon templates show better separation than gluon templates do.
  • Non-gluon templates are narrower than the templates containing gluon events.
  • We look at RMS/slope for the different templates
  • Events which do not contain gluon jets will
    provide a better top mass measurement.

30
Pseudoexperiments
  • We have generated a new set of templates for
    which we vary the percentage of gluon events.
  • The new templates have 20%, 40%, 60% and 80% gluon events in them.
  • We ran a series of pseudoexperiments in which we
    draw the events from the above templates and
    compare them to the default Monte Carlo.
  • The results from the Single SVX subsample are
    given below.

31
NN Input Variables
  • We want to develop a Neural Network that can
    distinguish gluon events from non-gluon ones.
  • This study uses the following three variables
  • Di-jet invariant mass: in the case of single-tagged events there are 3 ways to combine the untagged jets to form the di-jet mass. We only use 2 of them, since the one constructed from the two least energetic jets differs very little.
  • Number of extra jets: the number of jets with ET > 8.0 GeV and |η| < 2.4 besides our four highest-ET jets.
  • χ2: a goodness-of-fit parameter returned by MINUIT after we reconstruct the top mass for a given event.
  • All these variables have good gluon content
    information.
  • A Neural Network is well suited for this analysis
    since it provides a natural way of combining all
    of the variables.

32
NN Input Variables
Single SVX subsample
Double SVX subsample
33
Neural Network
  • We are using the JETNET subroutines interfaced to
    ROOT via the Root_Jetnet package.
  • We have two separate Neural Networks
  • NN_1SVX (4-8-1): the Single SVX NN has four input variables.
  • NN_2SVX (3-6-1): the Double SVX NN has three input variables.
  • We trained the Neural Networks using two different MC samples: one containing only gluon events, and the other one without them.
  • During training, the desired NN output for
    non-gluon events was set to 1, while the output
    corresponding to gluon events was set to 0.

34
Run I and Run II
Pseudoexperiments
  • We perform 5000 pseudo-experiments drawing events
    from our Monte Carlo samples following the shape
    of the distributions.
  • After our pseudo-experiments we find
  • The average statistical uncertainty returned by the fitter for the gluon and non-gluon content is 19.1% and 17.5%, respectively (a sketch of such a fraction fit follows this list).
  • The median statistical uncertainty for Run II will be 5.6% for the gluon content and 4.3% for the non-gluon content.
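The exact form of the gluon-content fitter is not spelled out on the slides; as a stand-in, here is a generic binned two-template fraction fit under Poisson statistics, with gluon_shape and nongluon_shape assumed to be unit-normalized template histograms and data_counts the observed bin contents.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_gluon_fraction(data_counts, gluon_shape, nongluon_shape):
    """Binned maximum-likelihood fit of the gluon fraction f, assuming the
    expected bin contents are N * (f * gluon + (1-f) * non-gluon) with the
    two shapes normalized to unit area and Poisson-distributed bins."""
    n_tot = data_counts.sum()
    def nll(f):
        mu = n_tot * (f * gluon_shape + (1.0 - f) * nongluon_shape)
        mu = np.clip(mu, 1e-12, None)          # guard against log(0)
        return np.sum(mu - data_counts * np.log(mu))
    return minimize_scalar(nll, bounds=(0.0, 1.0), method="bounded").x
```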

35
Data-like Templates
  • We apply our fitting technique to the actual Run
    I data and we find
  • The non-gluon content in the SVX Run I data is
    higher than what we expect from the Monte Carlo.
  • The errors in the measurement are rather large (roughly 20%).
  • Use the mean of the gluon content measurement to
    construct data-like templates.
  • Results from MC pseudoexperiments

36
Run II Mass Analysis
  • The biggest gain in the Standalone NN method
    comes from the combination of HT and Mrec.
  • We propose the use of a NN to fit a series of 2-D
    histograms of Mrec versus HT.
  • High number of statistics.
  • Adaptive binning.
  • These 2-D surfaces would be used much like the
    Mrec templates in the Standard mass analysis.
  • We would use the continuous log likelihood
    approach to calculate the top mass.
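Extending the earlier 1-D template-fit sketch to the proposed 2-D case, the network would take (Mrec, HT, Mtop) as inputs and return the bin content; the HT binning below is purely illustrative, and mc_samples is again a hypothetical mapping from each generated Mtop to its (Mrec, HT) arrays.

```python
import numpy as np

# Mrec binning as before; the HT binning here is an illustrative assumption
edges_m = np.arange(80.0, 385.0, 5.0)
edges_h = np.arange(100.0, 810.0, 10.0)

def surface_training_set(mc_samples):
    """Learning sample for a 3-input network f(Mrec, HT, Mtop) that fits
    the proposed 2-D Mrec-vs-HT histograms."""
    cm = 0.5 * (edges_m[:-1] + edges_m[1:])
    ch = 0.5 * (edges_h[:-1] + edges_h[1:])
    X, y = [], []
    for mtop, (mrec, ht) in mc_samples.items():
        contents, _, _ = np.histogram2d(mrec, ht, bins=[edges_m, edges_h])
        for i, m in enumerate(cm):
            for j, h in enumerate(ch):
                X.append([m, h, mtop])      # inputs: bin center and generated mass
                y.append(contents[i, j])    # desired output: events in that 2-D bin
    return np.array(X), np.array(y)
```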

Single SVX distributions
Mrec vs. HT surfaces generated with the NN functional form.
37
Run II Expectations
  • The different methods used to obtain the mass of
    the top quark are presented.
  • Using the 2D surfaces gives
  • a result which is 30% better than using the Mrec templates fitted with the Gaussian + Gamma functions.
  • a result which is 11% better than using the Mrec templates fitted with a Neural Network.
  • The goal for Run II is to measure Mtop with an
    error less than 2.0 GeV/c2.
  • Systematic errors will dominate the error in the top mass.

Statistical error only
38
Conclusions
  • NN template fitter
  • Provides a better measurement than the previous Gaussian + Gamma fitting technique.
  • Applicable to any set of kinematic distributions.
  • Standalone NN
  • Combines different information to obtain a better
    top mass estimate.
  • 2D NN fit gives similar results.
  • The techniques presented here will help improve
    the top mass measurement in Run II.
  • Although these multivariate techniques were developed for the top mass analysis, they can be applied to many different physical processes:
  • single top, Higgs boson, Supersymmetry, etc.