Title: Multivariate Discriminant Analysis applied to classification of νe CC events in MINOS (Update)
1 Multivariate Discriminant Analysis applied to classification of νe CC events in MINOS (Update)
- Alex Sousa
- Tufts University
- MINOS Collaboration Meeting at Fermilab
- 09/18/2004
2 Changes since Ely
- Using new MDC sample fully reprocessed with R1.9.
- Consolidation of analysis framework
- Developed in C/ROOT, independent of Minossoft downstream of ntuple generation.
- Handling of unsynced Reco and Truth trees: corrected some bugs and increased algorithm speed and robustness.
- Easy reading of analysis variables with cuts and easy creation of oscillated samples (fixed the sample generation problem in ντ handling).
- New Event Display
- CVS repository of the code available on the Tufts MINOS server.
3 Analysis Overview
[Flow diagram; box labels: MDC ntuples, Variables ntuple, Variable generation, Sample selection, Cuts, Computation of results, Event Display, MDA output, SAS input format, Variable selection, MDA classification]
4 Samples
- Sample contents
- Constructed from 20 νμ, 9 νe, and 39 ντ MDC ntuples processed with release R1.9 on the batch farm and on the Tufts server.
- Visible energy and track length cuts optimized through use of decision trees (see Maylis' talk).
- Mild containment cut eliminates background from νμ events truncated at the end of the detector.
- Training Sample
5 Samples
6 Variable Selection
- Variable selection is performed using the SAS stepwise discriminant procedure
- Original 77 variables sorted by discriminant power
- 45 variables selected for running on the training sample
- Best results for 18 variables
- uv_rms
- plane_n
- ph_pe
- nstrip
- uv_kurt
- trk_plane_ntrklike
- e_hit_total
- ntrack
- s_hit_trans_ratio
- shw_nstrip_ratio
- trk_pe_ratio
- shw_ph_nstrip
- max_pe_plane
- chisq_ndf
- uv_asym_peak
- e_hit_long
- e_hit_trans
- trk_chi2_ndof
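As a rough illustration of sorting variables by discriminant power, the sketch below ranks variables by a univariate one-way F statistic (between-class variance over within-class variance). This is a simplification and not the SAS stepwise procedure itself, which adds and removes variables conditionally on those already selected. The function names and all toy values are hypothetical; only the variable names uv_rms and ntrack are taken from the list above.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic for one variable.

    groups: list of lists, one list of values per event class.
    Raises ZeroDivisionError if the within-class variance is zero.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Mean square between classes (k - 1 degrees of freedom).
    between = sum(len(g) * (m - grand_mean) ** 2
                  for g, m in zip(groups, means)) / (k - 1)
    # Mean square within classes (n - k degrees of freedom).
    within = sum((x - m) ** 2
                 for g, m in zip(groups, means) for x in g) / (n - k)
    return between / within


def rank_variables(samples):
    """Return variable names sorted by decreasing F statistic.

    samples: dict mapping variable name -> list of per-class value lists.
    """
    return sorted(samples, key=lambda v: f_statistic(samples[v]), reverse=True)


# Toy values (invented for illustration): [signal values, background values].
toy_samples = {
    "ntrack": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]],
    "uv_rms": [[1.0, 1.2, 0.8], [3.0, 3.2, 2.8]],
}
toy_ranking = rank_variables(toy_samples)
```

In this toy input uv_rms separates the two classes far better than ntrack, so it is ranked first. A stepwise procedure would additionally account for correlations between variables, which a univariate ranking ignores.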
7 Some Selected Variables
8 MDA output (Probability Distributions)
Training Sample
9 Threshold Determination
- Calculate the figure of merit (FOM) on the training sample for several candidate thresholds. Apply the threshold with the highest FOM to the test sample classification.
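The threshold scan described above can be sketched as follows. The slides do not define the figure of merit, so this example assumes FOM = S / sqrt(S + B), where S and B are the numbers of signal and background training events passing the threshold; the function names and toy scores are likewise hypothetical.

```python
import math


def figure_of_merit(n_signal, n_background):
    """Assumed FOM: S / sqrt(S + B). Returns 0 when no event passes."""
    total = n_signal + n_background
    return n_signal / math.sqrt(total) if total > 0 else 0.0


def best_threshold(scores, is_signal, thresholds):
    """Scan candidate thresholds and return (threshold, FOM) with the highest FOM.

    scores:     MDA signal probability of each training event
    is_signal:  True for signal events, False for background events
    thresholds: candidate cut values; an event passes if score >= threshold
    """
    best_t, best_fom = None, -1.0
    for t in thresholds:
        s = sum(1 for p, sig in zip(scores, is_signal) if sig and p >= t)
        b = sum(1 for p, sig in zip(scores, is_signal) if not sig and p >= t)
        fom = figure_of_merit(s, b)
        if fom > best_fom:
            best_t, best_fom = t, fom
    return best_t, best_fom


# Toy training sample (invented values): 4 signal and 6 background events.
toy_scores = [0.95, 0.90, 0.85, 0.60, 0.55, 0.52, 0.51, 0.40, 0.20, 0.10]
toy_is_signal = [True] * 4 + [False] * 6
toy_best_t, toy_best_fom = best_threshold(toy_scores, toy_is_signal,
                                          [0.5, 0.7, 0.9])
```

The threshold found on the training sample would then be applied unchanged to the test sample. A real analysis would use weighted event counts (for example, oscillation weights) rather than raw counts.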
10 Results (Energy Distributions)
Test Sample (no threshold)
[Figure labels: νe (Signal), NC (BG), νμ (BG), ντ (BG)]
11 Results (Energy Distributions)
Test Sample (threshold 0.88)
[Figure labels: NC, νe, νμ, ντ]
12 Results (Energy Distributions)
Test Sample (threshold 0.92)
[Figure labels: NC, νe, νμ, ντ]
13 Results (Efficiencies)
Test Sample (no oscillations, no threshold)
[Figure labels: NC, νe, νμ, ντ]
14 Results (Efficiencies)
Test Sample (oscillated, no threshold)
[Figure labels: NC, νe, νμ, ντ]
15 Results (Efficiencies)
Test Sample (threshold 0.88)
[Figure labels: NC, νe, νμ, ντ]
16 Results (Efficiencies)
Test Sample (threshold 0.92)
[Figure labels: NC, νe, νμ, ντ]
17 Results (νe appearance signal)
[Figures: Test Sample (threshold 0.88) and Test Sample (threshold 0.92); labels in each: Signal+BG, BG]
18 Results (comparison table)
19 Some events
20 Some events
21 Some events
22 Current and Future Work
- Investigate discriminating power of cos θ vs φ distributions.
- Look at other less trivial threshold cuts in the probability space. Evaluate method stability in multiple randomly generated test samples.
- Fit signal and background histograms to the test sample instead of fixing a threshold.
- Apply method to the Near Detector MDC files.
23 Fitting (very preliminary)
- Instead of defining probability thresholds, fit
the complete MDA probability histograms defined
on the training sample to the ones obtained from
classification on the test sample.
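A minimal sketch of such a fit, assuming the test-sample probability histogram is modelled as a linear combination a * signal + b * background of the two training-sample templates and fitted by unweighted least squares. The slides do not specify the fit method; a chi-square or likelihood fit with per-bin errors would be the more usual choice. All names and bin contents below are invented for illustration.

```python
def fit_two_templates(data, signal, background):
    """Least-squares fit of data ~ a * signal + b * background.

    All three arguments are lists of bin contents with identical binning.
    Solves the 2x2 normal equations and returns the scale factors (a, b).
    """
    ss = sum(s * s for s in signal)
    bb = sum(b * b for b in background)
    sb = sum(s * b for s, b in zip(signal, background))
    ds = sum(d * s for d, s in zip(data, signal))
    db = sum(d * b for d, b in zip(data, background))
    det = ss * bb - sb * sb
    if det == 0:
        raise ValueError("templates are linearly dependent; fit is undefined")
    a = (ds * bb - db * sb) / det
    b = (db * ss - ds * sb) / det
    return a, b


# Toy templates in bins of MDA signal probability (invented values):
# signal peaks at high probability, background at low probability.
toy_signal = [0.0, 1.0, 2.0, 5.0, 12.0]
toy_background = [10.0, 6.0, 3.0, 1.0, 0.0]
# Toy "test sample" built as exactly 2 * signal + 3 * background.
toy_data = [2.0 * s + 3.0 * b for s, b in zip(toy_signal, toy_background)]
toy_a, toy_b = fit_two_templates(toy_data, toy_signal, toy_background)
```

Because the toy data are an exact combination of the templates, the fit recovers the scale factors used to build them. With real histograms the fitted signal scale, multiplied by the template integral, gives the estimated number of signal events without choosing a threshold.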
24 Fitting (very preliminary)
25 Fitting (very preliminary)
26 MDA Procedure
- Define a set of variables that appropriately describes the data sample.
- Calculate the covariance matrix for each class.
- Determine the Mahalanobis distance to each class for each event.
- Compute the probabilities for an event to belong to each class (scores).
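The steps above can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the SAS implementation used in the analysis: classes are given equal prior probabilities, the score of a class is taken proportional to exp(-D²/2) with D² the squared Mahalanobis distance, and the covariance-determinant term of a full quadratic discriminant is omitted. All function names and toy values are hypothetical.

```python
import math


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance_matrix(rows, mean):
    """Sample covariance matrix (normalized by n - 1) of one class."""
    n, d = len(rows), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (r[i] - mean[i]) * (r[j] - mean[j])
    return [[c / (n - 1) for c in row] for row in cov]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Fails with ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^-1 (x - mean)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))


def train(samples):
    """samples: dict class name -> list of events (tuples of variables)."""
    model = {}
    for name, rows in samples.items():
        mu = mean_vector(rows)
        model[name] = (mu, covariance_matrix(rows, mu))
    return model


def class_scores(x, model):
    """Probability of event x belonging to each class (equal priors assumed)."""
    d2 = {name: mahalanobis_sq(x, mu, cov) for name, (mu, cov) in model.items()}
    d2_min = min(d2.values())  # subtracted to avoid underflow in exp()
    weights = {name: math.exp(-0.5 * (v - d2_min)) for name, v in d2.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Toy training sample with two variables per event (invented values).
toy_samples = {
    "signal": [(1.0, 1.0), (2.0, 1.5), (1.5, 2.5), (2.5, 2.0)],
    "background": [(5.0, 5.0), (6.0, 5.5), (5.5, 6.5), (6.5, 6.0)],
}
toy_model = train(toy_samples)
toy_scores = class_scores((1.8, 1.7), toy_model)
```

The toy event lies close to the signal class mean, so nearly all of its probability is assigned to the signal class. In the analysis itself the same idea runs over the 18 selected variables and the four event classes (νe, NC, νμ, ντ).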
27 Energy Distributions
Training Sample (no threshold)
28 Energy Distributions
Training Sample (threshold 0.88)
29 Energy Distributions
Training Sample (threshold 0.92)