Title: Multivariate Discriminant Analysis applied to Classification of NC Events in MINOS - Near Data/MC Comparisons -
1 Multivariate Discriminant Analysis applied to Classification of NC Events in MINOS - Near Data/MC Comparisons -
- Alex Sousa
- Oxford University
- MINOS Collaboration Meeting
- NC Working Group
- 09/09/2006
2 Introduction
- Update on work presented at last Friday's NC phone meeting
- Look at Near Detector Data/MC selection comparison
- Data distributions of input variables
- Use Near-trained discriminant function to classify data
- Selection performance
- Note: no reweighting applied.
3 Multivariate Discriminant Analysis
- Define a set of discriminator variables that appropriately describes the data sample
- Calculate the covariance matrix for each class
- Determine the Mahalanobis distance to each class for each event
- Compute the score for an event to belong to each class
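The steps above can be written compactly as follows. This is the standard textbook form of a Gaussian (quadratic) discriminant, not necessarily the exact score used in this analysis: the symbols x (vector of discriminator variables), mu_k and Sigma_k (mean and covariance of class k) and pi_k (prior weight of class k) are assumptions introduced here for illustration.

```latex
% Squared Mahalanobis distance of event x to class k
D_k^2(\mathbf{x}) = (\mathbf{x}-\boldsymbol{\mu}_k)^{\mathsf T}\,
                    \Sigma_k^{-1}\,(\mathbf{x}-\boldsymbol{\mu}_k),
\qquad k \in \{\mathrm{NC},\,\mathrm{CC}\}

% Score (posterior probability) for event x to belong to class k
P(k \mid \mathbf{x}) =
  \frac{\pi_k\,|\Sigma_k|^{-1/2}\exp\!\left(-\tfrac{1}{2}D_k^2(\mathbf{x})\right)}
       {\sum_{j}\pi_j\,|\Sigma_j|^{-1/2}\exp\!\left(-\tfrac{1}{2}D_j^2(\mathbf{x})\right)}
```

With two classes the scores sum to one, so P(NC | x) = 1 - P(CC | x).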
4 Analysis Structure Overview
Sntp ntuples
Analysis Ntuples
SAS Input format
NCAnalysis Reader Module
Efficiency-Purity, Fitting, etc.
Cuts
NCExtraction Object
MDA output
Variable selection
MDA classification
5 Samples and Cuts
- Near
  - Training sample: 209 R1_18_2 (Carrot) Near MC files (1.027x10^16 POT/file)
  - Test sample: 209 R1_18_2 (Carrot) Near MC files (1.027x10^16 POT/file)
  - No overlapping events
  - Data test sample: 24 December LE-10 sub-run files (Run 9554) (8x10^17 POT)
- Sample cuts
  - Vertex contained in Oxford fiducial volume
    - Far Longitudinal: 0.272 m < vtxZ < 13.66 m, 16.8 m < vtxZ < 28.9 m
    - Far Transverse: vtx XY position at least 0.50 m from closest edge
    - Coil hole: vtx XY position > 0.45 m radius around center
    - Near Longitudinal: 1.212 m < vtxZ < 4.766 m
    - Near Transverse: vtx XY position at least 0.50 m from closest edge (of a partial plane)
  - Event length < 50 planes
  - For data: use IsGoodBeamSnarl()
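The Near sample cuts listed above can be sketched as a simple predicate. This is only an illustration: the `Event` container and its field names are hypothetical, and the distance to the closest partial-plane edge is assumed to be precomputed elsewhere, since the detector geometry is not described in these slides. The data-only IsGoodBeamSnarl() check is not modelled.

```python
from dataclasses import dataclass

# Cut values quoted on the slide (Near detector).
NEAR_Z_MIN_M = 1.212
NEAR_Z_MAX_M = 4.766
MIN_EDGE_DISTANCE_M = 0.50
MAX_EVENT_LENGTH_PLANES = 50


@dataclass
class Event:
    """Hypothetical event record; field names are assumptions, not ntuple names."""
    vtx_z_m: float          # longitudinal vertex position in metres
    edge_distance_m: float  # distance from vertex to closest partial-plane edge (assumed precomputed)
    n_planes: int           # event length in planes


def passes_near_cuts(ev: Event) -> bool:
    """Return True if the event passes the Near fiducial and event-length cuts."""
    in_z = NEAR_Z_MIN_M < ev.vtx_z_m < NEAR_Z_MAX_M
    in_xy = ev.edge_distance_m >= MIN_EDGE_DISTANCE_M
    short_enough = ev.n_planes < MAX_EVENT_LENGTH_PLANES
    return in_z and in_xy and short_enough


if __name__ == "__main__":
    print(passes_near_cuts(Event(vtx_z_m=2.0, edge_distance_m=0.8, n_planes=20)))
```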
6 Variable Selection
- Selection performed using Stepwise discriminant analysis via SAS
- Input: 75 variables directly available from AnalysisNtuples processing
- Variables were sorted by discriminating power
- The 14 best variables were chosen to form the multivariate discriminant
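To illustrate the idea of ranking variables by discriminating power, here is a minimal forward-selection sketch based on Wilks' lambda. It is not the SAS procedure used for the analysis: SAS stepwise selection also removes variables and applies significance thresholds, which this sketch omits. The function names and the toy data are assumptions made for the example.

```python
import numpy as np


def wilks_lambda(X, y):
    """Wilks' lambda = det(W) / det(T); smaller values mean better class separation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    centered = X - X.mean(axis=0)
    total = centered.T @ centered      # total scatter matrix T
    within = np.zeros_like(total)      # pooled within-class scatter matrix W
    for label in np.unique(y):
        group = X[y == label]
        dev = group - group.mean(axis=0)
        within += dev.T @ dev
    return np.linalg.det(within) / np.linalg.det(total)


def forward_select(X, y, n_keep):
    """Greedily add the variable that lowers Wilks' lambda the most."""
    X = np.asarray(X, dtype=float)
    chosen = []
    remaining = list(range(X.shape[1]))
    while remaining and len(chosen) < n_keep:
        best = min(remaining, key=lambda j: wilks_lambda(X[:, chosen + [j]], y))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data: 4 variables, 2 classes; column 2 separates strongly, column 0 weakly.
rng = np.random.default_rng(1)
n_per_class = 400
y = np.repeat([0, 1], n_per_class)
X = rng.normal(size=(2 * n_per_class, 4))
X[:, 2] += 3.0 * y
X[:, 0] += 1.0 * y

order = forward_select(X, y, n_keep=2)
print("selected columns, best first:", order)
```

In the analysis itself the same kind of ranking would run over the 75 input variables and keep the 14 best.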
7 Selected Variables
8 Selected Variables
9 Selected Variables
10 MDA Probability Distributions
- Using the multivariate discriminant, each event is assigned 2 scores, or probabilities, of belonging to the NC or CC group.
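A minimal sketch of how two such scores could be computed from per-class Mahalanobis distances is shown below. It assumes Gaussian classes with their own covariance matrices and equal priors; the function names, the priors and the toy training data are all assumptions for illustration and are not taken from the analysis code.

```python
import numpy as np


def class_stats(samples):
    """Return (mean vector, covariance matrix) for one class of training events."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(axis=0), np.cov(samples, rowvar=False)


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of event x from a class."""
    diff = np.asarray(x, dtype=float) - mean
    return float(diff @ np.linalg.solve(cov, diff))


def class_scores(x, stats, priors):
    """Normalised per-class scores, assuming Gaussian classes (an assumption)."""
    log_weight = {}
    for name, (mean, cov) in stats.items():
        _, logdet = np.linalg.slogdet(cov)
        log_weight[name] = (np.log(priors[name]) - 0.5 * logdet
                            - 0.5 * mahalanobis_sq(x, mean, cov))
    top = max(log_weight.values())  # subtract the maximum for numerical stability
    unnorm = {k: np.exp(v - top) for k, v in log_weight.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}


# Toy training samples in two variables (not real detector quantities).
rng = np.random.default_rng(0)
nc_train = rng.normal([0.0, 0.0], 1.0, size=(500, 2))
cc_train = rng.normal([3.0, 3.0], 1.0, size=(500, 2))

stats = {"NC": class_stats(nc_train), "CC": class_stats(cc_train)}
priors = {"NC": 0.5, "CC": 0.5}

scores = class_scores([0.1, -0.2], stats, priors)
print(scores)
```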
11 Prob. Threshold and Eff., Pur.
- Using a probability threshold cut we can tune efficiency and purity values, e.g. Prob(NC) > 0.85 gives Eff. < 92%, Pur. > 52%
Training Sample
Test Sample
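The trade-off described above can be sketched as follows: raising the Prob(NC) threshold usually lowers efficiency and raises purity. The function name and the eight toy events are assumptions for illustration; efficiency is taken as selected true NC over all true NC, and purity as selected true NC over all selected events.

```python
import numpy as np


def efficiency_purity(prob_nc, is_nc, threshold):
    """Efficiency and purity of the cut Prob(NC) > threshold."""
    prob_nc = np.asarray(prob_nc, dtype=float)
    is_nc = np.asarray(is_nc, dtype=bool)
    selected = prob_nc > threshold
    n_true_selected = np.count_nonzero(selected & is_nc)
    n_selected = np.count_nonzero(selected)
    n_nc = np.count_nonzero(is_nc)
    efficiency = n_true_selected / n_nc if n_nc else float("nan")
    purity = n_true_selected / n_selected if n_selected else float("nan")
    return efficiency, purity


# Toy scores: first four events are true NC, last four are true CC.
prob_nc = [0.95, 0.90, 0.80, 0.60, 0.70, 0.55, 0.30, 0.10]
is_nc = [True, True, True, True, False, False, False, False]

for cut in (0.50, 0.85):
    eff, pur = efficiency_purity(prob_nc, is_nc, cut)
    print(f"Prob(NC) > {cut:.2f}: efficiency = {eff:.2f}, purity = {pur:.2f}")
```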
12 Summary Table (Near Test)
- Results obtained when applying the classifier obtained from the Far MC training sample to the Near MC sample (Note: test sample = 418 files).

Near Test Sample                      NC        CC   Beam νe   NC (%)   CC (%)   Beam νe (%)
Total                             176047   1180485     15241      100      100           100
FidVolOx Cut                       24952     88473      1538       14        8            10
Ev. Length < 50 plane cut          24830     46751      1521       14        4            10
MDA Class. (sig. and bg.)          23639     22131      1485       13        2            10
MDA Class. (Prob(NC) > 0.75)       22737     19704      1458       13        2            10

Divide Near test sample with cuts into 50% Training, 50% NewTest; create new MDA classifier

Total NewTest (with cuts)          12452     23443       792       --       --            --
MDA Class. (Prob(NC) > 0.85)       11445      9861       747       --       --            --
MDA Class (low efficiency)          6227      3742       583       --       --            --

- MDA Class. (Prob(NC) > 0.75): NC Selection Efficiency 92%, NC Selection Purity 52%
- MDA Class. (Prob(NC) > 0.85): NC Selection Efficiency 92%, NC Selection Purity 52%
- MDA Class (low efficiency): NC Selection Efficiency 50%, NC Selection Purity 59%
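The quoted efficiency and purity values can be checked against the event counts in the table. The sketch below assumes that efficiency is measured relative to the NC events remaining after the sample cuts (24830 for the Far-trained classifier, 12452 for NewTest) and that purity counts CC and beam νe as background; both definitions are assumptions, although they reproduce the quoted numbers.

```python
def efficiency(selected_nc, total_nc):
    """Fraction of NC events (after sample cuts) that survive the selection."""
    return selected_nc / total_nc


def purity(selected_nc, selected_cc, selected_nue):
    """Fraction of selected events that are true NC."""
    return selected_nc / (selected_nc + selected_cc + selected_nue)


# Counts copied from the summary table.
rows = {
    "Prob(NC) > 0.75 (Far-trained)": dict(total_nc=24830, nc=22737, cc=19704, nue=1458),
    "Prob(NC) > 0.85 (NewTest)": dict(total_nc=12452, nc=11445, cc=9861, nue=747),
    "low efficiency (NewTest)": dict(total_nc=12452, nc=6227, cc=3742, nue=583),
}

results = {}
for name, r in rows.items():
    eff = efficiency(r["nc"], r["total_nc"])
    pur = purity(r["nc"], r["cc"], r["nue"])
    results[name] = (round(100 * eff), round(100 * pur))
    print(f"{name}: efficiency = {100 * eff:.1f}%, purity = {100 * pur:.1f}%")
```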
13 Summary Table

Near Test Sample                        NC        CC   Beam νe
Test (with cuts)                     12452     23443       792
MDA Class. (Prob(NC) > 0.85)         11445      9861       747
MDA Class (low efficiency)            6227      3742       583
Total Data (with cuts)               12841 (single total spanning all columns)
MDA Class. (Prob(NC) > 0.85) Data     7549      4270      ----

- MDA Class (low efficiency): NC Selection Efficiency 50%, NC Selection Purity 59%
- MDA Class. (Prob(NC) > 0.85): NC Selection Efficiency 92%, NC Selection Purity 52%
14 Eff., Pur. vs Visible Energy
- Near NC Selection efficiency and purity as a function of visible energy.
Near Training Sample
Near Test Sample
- CC contamination in lowest energy bin more severe than in the Far case
- Efficiency mostly flat between 0-6 GeV, decreases somewhat in high energy tail
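Efficiency and purity as a function of visible energy amount to a ratio of two histograms per bin. A minimal sketch follows; the function name, the bin edges and the toy energies are assumptions for illustration and do not reproduce the plotted distributions.

```python
import numpy as np


def binned_fraction(numerator_values, denominator_values, bin_edges):
    """Per-bin ratio of two histograms; bins with an empty denominator give NaN."""
    passed, _ = np.histogram(numerator_values, bins=bin_edges)
    total, _ = np.histogram(denominator_values, bins=bin_edges)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total > 0, passed / total, np.nan)


# Toy visible energies in GeV.
edges = [0.0, 2.0, 4.0, 6.0]
all_true_nc = [0.5, 1.5, 2.5, 3.5, 5.0]            # every true NC event
selected_true_nc = [0.5, 1.5, 2.5, 5.0]            # true NC events passing the selection
all_selected = [0.5, 0.7, 1.5, 1.8, 2.5, 5.0]      # everything passing the selection

eff = binned_fraction(selected_true_nc, all_true_nc, edges)   # efficiency per bin
pur = binned_fraction(selected_true_nc, all_selected, edges)  # purity per bin
print("efficiency:", eff)
print("purity:    ", pur)
```

Using one helper for both quantities keeps the definitions explicit: only the denominator changes between efficiency and purity.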
15 Near NC Selection Distributions
- Distributions for events from the Near Test/Data sample selected as NC.
16 Near NC Selection Distributions
- Distributions for events from the Near Test sample selected as NC.
17 Future Work
- Implement reweighting before variable selection and look for any improvements
- Limit training set to variables displaying the best agreement between data and MC
- Perform classification in with-track and without-track samples
- Use MDA selection and NCUtils implementation to go through the full analysis chain
- Data/MC agreement using the MDA method looks reasonable for a preliminary attempt and should only improve from here on