Title: A segmented principal component analysis applied to calorimetry information at ATLAS
Federal University of Rio de Janeiro - UFRJ
Brazilian Center for Physics Research - CBPF
H. P. Lima Jr, J. M. de Seixas
ACAT 2005 - May 22-27, Zeuthen, Germany
Outline
- Scenario
- Signal processing
- Data assembling
- Data compaction (segmented principal component analysis)
- Particle discrimination (neural network)
- Conclusions
Scenario
- The ATLAS trigger system comprises three distinct levels of event selection: LVL1, LVL2 and Event Filter.
- From an initial bunch crossing rate of 40 MHz, the trigger system will select events at a rate of up to 100 Hz for permanent storage.
- LVL1 operates at 40 MHz with reduced granularity information in order to make a fast decision. It also defines Regions of Interest (RoI) that guide the LVL2 selection process.
- At LVL2, complex algorithms operate on full granularity information, with a maximum latency of 10 ms.
- All three levels of selection use information provided by the calorimeter system, because of its fast response and the detailed energy deposition profiles it provides.
Figure: The ATLAS trigger system.
Calorimeter system
- The ATLAS calorimeter is highly segmented and has high granularity.
- The proposed system should address 11 subdetector layers of the electromagnetic and hadronic calorimeters:
  - Pre-sampler, barrel
  - EM Calo, barrel, front layer
  - EM Calo, barrel, middle layer
  - EM Calo, barrel, back layer
  - Pre-sampler, endcap
  - EM Calo, endcap, front layer
  - EM Calo, endcap, middle layer
  - EM Calo, endcap, back layer
  - Hadronic Calo, barrel, layer 0
  - Hadronic Calo, barrel, layer 1
  - Hadronic Calo, barrel, layer 2
Figure: Cross section of the EM calorimeter.
Signal processing
- The proposed signal processing approach will operate at Level 2, on calorimeter data, in order to:
  - Reduce the high computational load due to the high granularity of the information
  - Speed up the selection process
  - Achieve higher particle identification efficiency (main focus on the electron/jet channel).
- Proposed techniques:
  - Segmented principal component analysis: in order to exploit the highly segmented calorimeter system, data representation is made at the layer level instead of using a global random process representation.
  - Neural networks for particle identification: the projected data are concatenated and fed into a feedforward neural network for electron/jet discrimination.
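The two techniques above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the authors' implementation: it assumes that "energy" means the sum of the covariance eigenvalues of a layer, and the function names `fit_layer_pca` and `segmented_pca_project` are hypothetical.

```python
import numpy as np

def fit_layer_pca(X, energy_fraction=0.90):
    """Fit PCA on one layer (rows: events, columns: cells).

    Keeps the fewest components whose eigenvalues add up to at least
    `energy_fraction` of the total (assumed meaning of "energy").
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # sort descending
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    eigvecs = eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, energy_fraction)) + 1
    k = min(k, len(eigvals))
    return mean, eigvecs[:, :k]

def segmented_pca_project(layers, models):
    """Project each layer with its own basis, then concatenate the results."""
    parts = [(X - mean) @ basis for X, (mean, basis) in zip(layers, models)]
    return np.hstack(parts)
```

Each layer gets its own basis, so a layer with a compact energy pattern contributes only a few inputs to the classifier, while the concatenated vector still covers all layers.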
Data assembling
- Simulated LVL2 data produced in the Athena environment were used. They correspond to jets and two signatures of the Higgs boson, in the decays H → 2e-2µ and H → 4e-.
- Two types of data assembling were tested: direct and ring.
  - Direct assembling: each data vector is formed by grouping the cells in the order in which they appear in the RoI layer.
  - Ring assembling: for each calorimeter layer, the cell with the highest deposited energy is identified, and the data vector is formed by sequentially grouping rings of cells around this marked cell.
- Ring assembling highlights the energy deposition pattern of the incident particle, an important feature that makes further classification easier.
Figure: a 5 x 5 RoI and the corresponding data vector of cells 1 to 25 (cell 1 has the highest deposited energy), which is passed to principal component extraction.
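Ring assembling for a single layer might look like the sketch below. It makes two assumptions that the slides do not state: rings are square (cells at the same Chebyshev distance from the hottest cell), and cells within a ring keep their row-major order. The function name `ring_assemble` is hypothetical.

```python
import numpy as np

def ring_assemble(roi):
    """Flatten a 2-D RoI layer into a vector ordered by rings around the hottest cell."""
    roi = np.asarray(roi, dtype=float)
    peak_row, peak_col = np.unravel_index(np.argmax(roi), roi.shape)
    rows, cols = np.indices(roi.shape)
    # Ring index: 0 for the peak cell, 1 for its eight neighbours, and so on.
    ring = np.maximum(np.abs(rows - peak_row), np.abs(cols - peak_col))
    # A stable sort keeps row-major order inside each ring (an arbitrary choice).
    order = np.argsort(ring.ravel(), kind="stable")
    return roi.ravel()[order]
```

With this ordering the first element is always the cell with the highest deposited energy, followed by its nearest neighbours, whatever the position of the shower inside the RoI.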
Data compaction
- Due to the high complexity of the calorimeter system, raw random vectors have up to 3115 components (calorimeter cells).
- The following table illustrates the level of compaction achieved for each subdetector layer (number of components retained), for different levels of random process energy preservation.
Number of components retained per layer, by fraction of energy preserved and type of assembling (ring or direct):

Subdetector layer               Original   82%           85%           90%           95%           98%
                                dimension  ring  direct  ring  direct  ring  direct  ring  direct  ring  direct
Pre-sampler, barrel             105        1     18      2     20      3     24      5     31      16    42
EM Calo, barrel, front layer    800        5     166     6     187     11    233     35    309     109   390
EM Calo, barrel, middle layer   400        3     64      3     74      4     95      7     131     29    173
EM Calo, barrel, back layer     200        1     32      2     38      3     52      6     77      27    108
Pre-sampler, endcap             60         1     13      1     14      3     19      5     27      12    36
EM Calo, endcap, front layer    720        3     175     4     194     6     232     15    290     45    350
EM Calo, endcap, middle layer   400        2     36      3     42      4     53      6     75      16    106
EM Calo, endcap, back layer     200        2     15      3     18      5     26      11    42      37    70
Hadronic Calo, barrel, layer 0  100        8     35      12    40      23    50      41    63      60    76
Hadronic Calo, barrel, layer 1  90         15    37      21    41      32    48      45    59      66    72
Hadronic Calo, barrel, layer 2  40         15    19      16    21      19    25      25    30      31    36
TOTAL                           3115       56    610     73    689     113   857     201   1134    448   1459
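The TOTAL row can be converted into overall compaction levels with a short calculation. The sketch below defines compaction as the percentage of the 3115 original components that is discarded, which is an assumed definition; the numbers are copied from the TOTAL row of the table.

```python
ORIGINAL_DIMENSION = 3115

# Energy preserved (%) -> (ring, direct) component totals, from the TOTAL row.
TOTAL_COMPONENTS = {
    82: (56, 610),
    85: (73, 689),
    90: (113, 857),
    95: (201, 1134),
    98: (448, 1459),
}

def compaction_percent(kept, original=ORIGINAL_DIMENSION):
    """Percentage of the original components that is discarded."""
    return 100.0 * (1.0 - kept / original)

for energy, (ring, direct) in TOTAL_COMPONENTS.items():
    print(f"{energy}% energy: ring {compaction_percent(ring):.1f}%, "
          f"direct {compaction_percent(direct):.1f}%")
```

For ring assembling at 90% of the energy, 113 of 3115 components are kept, which corresponds to a compaction of about 96.4%.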
Data compaction
- The following figures illustrate how much is gained with ring data assembling.
Note that ring data assembling allows higher levels of compaction, as expected, since data vectors are organized according to the energy deposition pattern. For ring data assembling, 11 components preserve 90% of the energy.
Particle identification
- Particle identification is performed by a simple three-layer feedforward neural network. All neurons use the hyperbolic tangent as activation function.
- The input layer receives the calorimeter data projections, concatenated into a single input vector.
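A forward pass through such a network can be sketched as follows. The hidden layer size, the weight initialization and the single output neuron are assumptions for illustration only; the slides do not give the network dimensions. The names `init_network` and `forward` are hypothetical.

```python
import numpy as np

def init_network(n_inputs, n_hidden, rng):
    """Random weights for an input -> hidden -> output network (assumed topology)."""
    scale = 1.0 / np.sqrt(n_inputs)
    return {
        "W1": rng.uniform(-scale, scale, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.uniform(-1.0, 1.0, size=(n_hidden, 1)) / np.sqrt(n_hidden),
        "b2": np.zeros(1),
    }

def forward(params, x):
    """Hyperbolic tangent in both the hidden and the output neurons."""
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    return np.tanh(hidden @ params["W2"] + params["b2"])
```

Here `x` would be the concatenated layer projections, one row per RoI. With a tanh output, a natural (assumed) convention is a target of +1 for electrons and -1 for jets.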
Particle identification
- Network training was performed with the Resilient Backpropagation (RPROP) algorithm.
- This training algorithm eliminates the harmful effects of the magnitudes of the partial derivatives: only the sign of the derivative is used to determine the direction of the weight update.
- The first training runs were performed by randomly splitting the complete available data set (24068 electrons and 2066 jets) into two data sets of the same size: training and testing.
- A training step comprised a random selection of an electron/jet pair, in order to avoid overtraining on electrons due to the different statistics.
- Preliminary results: 90% efficiency.
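The sign-based update and the balanced pair selection can be sketched as below. This is an illustration under assumptions, not the training code used for these results: it implements the RPROP variant without weight backtracking, the step parameters (1.2, 0.5, 1e-6, 50) are commonly used defaults rather than values from the slides, and `rprop_step` and `sample_pair` are hypothetical names.

```python
import numpy as np

def rprop_step(weights, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One RPROP update; only the sign of `grad` sets the update direction."""
    sign_change = grad * prev_grad
    # Same sign as last time: grow the step. Sign flipped: shrink it.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    # After a sign flip, skip the update for that weight on this iteration.
    grad = np.where(sign_change < 0, 0.0, grad)
    weights = weights - np.sign(grad) * step
    return weights, grad, step

def sample_pair(electrons, jets, rng):
    """Draw one electron and one jet so both classes are seen equally often."""
    electron = electrons[rng.integers(len(electrons))]
    jet = jets[rng.integers(len(jets))]
    return electron, jet
```

Because the gradient magnitude never enters the update, weights with very small and very large derivatives move at comparable rates. The returned `grad` and `step` are passed back in as `prev_grad` and `step` on the next iteration.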
Conclusions
- The segmented PCA is a very attractive signal processing approach to the calorimeter information at ATLAS, because of the high segmentation of the subdetectors and their high granularity.
- Ring data assembling, which follows the energy deposition pattern, achieved considerably higher levels of compaction than direct assembling of the cells of each RoI. Results demonstrate that a compaction level of more than 96% is achieved if 90% of the energy is preserved.
- Another possible approach under study is the use of ring sums for data assembling, which also makes the energy deposition pattern clear.
- The relevance of the principal components will also be investigated, in order to verify the importance of each component to the neural classifier.