Transcript: Algorithm Timing and Performance Issues with emphasis on HLT algorithm online timing
1
Algorithm Timing and Performance Issues (with
emphasis on HLT algorithm online timing)
  • Xin Wu (University of Geneva)
  • On behalf of TDAQ
  • TP Week, June 7, 2007

2
Introduction
  • Offline algorithm timing/memory performance is
    linked directly to the efficiency of doing
    physics analyses
  • Think of it as a sort of luminosity
  • Faster algorithms → earlier results, or results
    with a larger data sample
  • HLT algorithm timing/memory performance is even
    more critical
  • Slow algorithms or crashed processes contribute
    to DAQ dead time
  • DAQ dead time → loss of luminosity
  • The issue is serious for offline
  • The Offline Performance Task Force has been
    formed to tackle it globally
  • The issue is serious for HLT
  • High LVL1 rate and limited budget (and space!)
    for the HLT farms
  • Up to 100 kHz LVL1 rate, up to 3 kHz LVL2/EB rate
  • HLT algorithm timing is being optimized
    individually offline
  • Great progress has been achieved in the past
    year
  • Global optimization is done online with the
    actual TDAQ hardware and a realistic mixture of
    input events in Technical Runs

See next slide from Wim Lavrijsen
3
Update from Performance Task Force (Wim Lavrijsen)
  • Mandate: reduce computing resource consumption of
    ATLAS software using technical means
  • Provide tools and know-how for developers
  • Identify problem areas and come up with
    solutions
  • Goals (with HLT requirements in mind)
  • Improve overall "uptime" of athena jobs
  • Reduce memory leaks, improve startup time, reduce
    initial memory sizes
  • Reduce CPU, memory, and I/O usage
  • Monitor and put in the hands of algorithm
    developers etc.
  • Focus on peak usage and recovery from peak
  • Current work
  • Provide machinery to continuously monitor
    performance (early interception of major changes)
  • http://atlas-computing.web.cern.ch/atlas-computing
    /links/distDirectory/nightlies/aid_perfmon
  • Provide standard jobs for reference and
    benchmarking
  • Identify structural problems in ATLAS software
  • Dictionary sizes/use (ROOT team is working on
    next-gen dicts)
  • python2.4 memory allocation (move to python2.5),
    xml configuration (DB, python), malloc overhead
    (use arenas instead)
  • dld.so overhead (improve configuration, use
    on-demand loading)
  • Take-home point for everyone: FIX YOUR LD/LINK
    OPTIONS AND JO FILES!
  • object size increase in 64-bit builds (remove
    padding)

See D. Quarrie's talk on Monday
4
Timing Requirement on HLT Algorithms
  • The benchmark: the 10 ms per event requirement
    of LVL2
  • Hard limit
  • 100 kHz LVL1 → 10 μs LVL2 average processing time
    (latency)
  • 500 1U slots for the LVL2 farm
  • Optimization with many scenarios
  • 500 dual-CPU 8 GHz = 1000 LVL2 processes → 10 ms
    per L2PU
  • Or 500 dual quad-core 2 GHz = 4000 L2PUs → 40 ms
    per L2PU
  • >1 L2PU per core to improve CPU efficiency
  • Multi-threaded mode to improve memory and CPU
    efficiency
  • Timing includes data access
  • Requires non-trivial optimization (150 ROSs send
    requested data to thousands of L2PUs through the
    LVL2 data network)
  • The benchmark: the 1 s per event requirement of EF
  • Hard limit
  • 3 kHz LVL2 → 333 μs EF average processing time
    (latency)
  • 1800 1U slots for the EF farm
  • Optimization with many scenarios
  • 1800 dual-CPU 8 GHz = 3600 EF processes → 1.2 s
    per PT
  • Or 1800 dual quad-core 2 GHz = 14400 PTs → 4.8 s
    per PT
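The farm arithmetic above is the same simple budget formula in every scenario: the average time each process may spend per event is the number of parallel processes divided by the input rate. A minimal sketch (`time_budget_ms` is a hypothetical helper for illustration, not TDAQ code):

```python
def time_budget_ms(input_rate_hz: float, n_processes: int) -> float:
    """Average per-event processing budget, in ms, for one trigger process."""
    return n_processes / input_rate_hz * 1e3

# LVL2: 100 kHz LVL1 input rate
assert round(time_budget_ms(100e3, 1000)) == 10     # 1000 L2PUs -> 10 ms each
assert round(time_budget_ms(100e3, 4000)) == 40     # 4000 L2PUs -> 40 ms each

# EF: 3 kHz LVL2/EB input rate
assert round(time_budget_ms(3e3, 3600)) == 1200     # 3600 PTs  -> 1.2 s each
assert round(time_budget_ms(3e3, 14400)) == 4800    # 14400 PTs -> 4.8 s each
```

The budget scales linearly with farm size, which is why running more (slower-clocked) cores per node relaxes the per-process latency requirement without changing total throughput.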

5
Brief Summary of the March TR (19/3-23/3)
  • Hardware
  • Final ROIB + LVL1 emulator
  • Pre-series machines (dual 1-core 3.2 GHz or 2.4
    GHz)
  • 12 ROSs, 2 L2SVs, 12 L2 nodes running 2 L2PUs each
  • 29 EF nodes running 2 PT applications each
  • Software
  • tdaq-01-07-00, AtlasHLT 2.0.5-HLT, Offline
    12.0.5-HLT-1
  • All basic HLT slices integrated
  • e10, g10, mu6, tau10, jet20, cosmic, Bphysics,
    met
  • combined e10+g10+mu6+tau10+jet20
  • Input events
  • 6k events (mixed physics processes, 60% jets
    and 40% W/Z)
  • LVL1 simulated with CSC-05
  • Main achievements
  • Validated DAQ and HLT infrastructure with
    tdaq-01-07-00
  • Successfully configured and ran slices
    (individual and combined)

6
Brief Summary of the May TR (21/5-25/5)
  • Final hardware
  • ROIB (+ LVL1 emulator), 120 ROSs, 29 SFIs
  • 4 HLT racks (130 dual quad-core 1.8 GHz), about
    5% of the final system
  • Basically the same software setup as the March TR
  • Same input events as the March TR
  • Main achievements
  • Validated TDAQ and HLT infrastructure with the
    final hardware
  • Measurements with dummy algorithms at LVL2 and EF
    with the final hardware
  • Functionality test with the combined algorithm only
  • Tested DBProxy and triggerDB configuration
  • Preparation for the M3 week
  • Good shift participation, as in the March TR

7
Algorithm Online Timing General Remarks
  • Caveat: online timing measurement is a complex
    issue
  • Depends on many variables: network layout, number
    of L2PUs/PTs per node, CPU speed, trigger slices,
    input events, algorithms, ...
  • Most are not final; continuous optimization needs
    to be done
  • The March/May TRs are only first attempts
  • Results will certainly be improved
  • Will show mainly basic e, γ, mu, tau, jet reco
    timing (per RoI)
  • LVL2 calorimeter-based reconstruction:
    T2CaloEgamma, T2CaloTau, T2CaloJet
  • LVL2 muon reconstruction: muFast
  • LVL2 tracking: IdScan (e, tau, mu), SiTrack (e
    only)
  • EF calorimeter data preparation:
    TrigCaloCellMaker, TrigCaloTowerMaker,
    TrigCaloClusterMaker
  • EF tracking: 10 tools
  • EF Egamma, Muon, Tau, Jet reconstruction:
    TrigEgammaRec, TrigMoore, TrigTauRec, TrigJetRec
  • Will not have time to go over all of them in
    detail, apologies!

LVL2: dedicated algorithms
EF: uses offline tools
8
LVL2 Egamma Reco T2CaloEgamma
Photon run, March: mean 6.6 ms/RoI
Egamma run, March: mean 7.4 ms
Combined run, March: mean 6.9 ms
Combined run, May: mean 6.2 ms
(March: 12 ROSs, L2PU at 3.2 GHz; May: 120 ROSs, L2PU at 1.8 GHz)
9
LVL2 Tau Calo Reco T2CaloTau
tau run, March: mean 12.9 ms/RoI
combined run, March: mean 6.4 ms
  • Faster in the combined slice
  • Shorter data transfer time because of em/tau
    overlap (ROBDataProvider caching)!
  • Further improvement possible with common data
    preparation

combined run, May: mean 6.2 ms
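The caching effect credited above can be sketched as per-event memoization of ROB requests: when em and tau RoIs overlap, the second algorithm's lookups hit the cache instead of the network. A minimal illustration only; `RobCache` and `fetch_from_ros` are hypothetical names, not the ATLAS ROBDataProvider API:

```python
class RobCache:
    """Per-event cache in front of an expensive ROS data request."""

    def __init__(self, fetch_from_ros):
        self._fetch = fetch_from_ros   # expensive network request to a ROS
        self._cache = {}               # ROB id -> data, valid for one event
        self.requests = 0              # count of actual network requests

    def get(self, rob_id):
        if rob_id not in self._cache:
            self.requests += 1
            self._cache[rob_id] = self._fetch(rob_id)
        return self._cache[rob_id]

    def clear(self):
        """Invalidate at the start of each new event."""
        self._cache.clear()

cache = RobCache(fetch_from_ros=lambda rob_id: f"data-{rob_id}")
for rob_id in [0x42, 0x43]:   # em RoI touches these ROBs
    cache.get(rob_id)
for rob_id in [0x43, 0x44]:   # overlapping tau RoI: 0x43 is already cached
    cache.get(rob_id)
assert cache.requests == 3    # 4 lookups, only 3 network fetches
```

Data-collection time then scales with the number of distinct ROBs touched per event rather than per algorithm, which is why overlapping slices in a combined run come out faster than the same slices run alone.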
10
LVL2 Jet Reco T2CaloJet
Jet run, March: mean 28.1 ms/RoI
Combined run, March: mean 26.0 ms
  • The current algo is cell-based with a fixed cone
    of 0.4
  • Sped up by ×2 in release 13 (Jonathan Ferland)
  • Further improvement under investigation using ROD
    preprocessed TriggerTower info

Combined run, May: mean 25.0 ms
11
LVL2 Muon Reco muFast
muon run, March: mean 6.2 ms/RoI
combined run, March: mean 4.9 ms
Combined run, May: mean 6.4 ms
(note the scale change)
12
LVL2 Tracking IDScan for egamma
Electron run, March: mean 17.1 ms/RoI
Combined run, March: mean 14.9 ms
Offline with Xeon 2.4 GHz (D. Emeliyanov)
Combined run, May: mean 16.8 ms
13
LVL2 Tracking IDScan for Tau
Tau run, March: mean 13.0 ms/RoI
Combined run, March: mean 6.6 ms
  • Faster in the combined slice
  • Because of ROB data and SpacePoints caching
  • Further improvement with EM/tau common tracking?

Combined run, May: mean 8.1 ms
14
LVL2 Tracking IDScan for Muon
Muon run, March: mean 19.4 ms/RoI
Combined run, March: mean 14.8 ms
  • Uses a larger RoI than egamma in release 12
  • Will be the same in release 13

Combined run, May: mean 15.9 ms
15
LVL2 Tracking SiTrack for Egamma
Electron run, March: mean 8.8 ms
Combined run, March: mean 7.8 ms
Combined run, May: mean 8.3 ms
  • Helped by upstream IdScan ROB data and
    SpacePoints caching
  • Offline studies show IdScan and SiTrack are
    comparable in timing

16
L2PU Timing for Electron Run March
Total time for accepted: mean 71.5 ms/event
Total time for rejected: mean 19.7 ms
Why the offset? An IDC feature used by the L2 code,
found by Werner Wiedenmann, now fixed by RD
Processing time for accepted: mean 53.0 ms
Data collection time for accepted: mean 25.0 ms
Two tracking algorithms!
17
L2PU Time for Combined Slice Run in May (1)
Total time for accepted: mean >94.3 ms/event
Processing time for accepted: mean 82.4 ms
Data requests for accepted: mean 24/event
Data collection time for accepted: mean 26.5 ms
(about 1 ms per request)
18
L2PU Time for Combined Slice Run in May (2)
Total time for rejected: mean 31.5 ms
Processing time for rejected: mean 25.7 ms
Data requests for rejected: mean 5.3/event
Data collection time for rejected: mean 6.0 ms
19
Egamma EF Calo Reconstruction Timing
TrigCaloCellMaker: combined run, May: mean 27.0 ms
TrigCaloTowerMaker: combined run, May: mean 16.0 ms/RoI
TrigCaloClusterMaker: combined run, May: mean 65.4 ms
20
Tau EF Calo Reconstruction Timing
TrigCaloCellMaker: combined run, May: mean 62.6 ms
TrigTauRec: combined run, May: mean 13 ms/RoI
21
Jet EF Calo Reconstruction Timing
TrigCaloCellMaker: combined run, May: mean 7.0 ms/RoI
TrigCaloTowerMaker: combined run, May: mean 6.6 ms
TrigJetRec (cone 0.4): combined run, May: mean 44.0 ms
TrigJetRec doNoise: combined run, May: mean 48.6 ms
90% of TrigJetRec. Good to monitor timing within
an algorithm!
22
Electron EF Track Reconstruction Timing (1)
PixelClustering: combined run, May: mean 5.2 ms/RoI
SCTClustering: combined run, May: mean 5.7 ms
TRTDriftCircleMaker: combined run, May: mean 9.1 ms
SiTrigSpacePointFinder: combined run, May: mean 5.0 ms
23
Electron EF Track Reconstruction Timing (2)
SiTrigTrackFinder: combined run, May: mean 168.6 ms/RoI, rms 141.9 ms
TrigAmbiguitySolver: combined run, May: mean 22.3 ms, rms 23.3 ms
TRTTrackExtAlg: combined run, May: mean 6.8 ms, rms 4.2 ms
TrigExtProcessor: combined run, May: mean 27.4 ms, rms 23.7 ms
24
Electron EF Track Reconstruction Timing (3)
  • Summary (ms):
  • PixelClustering 5.2
  • SCTClustering 5.7
  • TRTDriftCircleMaker 9.1
  • SiTrigSpacePointFinder 5.0
  • SiTrigTrackFinder 168.6
  • AmbiguitySolver 22.3
  • TRTTrackExtAlg 6.8
  • TrigExtProcessor 27.4
  • TrigVxPrimary 5.1
  • TrigParticlesCreator 5.0
  • Total EF electron tracking timing:
  • ~260 ms/RoI, with a large rms

TrigVxPrimary: combined run, May: mean 5.1 ms
TrigParticlesCreator: combined run, May: mean 5.0 ms
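As a quick consistency check, the per-tool means quoted in the summary do add up to the ~260 ms/RoI total reported on this slide (a sketch using only the numbers shown here):

```python
# Per-tool EF electron-tracking means (ms/RoI) as quoted on this slide.
ef_electron_tracking_ms = {
    "PixelClustering":        5.2,
    "SCTClustering":          5.7,
    "TRTDriftCircleMaker":    9.1,
    "SiTrigSpacePointFinder": 5.0,
    "SiTrigTrackFinder":      168.6,
    "AmbiguitySolver":        22.3,
    "TRTTrackExtAlg":         6.8,
    "TrigExtProcessor":       27.4,
    "TrigVxPrimary":          5.1,
    "TrigParticlesCreator":   5.0,
}
total = sum(ef_electron_tracking_ms.values())
assert round(total, 1) == 260.2   # ~260 ms/RoI, as quoted
```

The sum makes clear that SiTrigTrackFinder alone accounts for roughly two thirds of the budget, so optimization effort is best spent there.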
25
Tau EF Track Reconstruction Timing
  • Summary (ms):
  • PixelClustering 5.7
  • SCTClustering 11.9
  • TRTDriftCircleMaker 20.2
  • SiTrigSpacePointFinder 5.1
  • SiTrigTrackFinder 46.2
  • AmbiguitySolver 8.7
  • TRTTrackExtAlg 5.1
  • TrigExtProcessor 9.9
  • TrigVxPrimary 5.1
  • TrigParticlesCreator 5.1
  • Total tau tracking timing:
  • ~123 ms/RoI, with a large rms
  • No time saving in the combined run!
  • Further optimization possible

Ntracks / tau RoI: combined run, May: mean 3.4
tracks (tighter LVL2 cuts)
Ntracks / egamma RoI: combined run, May: mean 7.9
tracks (very loose LVL2 cuts)
26
EF Track Reconstruction Timing from Offline
  • Done by I. Grabowska-Bold
  • Dual P4 @ 2.8 GHz
  • Same events as used in the Technical Runs
  • KF for release 12, ModKF for release 13
  • Not directly comparable with online numbers

27
EF InDet Full Event Reconstruction
  • Done by I. Grabowska-Bold
  • Dual P4 @ 2.8 GHz
  • FullScan and MinBias slices in release 13
  • Very interesting to try out online

28
Egamma and Muon EF Reconstruction Timing
TrigEgammaRec: combined run, May: mean 33.6 ms
Muon Reconstruction: combined run, May: mean 40.8 ms
Muon Identification: combined run, May: mean 13.0 ms
29
EF Total Processing Time
combined run, May: mean 1.57 s
  • Remember again the caveat about online timing
    measurements stated earlier
  • This is only a snapshot of one particular setup,
    still far from being representative of the final
    hardware setup, a typical high-luminosity trigger
    menu, and actual LHC events!

30
Conclusions
  • HLT algorithm timing has made great progress in
    the past year
  • Individual algorithms give reasonable numbers
  • Further optimization is necessary (faster
    algorithms → more events, margin for real data,
    complex menus, noise, pile-up, ...)
  • Investigate strategy changes (merging RoIs of
    different types, ...)?
  • Online study of overall HLT algorithm timing
    performance is starting
  • Confirmed offline numbers for simple inclusive
    slices
  • Overall performance couples strongly with TDAQ
    hardware/software
  • Requires a big collaborative effort to
    understand, with well-designed tools and enough
    access time to the hardware!
  • It needs to be constantly monitored
  • Next step is an online test of Rel 13 with a more
    complete trigger menu
  • HLT algorithm performance is much more than just
    execution time. Efforts are ongoing in many
    areas, in collaboration with the Task Force
  • Memory usage/memory leaks (goal: 10 B/evt in
    LVL2, 1 kB/evt in EF)
  • Algorithm configuration and initialization time
  • Conditions database access
  • Multi-threaded mode