CBM DAQ and Event Selection

1
CBM DAQ and Event Selection
  • Walter F.J. Müller, GSI, Darmstadt
  • for the CBM Collaboration
  • Topical Workshop Advanced Instrumentation for
    Future Accelerator Experiments, Bergen, Norway,
    4-6 April 2005

2
Outline
  • CBM (very briefly)
  • observables
  • setup
  • FEE/DAQ/Trigger
  • requirements
  • challenges
  • strategies

3
CBM at FAIR
SIS 100 Tm, SIS 300 Tm; U: 35 AGeV, p: 90 GeV
Compressed Baryonic Matter Experiment
4
CBM Physics Topics and Observables
  • In-medium modifications of hadrons
  • → onset of chiral symmetry restoration at high
    ρB → measure ρ, ω, φ → e+e- (µ+µ-),
    open charm D0, D±
  • Strangeness in matter
  • → enhanced strangeness production → measure
    K, Λ, Σ, Ξ, Ω
  • Indications for deconfinement at high ρB
  • → anomalous charmonium suppression? →
    measure D0, D±,
  • J/ψ → e+e- (µ+µ-)
  • Critical point
  • → event-by-event fluctuations
  • → measure π, K

Good e/π separation
Vertex detector
Low cross sections → High interaction rates → Selective Triggers
Hadron identification
5
CBM Setup
  • Radiation-hard silicon pixel/strip detectors in
    a magnetic dipole field
  • Electron detectors: RICH, TRD, ECAL; pion
    suppression up to 10⁵
  • Hadron identification: RPC, RICH
  • Measurement of photons, π0, η, and muons: ECAL
6
CBM and HADES
All you want to know about CBM: Technical Status
Report (400 p), now available under
http://www.gsi.de/documents/DOC-2005-Feb-447-1.pdf
7
Meson Production in central Au+Au
W. Cassing, E. Bratkovskaya, A. Sibirtsev, Nucl.
Phys. A 691 (2001) 745

10 MHz interaction rate needed for 10-15 A GeV
SIS300
8
A Typical Au+Au Collision
Central Au+Au collision at 25 AGeV (UrQMD + GEANT):
160 p, 170 n, 360 π-, 330 π+, 360 π0, 41 K+, 13 K-, 42 K0
→ 10⁷ Au+Au interactions/sec
→ 10⁹ tracks/sec to reconstruct for first level event
selection
9
CBM Trigger Requirements
assume archive rate: few GB/sec, 20 kevents/sec
  • In-medium modifications of hadrons
  • → onset of chiral symmetry restoration at high
    ρB → measure ρ, ω, φ → e+e-,
    open charm (D0, D±)
  • Strangeness in matter
  • → enhanced strangeness production → measure
    K, Λ, Σ, Ξ, Ω
  • Indications for deconfinement at high ρB
  • → anomalous charmonium suppression? →
    measure D0, D±,
  • J/ψ → e+e-
  • Critical point
  • → event-by-event fluctuations
  • → measure π, K

offline
trigger
trigger on displaced vertex
offline
drives FEE/DAQ architecture
trigger
trigger
trigger on high pt e+e- pair
offline
10
Open Charm Detection
  • Example: D0 → K-π+ (3.9%; cτ = 124.4 µm)
  • reconstruct tracks
  • find primary vertex
  • find displaced tracks
  • find secondary vertex

target
few 100 µm
5 cm
  • high selectivity because combinatorics is reduced

first two planes of vertex detector
11
CBM DAQ Requirements Profile
  • D and J/ψ signal drives the rate capability
    requirements
  • D signal drives FEE and DAQ/Trigger requirements
  • Problem similar to B detection, like in LHCb or
    BTeV (rip)
  • Adopted approach
  • displaced vertex 'trigger' in first level, like
    in BTeV (rip)
  • Additional Problem
  • DC beam → interactions at random times
  • → time stamps with ns precision needed
  • → explicit event association needed
  • Current design for FEE and DAQ/Trigger
  • Self-triggered FEE
  • Data-push architecture

12
Conventional FEE-DAQ-Trigger Layout
Especially instrumented detectors
Detector
L0 Trigger
fbunch
Trigger Primitives
Dedicated connections
FEE
Cave
Limited capacity
Shack
L1 Accept
DAQ
Modest bandwidth
L2 Trigger
L1 Trigger
Limited L1 trigger latency
Specialized trigger hardware
Standard hardware
Archive
13
Limits of Conventional Architecture
  • Decision time for first level trigger is limited;
    typ. max. latency 4 µs for LHC
    → not suitable for complex global triggers like
    secondary vertex search
  • Only especially instrumented detectors can
    contribute to first level trigger
    → limits future trigger development
  • Large variety of very specific trigger hardware
    → high development cost
14
The way out ... use Data Push Architecture
Especially instrumented detectors
Detector
L0 Trigger
fbunch
Trigger Primitives
fclock
Dedicated connections
FEE
Time distribution
Cave
Limited capacity
Shack
L1 Accept
DAQ
High bandwidth
Modest bandwidth
L1 Trigger
Limited L1 trigger latency
Specialized trigger hardware
Special hardware
Standard hardware
Archive
15
The way out ... use Data Push Architecture
Detector
fclock
FEE
Cave
Shack
DAQ
High bandwidth
Special hardware
Archive
16
The way out ... use Data Push Architecture
Detector
Self-triggered front-end: autonomous hit detection
fclock
FEE
No dedicated trigger connectivity: all detectors
can contribute to L1
Cave
Shack
DAQ
Large buffer depth available: system is
throughput-limited and not latency-limited
High bandwidth
Modular design: few multi-purpose rather than many
special-purpose modules
Special hardware
Use the term 'Event Selection'
Archive
17
Front-End for Data Push Architecture
  • Each channel detects autonomously all hits
  • An absolute time stamp, precise to a fraction of
    the sampling period, is associated with each hit
  • All hits are shipped to the next layer (usually
    concentrators)
  • Association of hits with events done later using
    time correlation
  • Typical Parameters
  • with an occupancy of a few % and 10⁷/sec interaction rate
  • some 100 kHz channel hit rate
  • few MByte/sec per channel
  • whole CBM detector: 1 TByte/sec
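
The typical parameters above follow from simple arithmetic, which the following Python sketch reproduces. The 1 % occupancy and the 20 bytes per hit are illustrative assumptions chosen to land in the ranges quoted on the slide ("some 100 kHz", "few MByte/sec"); they are not values stated in the talk.

```python
def channel_hit_rate(occupancy, interaction_rate_hz):
    """Average hit rate of one channel in Hz."""
    return occupancy * interaction_rate_hz


def channel_data_rate(hit_rate_hz, bytes_per_hit):
    """Average output bandwidth of one channel in bytes/sec."""
    return hit_rate_hz * bytes_per_hit


# Assumed inputs: 1 % occupancy, 10 MHz interaction rate, 20 bytes per hit.
hit_rate = channel_hit_rate(occupancy=0.01, interaction_rate_hz=1e7)
data_rate = channel_data_rate(hit_rate, bytes_per_hit=20)

print(f"{hit_rate / 1e3:.0f} kHz per channel")
print(f"{data_rate / 1e6:.1f} MByte/sec per channel")
```

Raising the assumed occupancy to a few percent scales both numbers linearly, which is why the slide quotes ranges rather than single values.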

18
Typical Self-Triggered Front-End
Use sampling ADC on each detector channel running
with appropriate clock
  • Average 10 MHz interaction rate
  • Not periodic like in collider
  • On average 100 ns event spacing

[Figure: sampled amplitude versus time (in sampling periods, 0 to 30) with a threshold line; two hits are found, with a = 126, t = 5.6 and a = 114, t = 22.2]
Time is determined to a fraction of the sampling
period
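
The slide does not say how the fractional time is obtained. As a minimal sketch, assuming linear interpolation of the threshold crossing between two ADC samples (one of several possible estimators, and not necessarily the one used in CBM), a self-triggered hit finder could look like this. The function name `find_hits` and the sample values are made up for illustration.

```python
def find_hits(samples, threshold):
    """Return (time, amplitude) for each pulse that crosses the threshold.

    Time is in units of the sampling period; the fractional part comes from
    linear interpolation between the two samples around the crossing.
    Amplitude is the largest sample of the pulse.
    """
    hits = []
    i = 1
    while i < len(samples):
        below, above = samples[i - 1], samples[i]
        if below < threshold <= above:
            time = (i - 1) + (threshold - below) / (above - below)
            # Follow the pulse until it drops below threshold again.
            end = i
            while end < len(samples) and samples[end] >= threshold:
                end += 1
            hits.append((time, max(samples[i:end])))
            i = end
        else:
            i += 1
    return hits


samples = [0, 0, 10, 90, 120, 80, 20, 0]
hits = find_hits(samples, threshold=50)
print(hits)
```

No external trigger is involved: the channel decides on its own that a hit occurred, which is the defining property of the self-triggered front-end.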
19
Toward Multi-Purpose FEE Chain
Signal chain: Detector → PreAmp → preFilter → ADC → digital Filter → Hit Finder → Backend Driver
  • Detector: Pad, GEMs, PMT, APDs
  • preFilter: anti-aliasing filter
  • ADC: sample rate 10-100 MHz, dyn. range 8...12 bit
  • digital Filter: 'shaping', 1/t tail cancellation, baseline restorer
  • Hit Finder: hit parameter estimators (amplitude, time)
  • Backend Driver: clustering, buffering, link protocol

see talk V. Lindenstruth; see talk L. Musa
All potentially in one mixed-signal chip
20
CBM DAQ and Online Event Selection
  • More than 50% of total data volume relevant for
    first level event selection
  • Aim for simplicity
  • Ansatz
  • do (almost) all processing after the build
    stage
  • Simple two layer approach
  • 1. event building
  • 2. event processing
  • Other scenarios are possible, putting more
    emphasis on
  • do all processing as early as possible
  • transfer data only when necessary

needed for D
needed for J/ψ
useful for J/ψ
STS, TRD, and ECAL data used in first level event
selection
21
Logical Data Flow
Concentrators multiplex channels to high-speed
links
Time distribution
Buffers
Build Network
Processing resources for first level event
selection, structured in small farms
Connection to 'high level' selection processing
22
Bandwidth Requirements
Data flow: 1 TB/sec
Gilder helps
Moore helps
1st level selection: 10¹⁴ - 10¹⁵ operations/sec
100 Sub-Farms
Data flow: few 10 GB/sec
to archive: few 1 GB/sec
23
Focus on CNet
24
Self-Triggered FEE Output Format I
FEE
Output of a FEE chip is a list of hits. Each hit
has a timestamp plus other information.
Output of a single FEE chip:
17 15 ... 68 34 ... 134 18 ... 135 19 ... 1234 33 ...
Fields per hit: time stamp, channel address, other values (amplitudes, pulse shape)
!! Time stamp values can increase forever !!
→ How to express absolute time efficiently?
25
Handle the infinite Time Axis
1. Subdivide time in epochs
2. Express a time relative to an epoch
practical epoch length: about 10 µs
3. Introduce epoch markers
[Figure: time axis divided into Epoch 1 to Epoch 4, separated by epoch markers, with two hits at (2, 137 ns) and (3, 314 ns)]
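
The epoch scheme can be sketched in a few lines of Python. The 10 µs epoch length is taken from the slide; counting epochs from 1 (as the figure labels suggest) and the function names are assumptions made for this illustration.

```python
EPOCH_LENGTH_NS = 10_000  # about 10 µs, the "practical epoch length"


def to_epoch_time(absolute_ns, epoch_length_ns=EPOCH_LENGTH_NS):
    """Split an absolute time into (epoch number, time within the epoch)."""
    epoch_index, relative_ns = divmod(absolute_ns, epoch_length_ns)
    return epoch_index + 1, relative_ns  # epochs are counted from 1 here


def to_absolute_time(epoch, relative_ns, epoch_length_ns=EPOCH_LENGTH_NS):
    """Inverse of to_epoch_time."""
    return (epoch - 1) * epoch_length_ns + relative_ns


print(to_epoch_time(10_137))      # a hit 137 ns into the second epoch
print(to_absolute_time(3, 314))   # a hit 314 ns into the third epoch
```

The relative time needs only enough bits to cover one epoch, so the per-hit time stamp stays short no matter how long the run lasts.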
26
Self-Triggered FEE Output Format II
Output of a FEE chip is a list of hits and epoch
markers. Each hit has a timestamp plus other
information.
FEE
M 1 H 17 15 ... H 68 34 ... H 134 18 ... H 135 19
... H 1234 33 ... M 2 M 3 H 258 19 ...
Record type: H = hit, M = epoch marker
Hit with effective timestamp (3, 258): the last hit, H 258 19
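
A reader of such a stream only has to remember the most recent epoch marker to recover the effective time stamp of every hit. The sketch below assumes records are given as Python tuples, `("M", epoch)` and `("H", time, channel)`; this in-memory representation is invented for illustration, and the real link format is not specified on the slide.

```python
def decode(records):
    """Attach the current epoch to every hit.

    Returns a list of (epoch, time, channel) tuples.
    """
    epoch = None
    hits = []
    for record in records:
        if record[0] == "M":        # epoch marker: update current epoch
            epoch = record[1]
        else:                       # hit: time is relative to current epoch
            _, time, channel = record
            hits.append((epoch, time, channel))
    return hits


# The example stream from the slide, with the "..." payloads left out.
stream = [("M", 1), ("H", 17, 15), ("H", 68, 34), ("H", 134, 18),
          ("H", 135, 19), ("H", 1234, 33), ("M", 2), ("M", 3), ("H", 258, 19)]
decoded = decode(stream)
print(decoded[-1])
```

Epoch 2 contains no hits, yet its marker is still present, so the decoder never has to guess how many epochs have passed.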
27
Self-Triggered FEE Concentrators
Hit fields: time, address
Input from first FEE:
M 1 H 18 2007 ... M 2 H 589 2134 ... M 3 H 258 2714 ...
Input from second FEE:
M 1 H 17 15 ... H 68 34 ... H 134 18 ... H 135 19
... H 1234 33 ... M 2 M 3 H 258 19 ...
Concentrator output:
M 1 H 17 15 ... H 18 2007 ... H 68 34 ... H 134 18
... H 135 19 ... H 1234 33 ... M 2 H 589 2134 ...
M 3 H 258 19 ... H 258 2714 ...
It seems prudent to keep data always sorted in time.
A concentrator merges the data streams
and eliminates redundant epoch markers.
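
The merge step can be sketched as a k-way merge of time-sorted streams. The tuple representation (`("M", epoch)`, `("H", time, address)`) and the function names are assumptions for illustration, and the sketch further assumes that every input stream starts with an epoch marker and is already sorted in time.

```python
import heapq


def keyed(stream):
    """Turn records into sortable keys (epoch, kind, time, address)."""
    epoch = None
    for record in stream:
        if record[0] == "M":
            epoch = record[1]
            yield (epoch, 0, 0, 0)          # markers sort before hits
        else:
            _, time, address = record
            yield (epoch, 1, time, address)


def concentrate(*streams):
    """Merge time-sorted streams and drop redundant epoch markers."""
    merged = []
    last_marker = None
    for epoch, kind, time, address in heapq.merge(*(keyed(s) for s in streams)):
        if kind == 0:
            if epoch != last_marker:        # keep one marker per epoch
                merged.append(("M", epoch))
                last_marker = epoch
        else:
            merged.append(("H", time, address))
    return merged


fee_a = [("M", 1), ("H", 18, 2007), ("M", 2), ("H", 589, 2134),
         ("M", 3), ("H", 258, 2714)]
fee_b = [("M", 1), ("H", 17, 15), ("H", 68, 34), ("H", 134, 18),
         ("H", 135, 19), ("H", 1234, 33), ("M", 2), ("M", 3), ("H", 258, 19)]
output = concentrate(fee_a, fee_b)
print(output)
```

Because the inputs are already sorted, the merge needs only one pending record per input link, which keeps the buffering in a concentrator small.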
28
FEE Data Clusters I
  • In many subsystems a particle causes correlated
    hits in physically neighboring detector cells
    (STS, TRD, ECAL)
  • Depending on detector subsystem
  • the cluster pattern is 1d or 2d
  • contained in one FEE chip or not
  • examples in CBM
  • STS-MAPS: 2d, contained
  • STS-Strip: 1d, mostly contained
  • TRD: 1d mostly contained to 2d often
    uncontained, depending on pad geometry (varies
    inside → outside)
  • RPC: t.b.d.
  • ECAL: 2d, many uncontained

Note: for 2d a 16 (64) channel chip has ¾ (½) of
channels on perimeter !
29
FEE Data Clusters II
  • Usually one wants to read very low amplitude hits
    in the tail of a cluster
  • low channel hit threshold might give too much
    noise
  • → read only low amplitude hit if in neighborhood
    of a big one
  • → how to handle clusters crossing a chip border?
  • use two thresholds
  • high threshold determines particle hit and region
    of interest
  • RoI communicated to all relevant neighbors
  • low amplitude hits in RoI are validated and sent
  • → this implies cross communication on CNet
    between FEE chips...

Better named FNet
If RoI are communicated, CNet becomes a real
network !!
see talk V. Lindenstruth; see talk L. Musa
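
The two-threshold scheme can be illustrated for a 1d row of channels. This is a simplified sketch under stated assumptions: the row is treated as one global array, so a region of interest may span a chip border without any explicit message passing, whereas in hardware the RoI would have to be sent to the neighboring chip. The function name, the `radius` parameter, and the amplitude values are hypothetical.

```python
def select_hits(amplitudes, high, low, radius=1):
    """Return the channel indices that are read out.

    A channel at or above `high` is a particle hit and opens a region of
    interest (RoI) of `radius` channels on each side. Inside an RoI,
    channels at or above `low` are validated; outside they are dropped.
    """
    n = len(amplitudes)
    roi = set()
    for i, amplitude in enumerate(amplitudes):
        if amplitude >= high:
            roi.update(range(max(0, i - radius), min(n, i + radius + 1)))
    return [i for i in range(n)
            if amplitudes[i] >= high or (i in roi and amplitudes[i] >= low)]


# Channel 2 is a clear hit; channels 1 and 3 are its low-amplitude tails.
# Channel 5 is above the low threshold but isolated, so it counts as noise.
amplitudes = [3, 12, 80, 15, 2, 11, 1]
selected = select_hits(amplitudes, high=50, low=10)
print(selected)
```

With a single low threshold channel 5 would have been read out as well; with a single high threshold the tails on channels 1 and 3 would have been lost.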
30
Focus on BNet
31
Event Building Alternatives
  • Straight event-by-event approach
  • data arrives on 1000 links
  • 100 byte per event and link
  • 10¹⁰ packets/sec to handle...
  • Handle time intervals or event intervals
  • 10 µs or 100 events seems reasonable
  • Very regular and fully controlled traffic
    pattern
  • data traffic can be scheduled to avoid network
    congestion
  • a large fraction of the switch bandwidth can be
    used
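
The packet-rate argument is a short calculation, sketched here in Python. The 1000 links and 100 bytes per event and link are from the slide, and the 10 MHz event rate is the interaction rate quoted elsewhere in the talk; the function names are illustrative.

```python
def packet_rate(n_links, event_rate_hz, events_per_packet=1):
    """Packets per second the build network has to route."""
    return n_links * event_rate_hz // events_per_packet


def packet_size(bytes_per_event_and_link, events_per_packet=1):
    """Payload of one packet in bytes."""
    return bytes_per_event_and_link * events_per_packet


N_LINKS = 1000
EVENT_RATE_HZ = 10**7            # 10 MHz interaction rate
BYTES_PER_EVENT_AND_LINK = 100

event_by_event = packet_rate(N_LINKS, EVENT_RATE_HZ)
sliced = packet_rate(N_LINKS, EVENT_RATE_HZ, events_per_packet=100)
sliced_size = packet_size(BYTES_PER_EVENT_AND_LINK, events_per_packet=100)

print(f"event-by-event: {event_by_event:.0e} packets/sec")
print(f"100-event intervals: {sliced:.0e} packets/sec of {sliced_size} bytes")
```

Grouping 100 events (about 10 µs of data) into one packet cuts the packet rate by a factor of 100 while the total data volume stays the same.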

32
Networking I
  • High-speed networking
  • high density connectors
  • 2.5 Gbps SerDes now 100 mW
  • 480 Gbps InfiniBand switch on one chip
  • DDR and QDR link speeds will come
  • just wait and see
  • Mellanox MTA47396: 24 port InfiniBand switch
  • 4x ports, 1 Gbyte/sec per port
  • → 96 x 2.5 Gbps SerDes
  • 480 Gbps aggregate B/W
  • Single chip implementation
  • 961 ball BGA
  • 18 W power dissipation
  • Double data rate version (5 Gbps per link) in
    pipe....

33
Networking II
  • TODAY
  • Voltaire ISR 9288 switch
  • 288 4x ports non-blocking
  • cost today 120 kEUR (or 400 EUR/port)
  • 288 GByte/sec switching bandwidth
  • likely in a few years
  • 288 4x port QDR
  • likely same or lower cost
  • 1152 GByte/sec switching speeds
  • adequate for CBM...
  • Conclusion
  • BNet switch is not a major issue

34
Focus on PNet
35
Network Characteristics
Data Push: datagrams, errors marked but not
recovered
Request/Response and Data Push: transactions, errors
recovered
36
L1 Event Selection Farm Layout
  • Current working hypothesis: CPU + FPGA hybrid
    system (proviso follows)
  • Use programmable logic for cores of algorithms
  • Use CPU for the non-parallelizable parts
  • Use serial connection fabric (links and switches)
  • Modular design (only few board types)

FPGA
37
Network Summary
  • 5 different networks with very different
    characteristics
  • CNet
  • medium distance, short messages, special
    requirements
  • connects custom components (FEE ASICs)
  • TNet
  • broadcast time (and tags), special requirements
  • BNet
  • naturally large messages, Rack-2-Rack
  • PNet
  • short distance, most efficient if already
    'built-in'
  • connects standard components (FPGA, SoCs)
  • HNet
  • general purpose, to rest of world

FEE interfaces and CNet will be co-developed.
Depends on how clock/time distribution is done
Custom
Potentially built with CNet components
Custom
Probably uncritical
Ethernet, InfiniBand, ...
Look at emerging technologies. Stay open for
changes and surprises. Cost efficiency is key here !!
PCIe, ASI, ...
Whatever the implementation is, it will be
called Ethernet...
Ethernet
38
Algorithms
  • Performance of L1 feature extraction algorithms
    is essential
  • critical in CBM: STS tracking + vertex
    reconstruction, TRD tracking and PID
  • Look for algorithms which allow massive parallel
    implementation
  • Hough Transform Tracker: needs lots of bit level
    operations, well suited for FPGA
  • Cellular Automaton tracker
  • Other approaches to be evaluated
  • Co-develop tracking detectors and analysis
    algorithms
  • L1 tracking is necessarily speed optimized → more
    detector granularity and redundancy needed
  • Aim for CBM: validate final hardware design with
    at least 2 trackers suitable for L1

39
Algorithms: an Example
  • Hough Transform
  • assume track comes from (close to) primary vertex
  • map each measurement into 'Hough space'
  • a peak in Hough space indicates a real track
  • is a 'global' method
  • needs substantial amount of calculation to fill
    and analyze the histograms
  • Many, but very simple operations
  • allows massively parallel implementation
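
The idea can be shown with a toy Hough transform in Python. This is a sketch under simplifying assumptions, not the CBM tracker: tracks are modeled in one bending plane as parabolas through the origin (the primary vertex), x = θ·z + κ·z², and the Hough space is a coarse (θ, κ) grid. All names, bin widths, and hit values are invented for illustration.

```python
from collections import Counter


def hough_peaks(hits, curvatures, slope_step, min_votes):
    """Find (slope bin, curvature index) cells with at least min_votes.

    Each hit (z, x) votes once per curvature value kappa, in the slope bin
    of the parabola x = theta*z + kappa*z**2 that passes through it.
    """
    votes = Counter()
    for z, x in hits:
        for k, kappa in enumerate(curvatures):
            theta = x / z - kappa * z           # slope implied by this hit
            votes[(round(theta / slope_step), k)] += 1
    return sorted(cell for cell, count in votes.items() if count >= min_votes)


# Four hits on a track with theta = 0.10, kappa = 0.02, plus one noise hit.
track = [(z, 0.10 * z + 0.02 * z * z) for z in (1.0, 2.0, 3.0, 4.0)]
noise = [(2.0, -0.5)]
curvatures = [0.0, 0.01, 0.02, 0.03]

peaks = hough_peaks(track + noise, curvatures, slope_step=0.01, min_votes=3)
print(peaks)
```

Every vote is a multiply, a subtract, and a counter increment, and no vote depends on any other, which is the property that makes the method map well onto parallel hardware.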

40
Hough-Transform Implementation
41
Hough-Transform Implementation
Very suitable for implementation in programmable
logic (FPGAs)
Other track finder approaches, like cellular
automata tracker, also under investigation
42
Interim Summary
  • Event definition has changed
  • now based on time stamps and time correlation
  • Role of DAQ has changed
  • DAQ is simply responsible to transport data from
    producers to consumers
  • Role of 'Trigger' has changed
  • filter events delivered by DAQ
  • 'Online Event Selection' is better term
  • System aspects
  • 'online' 'offline' boundary blurs
  • more COTS (commercial off the shelf) components
  • much more modular system
  • much more adaptable system
  • This is emerging technology in HEP, though
    baseline for ILC. However, it has been used for
    many years in nuclear structure

43
Moore quo vadis ?
  • Will price/performance of computing continue to
    improve ?
  • What are limits of CMOS technology ?
  • Where are the markets ? What are market forces ?
  • Technology
  • most of the gain comes from architecture anyway
  • conventional designs, especially x86, reach their
    limits
  • Markets
  • end of the metal-box PC age? → Laptops, PDAs,
    all kinds of dedicated boxes (video, games)
  • end of the binary compatibility age? →
    intermediate code + 'Just in Time' compilers
    (JIT)

There is life after Intel x86
A lot of architectural innovation ahead
44
BlueGene vs Cell Processor
BlueGene: 121 mm², 130 nm, 2.8/5.6 DP GFlop
STI Cell: 221 mm², 90 nm, 256 SP GFlop, 30 DP
GFlop, 25 GB/sec mem, 78 GB/sec IO
Finally presented at ISSCC 2005
(International Solid-State Circuits Conference)
SPE: Synergistic Processing Element
45
BlueGene vs Cell Processor
BlueGene: developed by IBM; market: national security,
science; budget: 100 M
STI Cell: developed by Sony, Toshiba and IBM; market:
video games; budget: 500 M
High performance computing is driven now by
embedded systems (games, video, ...)
→ Science is a spin-off, at best ...
46
STI Cell Processor
  • 'normal' PowerPC CPU
  • 8 Synergistic Processing Elements (SPEs), each with
  • 256 kB memory
  • 128 x 128 bit registers
  • 4 SP floating point units
  • own instruction stream
  • 32 multiply/add per clock cycle
  • runs at > 4 GHz

221 mm² die size in 90 nm
47
Game Processors as Supercomputers ?
Slide from CHEP'04: Dave McQueeney, IBM CTO US
Federal
48
CPU and FPGA paradigms merge
[Figure: three block diagrams.
Conventional CPU: register, control, ALU.
SIMD (single instruction multiple data) CPU: wide register, control, several ALUs.
Configurable Instruction Set CPU: wide register, control, and arithmetic resources (ALUs) linked by a configurable connection fabric of PSMs.]
49
Configurable Instruction Set Processor
  • Example: Stretch S5xxx
  • Hybrid design
  • conventional fixed instruction set part
  • plus configurable instruction set part
  • C/C++ compiler analyses the kernel of algorithms
  • generates custom instruction set
  • generates code to use it
  • The promise
  • ease of use of C/C++
  • performance of an FPGA

Stretch S5 engine
Fabric is the keyword
interconnected resources
from Stretch Inc. product brief
50
CPU and FPGA paradigms merge
[Figure: processor industry world view (a CPU containing configurable logic) versus FPGA industry world view (configurable logic containing CPUs)]
A lot of innovation in the years to come.
Essential will be availability of
efficient development tools.
Moore will go on! There are the technologies.
There are the markets. Architectural changes ahead.
51
Summary
Substantial R&D needed
  • Self-triggered FEE
  • autonomous hit detection, time-stamping with ns
    precision
  • sparsification, hit buffering, high output
    bandwidth
  • High bandwidth event building network
  • handle 10 MHz interaction rate in Au-Au
  • also cope with few 100 MHz interaction rate in
    p-p, p-A
  • likely be done in time slices or event slices
  • L1 processor farm
  • feasible with PC + FPGA + Moore (needed 2014)
  • but look beyond today's PCs and FPGAs
  • Efficient algorithms (10⁹ tracks/sec)
  • co-design of critical detectors and tracking
    software

Quite different from the current LHC
style electronics
RII3-CT-2004-506078
52
The End
Thanks for your attention
53
CBM Collaboration: 41 institutions, 15 countries
  • China: Hua-Zhong Univ., Wuhan
  • Croatia: RBI, Zagreb
  • Cyprus: Nikosia Univ.
  • Czech Republic: Czech Acad. Science, Rez; Techn. Univ. Prague
  • France: IReS Strasbourg
  • Germany: Univ. Heidelberg, Phys. Inst.; Univ. HD, Kirchhoff Inst.; Univ. Frankfurt; Univ. Kaiserslautern; Univ. Mannheim; Univ. Marburg; Univ. Münster; FZ Rossendorf; GSI Darmstadt
  • Hungary: KFKI Budapest; Eötvös Univ. Budapest
  • Korea: Korea Univ. Seoul; Pusan National Univ.
  • Norway: Univ. Bergen
  • Poland: Krakow Univ.; Warsaw Univ.; Silesia Univ. Katowice
  • Portugal: LIP Coimbra
  • Romania: NIPNE Bucharest
  • Russia: CKBM, St. Petersburg; IHEP Protvino; INR Troitzk; ITEP Moscow; KRI, St. Petersburg; Kurchatov Inst., Moscow; LHE, JINR Dubna; LPP, JINR Dubna; LIT, JINR Dubna; MEPhI, Moscow; Obninsk State Univ.; PNPI Gatchina; SINP, Moscow State Univ.; St. Petersburg Polytec. U.
  • Spain: Santiago de Compostela Univ.
  • Ukraine: Shevchenko Univ., Kiev
54
FPGA Basic Building Block
CLB: Configurable Logic Block
[Figure: CLB schematic with a look-up table (LUT, inputs F0 to F3), a D flip-flop (D, Q, C, CLK) and outputs X, XQ]
Look-up Table (LUT): universal logic gate, just a 4x1 RAM
D Flip-Flop: elementary storage unit

55
FPGA Putting it together
[Figure: array of CLBs (Configurable Logic Blocks) connected by wiring and PSMs (programmable switch matrices), surrounded by I/O blocks]
Modern FPGAs: > 100,000 LUTs, 500 MHz