CBM DAQ and Event Selection

Slides: 56

1
CBM DAQ and Event Selection

Walter F.J. Müller, GSI, Darmstadt
for the CBM Collaboration
Topical Workshop Advanced Instrumentation for
Future Accelerator Experiments, Bergen, Norway,
4-6 April 2005

2
Outline

CBM (very briefly)
observables
setup
FEE/DAQ/Trigger
requirements
challenges
strategies

3
CBM at FAIR
SIS 100 Tm, SIS 300 Tm: U 35 AGeV, p 90 GeV
Compressed Baryonic Matter Experiment
4
CBM Physics Topics and Observables

In-medium modifications of hadrons
→ onset of chiral symmetry restoration at high ρB
→ measure ρ, ω, φ → e+e− (µ+µ−)
open charm D0, D±
Strangeness in matter
→ enhanced strangeness production
→ measure K, Λ, Σ, Ξ, Ω
Indications for deconfinement at high ρB
→ anomalous charmonium suppression?
→ measure D0, D±
J/ψ → e+e− (µ+µ−)
Critical point
→ event-by-event fluctuations
→ measure π, K

Good e/π separation
Vertex detector
Low cross sections → high interaction rates → selective triggers
Hadron identification
5
CBM Setup
Radiation-hard silicon pixel/strip detectors in a magnetic dipole field
Electron detectors: RICH, TRD, ECAL; pion suppression up to 10^5
Hadron identification: RPC, RICH
Measurement of photons, π0, η, and muons: ECAL
6
CBM and HADES
All you want to know about CBM: Technical Status Report (400 p), now available under
http://www.gsi.de/documents/DOC-2005-Feb-447-1.pdf
7
Meson Production in central AuAu
W. Cassing, E. Bratkovskaya, A. Sibirtsev, Nucl.
Phys. A 691 (2001) 745

10 MHz interaction rate needed for 10-15 A GeV
SIS300
8
A Typical AuAu Collision
Central AuAu collision at 25 AGeV (URQMD + GEANT):
160 p, 170 n, 360 π−, 330 π+, 360 π0, 41 K+, 13 K−, 42 K0
→ 10^7 AuAu interactions/sec
→ 10^9 tracks/sec to reconstruct for first level event selection
9
CBM Trigger Requirements
Assume archive rate: few GB/sec, 20 kevents/sec

In-medium modifications of hadrons
→ onset of chiral symmetry restoration at high ρB
→ measure ρ, ω, φ → e+e−
open charm (D0, D±)
Strangeness in matter
→ enhanced strangeness production
→ measure K, Λ, Σ, Ξ, Ω
Indications for deconfinement at high ρB
→ anomalous charmonium suppression?
→ measure D0, D±
J/ψ → e+e−
Critical point
→ event-by-event fluctuations
→ measure π, K

offline
trigger
trigger on displaced vertex
offline
drives FEE/DAQ architecture
trigger
trigger
trigger on high-pt e+e− pair
offline
10
Open Charm Detection

Example: D0 → K−π+ (3.9%; cτ = 124.4 µm)
reconstruct tracks
find primary vertex
find displaced tracks
find secondary vertex

target
few 100 µm
5 cm

high selectivity because combinatorics is reduced

first two planes of vertex detector
11
CBM DAQ Requirements Profile

D and J/ψ signals drive the rate capability requirements
D signal drives FEE and DAQ/Trigger requirements
Problem similar to B detection, like in LHCb or BTeV (rip)
Adopted approach
displaced vertex 'trigger' in first level, like in BTeV (rip)
Additional problem
DC beam → interactions at random times
→ time stamps with ns precision needed
→ explicit event association needed
Current design for FEE and DAQ/Trigger
Self-triggered FEE
Data-push architecture

12
Conventional FEE-DAQ-Trigger Layout
Especially instrumented detectors
Detector
L0 Trigger
fbunch
Trigger Primitives
Dedicated connections
FEE
Cave
Limited capacity
Shack
L1 Accept
DAQ
Modest bandwidth
L2 Trigger
L1 Trigger
Limited L1 trigger latency
Specialized trigger hardware
Standard hardware
Archive
13
Limits of Conventional Architecture
Decision time for first level trigger
limited. typ. max. latency 4 µs for LHC
Not suitable for complex global triggers like
secondary vertex search
Only especially instrumented detectors can
contribute to first level trigger
Limits future trigger development
Large variety of very specific trigger hardware
High development cost
14
The way out .. use Data Push Architecture
Especially instrumented detectors
Detector
L0 Trigger
fbunch
Trigger Primitives
fclock
Dedicated connections
FEE
Time distribution
Cave
Limited capacity
Shack
L1 Accept
DAQ
High bandwidth
Modest bandwidth
L1 Trigger
Limited L1 trigger latency
Specialized trigger hardware
Special hardware
Standard hardware
Archive
15
The way out ... use Data Push Architecture
Detector
fclock
FEE
Cave
Shack
DAQ
High bandwidth
Special hardware
Archive
16
The way out ... use Data Push Architecture
Detector
Self-triggered front-end: autonomous hit detection
fclock
FEE
No dedicated trigger connectivity: all detectors can contribute to L1
Cave
Shack
DAQ
Large buffer depth available: system is throughput-limited and not latency-limited
High bandwidth
Modular design: few multi-purpose rather than many special-purpose modules
Special hardware
Use term 'Event Selection'
Archive
17
Front-End for Data Push Architecture

Each channel detects autonomously all hits
An absolute time stamp, precise to a fraction of
the sampling period, is associated with each hit
All hits are shipped to the next layer (usually
concentrators)
Association of hits with events done later using
time correlation
Typical Parameters
with occupancy of a few % and 10^7/sec interaction rate
some 100 kHz channel hit rate
few MByte/sec per channel
whole CBM detector: 1 TByte/sec
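
The typical parameters above follow from simple rate arithmetic. The sketch below is only an illustration: the function names are made up here, and the 8 bytes per hit is an assumed value chosen to land in the "few MByte/sec" range quoted on the slide, not a number from the talk.

```python
def channel_hit_rate(occupancy, interaction_rate_hz):
    """Mean hit rate of one channel: fraction of events in which it fires times the event rate."""
    return occupancy * interaction_rate_hz


def channel_data_rate(hit_rate_hz, bytes_per_hit):
    """Output data rate of one channel in bytes/sec."""
    return hit_rate_hz * bytes_per_hit


INTERACTION_RATE_HZ = 1e7   # 10 MHz, from the slide
BYTES_PER_HIT = 8           # ASSUMPTION: time stamp + address + amplitude

for occupancy in (0.01, 0.03, 0.05):   # "a few %"
    hit_rate = channel_hit_rate(occupancy, INTERACTION_RATE_HZ)
    data_rate = channel_data_rate(hit_rate, BYTES_PER_HIT)
    print(f"occupancy {occupancy:.0%}: {hit_rate / 1e3:.0f} kHz, {data_rate / 1e6:.1f} MByte/sec")
```

With these inputs the per-channel hit rate comes out at some 100 kHz, in line with the slide.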

18
Typical Self-Triggered Front-End
Use sampling ADC on each detector channel running
with appropriate clock

Average 10 MHz interaction rate
Not periodic like in collider
On average 100 ns event spacing

[Figure: sampled pulse amplitude vs. time, with a threshold line. Two example hits are marked: a = 126 at t = 5.6 and a = 114 at t = 22.2.]
Time is determined to a fraction of the sampling period
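
A minimal sketch of how a hit finder can assign a time with sub-sample precision. The slide does not say which estimator is used; the three-point parabolic peak interpolation below is an assumption chosen for brevity, and `find_hits` is a hypothetical name.

```python
def find_hits(samples, threshold):
    """Return (time, amplitude) for every local maximum at or above threshold.

    Time is in units of the sampling period. A parabola through the peak
    sample and its two neighbours gives the fractional peak position.
    """
    hits = []
    for i in range(1, len(samples) - 1):
        prev, cur, nxt = samples[i - 1], samples[i], samples[i + 1]
        if cur >= threshold and cur > prev and cur >= nxt:
            curvature = prev - 2 * cur + nxt          # negative at a maximum
            shift = 0.5 * (prev - nxt) / curvature if curvature else 0.0
            amplitude = cur - 0.25 * (prev - nxt) * shift
            hits.append((i + shift, amplitude))
    return hits


# Synthetic parabolic pulse peaking at t = 5.6 with amplitude 126,
# mimicking the first example hit in the figure.
example = [126 - 4 * (i - 5.6) ** 2 for i in range(12)]
hits = find_hits(example, threshold=50)
print(hits)
```

For a real pulse shape the parabola is only an approximation near the peak; a fit to the known shaper response would be the more careful choice.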
19
Toward Multi-Purpose FEE Chain
Signal chain: PreAmp, preFilter, ADC, digital Filter, Hit Finder, Backend Driver

Detector inputs: Pad, GEMs, PMT, APDs

preFilter: anti-aliasing filter
ADC: sample rate 10-100 MHz, dyn. range 8...12 bit
digital Filter: 'shaping', 1/t tail cancellation, baseline restorer
Hit Finder: hit parameter estimators (amplitude, time)
Backend Driver: clustering, buffering, link protocol
see talk V. Lindenstruth, see talk L. Musa
All potentially in one mixed-signal chip
20
CBM DAQ and Online Event Selection

More than 50% of total data volume relevant for first level event selection
Aim for simplicity
Ansatz
do (almost) all processing after the build stage
Simple two layer approach
1. event building
2. event processing
Other scenarios are possible, putting more emphasis on
do all processing as early as possible
transfer data only when necessary

needed for D
needed for J/ψ
useful for J/ψ
STS, TRD, and ECAL data used in first level event selection
21
Logical Data Flow
Concentrators multiplex channels to high-speed links
Time distribution
Buffers
Build Network
Processing resources for first level event selection, structured in small farms
Connection to 'high level' selection processing
22
Bandwidth Requirements
Data flow: 1 TB/sec
Gilder helps
Moore helps
1st level selection: 10^14-10^15 operations/sec
100 Sub-Farms
Data flow: few 10 GB/sec
to archive: few GB/sec
23
Focus on CNet
24
Self-Triggered FEE Output Format I
FEE
Output of a FEE chip is a list of hits. Each hit has a timestamp plus other information.
Output of a single FEE chip:
17 15 ... 68 34 ... 134 18 ... 135 19 ... 1234 33 ...
Fields: time stamp, channel address, other values (amplitudes, pulse shape)
!! Time stamp values can increase forever !!
→ How to express absolute time efficiently?
25
Handle the infinite Time Axis
1. Subdivide time in epochs
2. Express a time relative to an epoch (practical epoch length about 10 µs)
3. Introduce epoch markers
[Figure: time axis divided into Epoch 1 to Epoch 4, showing hits, e.g. at (2, 137 ns) and (3, 314 ns), and epoch markers at the epoch boundaries.]
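
The epoch scheme amounts to splitting an absolute time into a pair (epoch number, offset within the epoch). A minimal sketch, assuming 1 ns time units, a 10 µs epoch, and epochs counted from zero; the function names are illustrative only.

```python
EPOCH_LENGTH_NS = 10_000   # 10 µs, the "practical epoch length" from the slide


def to_epoch_time(absolute_ns):
    """Split an absolute time into (epoch number, offset in ns within that epoch)."""
    return divmod(absolute_ns, EPOCH_LENGTH_NS)


def to_absolute_time(epoch, offset_ns):
    """Inverse of to_epoch_time."""
    return epoch * EPOCH_LENGTH_NS + offset_ns


print(to_epoch_time(20_137))       # the hit labelled (2, 137 ns) in the figure
print(to_absolute_time(3, 314))    # the hit labelled (3, 314 ns)
```

The offset always fits in a fixed number of bits (14 bits for 10,000 ns steps), while the epoch number only has to be sent once per epoch.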
26
Self-Triggered FEE Output Format II
Output of a FEE chip is a list of hits and epoch markers. Each hit has a timestamp plus other information.
FEE
M 1 H 17 15 ... H 68 34 ... H 134 18 ... H 135 19 ... H 1234 33 ... M 2 M 3 H 258 19 ...
Record type: H = hit, M = epoch marker
The last hit has effective timestamp (3, 258)
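
A sketch of how a receiver could turn such a record stream back into hits with effective timestamps. The tuple representation of the records is an assumption for illustration; the real link format is not specified in the slides.

```python
def decode(records):
    """Attach the current epoch to every hit.

    records: sequence of ("M", epoch) markers and ("H", time, channel) hits.
    Returns a list of (epoch, time, channel) tuples.
    """
    epoch = None
    hits = []
    for record in records:
        kind = record[0]
        if kind == "M":
            epoch = record[1]
        elif kind == "H":
            if epoch is None:
                raise ValueError("hit received before the first epoch marker")
            _, time, channel = record
            hits.append((epoch, time, channel))
        else:
            raise ValueError(f"unknown record type {kind!r}")
    return hits


# The example stream from the slide (other hit values omitted).
stream = [("M", 1), ("H", 17, 15), ("H", 68, 34), ("H", 134, 18),
          ("H", 135, 19), ("H", 1234, 33), ("M", 2), ("M", 3), ("H", 258, 19)]
decoded = decode(stream)
print(decoded[-1])
```

The last decoded hit carries epoch 3 and time 258, matching the effective timestamp quoted on the slide.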
27
Self-Triggered FEE Concentrators
Input from first FEE (record type, time, address):
M 1 H 18 2007 ... M 2 H 589 2134 ... M 3 H 258 2714 ...
Input from second FEE:
M 1 H 17 15 ... H 68 34 ... H 134 18 ... H 135 19 ... H 1234 33 ... M 2 M 3 H 258 19 ...
Concentrator output:
M 1 H 17 15 ... H 18 2007 ... H 68 34 ... H 134 18 ... H 135 19 ... H 1234 33 ... M 2 H 589 2134 ... M 3 H 258 19 ... H 258 2714 ...
A concentrator merges the data streams and eliminates redundant epoch markers
Seems prudent to keep data always sorted in time
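
A minimal sketch of the concentrator logic, using Python's `heapq.merge` to combine time-sorted streams. The record tuples and function names are assumptions for illustration, and each input is assumed to start with an epoch marker.

```python
import heapq


def with_keys(stream):
    """Pair each record with a sort key (epoch, time); markers sort first in their epoch."""
    epoch = None
    for record in stream:
        if record[0] == "M":
            epoch = record[1]
            yield (epoch, -1), record
        else:
            yield (epoch, record[1]), record


def merge_streams(*streams):
    """Merge time-sorted FEE streams and keep only one marker per epoch."""
    merged = []
    last_epoch = None
    keyed = (with_keys(stream) for stream in streams)
    for (epoch, _), record in heapq.merge(*keyed, key=lambda item: item[0]):
        if record[0] == "M":
            if epoch == last_epoch:
                continue              # redundant marker from another input
            last_epoch = epoch
        merged.append(record)
    return merged


fee_a = [("M", 1), ("H", 17, 15), ("H", 68, 34), ("H", 134, 18),
         ("H", 135, 19), ("H", 1234, 33), ("M", 2), ("M", 3), ("H", 258, 19)]
fee_b = [("M", 1), ("H", 18, 2007), ("M", 2), ("H", 589, 2134),
         ("M", 3), ("H", 258, 2714)]
merged = merge_streams(fee_a, fee_b)
print(merged)
```

Because every input is already sorted, the merge needs only one pending record per input, which is what makes keeping the data sorted at every stage cheap.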
28
FEE Data Clusters I

In many subsystems a particle causes correlated
hits in physically neighboring detector cells
(STS, TRD, ECAL)
Depending on detector subsystem
the cluster pattern is 1d or 2d
contained in one FEE chip or not
examples in CBM
STS-MAPS: 2d, contained
STS-Strip: 1d, mostly contained
TRD: 1d mostly contained to 2d often uncontained, depending on pad geometry (varies inside → outside)
RPC: t.b.d.
ECAL: 2d, many uncontained

Note for 2d a 16(64) channel chip has ¾(½) of
channels on perimeter !
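
The perimeter fractions in the note can be checked by counting, assuming the channels of a chip form a square n x n block (4 x 4 for 16 channels, 8 x 8 for 64). That square layout is an assumption, not stated on the slide.

```python
from fractions import Fraction


def perimeter_fraction(side):
    """Fraction of cells of a side x side block that lie on its border."""
    total = side * side
    interior = max(side - 2, 0) ** 2
    return Fraction(total - interior, total)


for side in (4, 8):
    frac = perimeter_fraction(side)
    print(f"{side * side} channels: {frac} = {float(frac):.2f} on perimeter")
```

The 16-channel case gives exactly 3/4; the 64-channel case gives 7/16, which the slide rounds to 1/2.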
29
FEE Data Clusters II

Usually one wants to read very low amplitude hits
in the tail of a cluster
low channel hit threshold might give too much noise
→ read a low amplitude hit only if in the neighborhood of a big one
→ how to handle clusters crossing a chip border?
use two thresholds
high threshold determines particle hit and region of interest (RoI)
RoI communicated to all relevant neighbors
low amplitude hits in RoI are validated and sent
→ this implies cross communication on CNet between FEE chips...

Better named FNet
If RoI are communicated, CNet becomes a real network !!
see talk V. Lindenstruth, see talk L. Musa
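
The two-threshold idea can be sketched for a 1d strip detector as follows. The thresholds, the RoI reach of one neighbouring channel, and the function name are illustrative assumptions; chip borders are ignored here, so the RoI exchange between chips is not modelled.

```python
def select_hits(amplitudes, high, low, reach=1):
    """Return indices of channels to read out.

    A channel at or above `high` is a particle hit and opens a region of
    interest (RoI) of `reach` channels on either side. A channel at or
    above `low` is only kept if it lies inside some RoI.
    """
    last = len(amplitudes) - 1
    roi = set()
    for index, amplitude in enumerate(amplitudes):
        if amplitude >= high:
            roi.update(range(max(index - reach, 0), min(index + reach, last) + 1))
    return [index for index, amplitude in enumerate(amplitudes)
            if amplitude >= high or (amplitude >= low and index in roi)]


amplitudes = [0, 6, 40, 7, 0, 6, 0, 0]
selected = select_hits(amplitudes, high=20, low=5)
print(selected)
```

Channel 5 has the same amplitude as channel 1 but is dropped because no big hit is nearby, which is the noise rejection the slide is after.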
30
Focus on BNet
31
Event Building Alternatives

Straight event-by-event approach
data arrives on 1000 links
100 byte per event and link
10^10 packets/sec to handle...
Handle time intervals or event intervals
10 µs or 100 events seems reasonable
Very regular and fully controlled traffic
pattern
data traffic can be scheduled to avoid network
congestion
a large fraction of the switch bandwidth can be
used
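
The packet-rate argument can be made concrete with a short calculation. It assumes the 10 MHz interaction rate quoted elsewhere in the talk and one packet per link and per event (or per interval); the function name is illustrative.

```python
def packet_load(links, event_rate_hz, bytes_per_event_and_link, events_per_packet):
    """Return (packets/sec over all links, bytes per packet)."""
    packets_per_sec = links * event_rate_hz // events_per_packet
    packet_size = bytes_per_event_and_link * events_per_packet
    return packets_per_sec, packet_size


LINKS = 1000
EVENT_RATE_HZ = 10_000_000   # 10 MHz interaction rate
BYTES = 100                  # per event and link

for events_per_packet in (1, 100):   # event-by-event vs. 10 µs intervals
    rate, size = packet_load(LINKS, EVENT_RATE_HZ, BYTES, events_per_packet)
    print(f"{events_per_packet:>3} events/packet: {rate:.0e} packets/sec of {size} bytes")
```

Grouping 100 events (about 10 µs of beam) cuts the packet rate by a factor of 100 while the total data flow stays at 1 TB/sec.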

32
Networking I

High-speed networking
high density connectors
2.5 Gbps SerDes now 100 mW
480 Gbps InfiniBand switch on one chip
DDR and QDR link speeds will come
just wait and see

Mellanox MTA47396 24 port InfiniBand switch
4x ports, 1 GByte/sec per port
→ 96 x 2.5 Gbps SERDES
480 Gbps aggregate B/W
Single chip implementation
961 ball BGA
18 W power dissipation
Double data rate version (5 Gbps per link) in pipe....

33
Networking II

TODAY
Voltaire ISR 9288 switch
288 4x ports non-blocking
cost today 120 kEUR (or 400 EUR/port)
288 GByte/sec switching bandwidth
likely in a few years
288 4x port QDR
likely same or lower cost
1152 GByte/sec switching speeds
adequate for CBM...
Conclusion
BNet switch is not a major issue
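
The switch numbers on the two networking slides are mutually consistent, as the sketch below shows. It assumes that the 480 Gbps aggregate figure counts both directions of each lane and that the links use 8b/10b line coding (standard for InfiniBand at these speeds); the function names are illustrative.

```python
LANE_GBPS = 2.5        # signalling rate per lane, from the slide
LANES_PER_PORT = 4     # a "4x" port


def chip_aggregate_gbps(ports):
    """Raw bandwidth of a switch chip, counting both directions of every lane."""
    return ports * LANES_PER_PORT * LANE_GBPS * 2


def port_gbyte_per_sec(speedup=1):
    """Payload rate of one 4x port after 8b/10b coding (speedup 1: SDR, 2: DDR, 4: QDR)."""
    raw_gbps = LANES_PER_PORT * LANE_GBPS * speedup
    return raw_gbps * 8 / 10 / 8


def switch_gbyte_per_sec(ports, speedup=1):
    """Total switching bandwidth of a non-blocking switch."""
    return ports * port_gbyte_per_sec(speedup)


print(chip_aggregate_gbps(24))              # 24-port chip
print(switch_gbyte_per_sec(288))            # 288-port switch today
print(switch_gbyte_per_sec(288, speedup=4)) # same port count with QDR
print(round(120_000 / 288))                 # EUR per port
```

The per-port cost works out to roughly 417 EUR, which the slide rounds to 400 EUR/port.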

34
Focus on PNet
35
Network Characteristics
Data Push: datagrams, errors marked but not recovered
Request/Response and Data Push: transactions, errors recovered
36
L1 Event Selection Farm Layout

Current working hypothesis: CPU + FPGA hybrid system (proviso follows)
Use programmable logic for cores of algorithms
Use CPU for the non-parallelizable parts
Use serial connection fabric (links and switches)
Modular design (only few board types)

FPGA
37
Network Summary

5 different networks with very different
characteristics
CNet
medium distance, short messages, special
requirements
connects custom components (FEE ASICs)
TNet
broadcast time (and tags), special requirements
BNet
naturally large messages, Rack-2-Rack
PNet
short distance, most efficient if already 'built-in'
connects standard components (FPGA, SoCs)
HNet
general purpose, to rest of world

FEE Interfaces and CNet will be co-developed.
Depends on how clock/time distribution is done
Custom
Potentially built with CNet components
Custom
Probably uncritical
Ethernet, Infiniband,...
Look at emerging technologies. Stay open for changes and surprises. Cost efficiency is key here !!
PCIe, ASI,....
Whatever the implementation is, it will be called Ethernet...
Ethernet
38
Algorithms

Performance of L1 feature extraction algorithms is essential
critical in CBM: STS tracking + vertex reconstruction, TRD tracking and PID
Look for algorithms which allow massively parallel implementation
Hough Transform tracker: needs lots of bit level operations, well suited for FPGA
Cellular Automaton tracker
Other approaches to be evaluated
Co-develop tracking detectors and analysis algorithms
L1 tracking is necessarily speed optimized → more detector granularity and redundancy needed
Aim for CBM: validate final hardware design with at least 2 trackers suitable for L1

39
Algorithms an Example

Hough Transform
assume track comes from (close to) primary vertex
map each measurement into 'Hough space'
a peak in Hough space indicates a real track
is a 'global' method
needs substantial amount of calculation to fill
and analyze the histograms
Many, but very simple operations
allows massively parallel implementation
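
A toy Hough transform, to make the voting idea concrete. It assumes straight tracks y = m*z + b in a field-free projection, which is a simplification (the CBM tracker sits in a dipole field, so the real parametrisation differs); the binning and all names are illustrative.

```python
from collections import Counter


def hough_peak(hits, slopes, intercept_bin):
    """Vote in (slope, intercept) space and return the most popular cell.

    hits: iterable of (z, y) measurements.
    slopes: candidate slope values, one histogram column each.
    Returns (slope, intercept, number of votes).
    """
    votes = Counter()
    for z, y in hits:
        for m_index, m in enumerate(slopes):
            b_index = round((y - m * z) / intercept_bin)
            votes[(m_index, b_index)] += 1
    (m_index, b_index), count = votes.most_common(1)[0]
    return slopes[m_index], b_index * intercept_bin, count


# Four hits on a track from the origin with slope 0.3, plus two noise hits.
hits = [(10, 3.0), (20, 6.0), (30, 9.0), (40, 12.0), (15, -2.0), (35, 1.0)]
slopes = [i / 10 for i in range(-5, 6)]
slope, intercept, count = hough_peak(hits, slopes, intercept_bin=0.5)
print(slope, intercept, count)
```

Every (hit, slope) pair is an independent add-and-increment, which is why the method maps well onto many small parallel units in an FPGA.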

40
Hough-Transform Implementation
41
Hough-Transform Implementation
Very suitable for implementation in programmable logic (FPGAs)
Other track finder approaches, like cellular automaton tracker, also under investigation
42
Interim Summary

Event definition has changed
now based on time stamps and time correlation
Role of DAQ has changed
DAQ is simply responsible to transport data from
producers to consumers
Role of 'Trigger' has changed
filter events delivered by DAQ
'Online Event Selection' is better term
System aspects
'online' 'offline' boundary blurs
more COTS (commercial off the shelf) components
much more modular system
much more adaptable system
This is emerging technology in HEP, though baseline for ILC
However, it has been used for many years in nuclear structure

43
Moore quo vadis ?

Will price/performance of computing continue to
improve ?
What are limits of CMOS technology ?
Where are the markets ? What are market forces ?
Technology
most of the gain comes from architecture anyway
conventional designs, especially x86, reach their
limits
Markets
end of the metal-box PC age? Laptops, PDAs,
all kinds of dedicated boxes (video, games)
end of the binary compatibility age?
intermediate code, 'Just in Time' compilers (JIT)

There is life after Intel x86
A lot of architectural innovation ahead
44
BlueGene vs Cell Processor
BlueGene: 121 mm2, 130 nm, 2.8/5.6 DP GFlop
STI Cell: 221 mm2, 90 nm, 256 SP GFlop, 30 DP GFlop, 25 GB/sec mem, 78 GB/sec IO
Finally presented at ISSCC 2005 (International Solid-State Circuits Conf.)
SPE: Synergistic Processing Element
45
BlueGene vs Cell Processor
BlueGene: developed by IBM. Market: national security, science. Budget: 100 M
STI Cell: developed by Sony, Toshiba and IBM. Market: VIDEO GAMES. Budget: 500 M
High performance computing is now driven by embedded systems (games, video, ....)
→ Science is a spin-off, at best ...
46
STI Cell Processor

'normal' PowerPC CPU
8 Synergistic Processing Elements (SPE), each with
256 kB memory
128 x 128 bit registers
4 SP floating point units
own instruction stream
32 multiply/add per clock cycle
runs at > 4 GHz

221 mm2 die size in 90 nm
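
The quoted single-precision peak follows directly from the SPE counts above. A small check, assuming a multiply/add counts as two floating point operations and taking the clock as exactly 4 GHz; the function name is illustrative.

```python
def peak_sp_gflops(n_spe, fpu_per_spe, clock_ghz, flops_per_multiply_add=2):
    """Peak single-precision GFlop/s from the number of multiply/add units."""
    multiply_adds_per_cycle = n_spe * fpu_per_spe
    return multiply_adds_per_cycle * flops_per_multiply_add * clock_ghz


print(8 * 4)                        # multiply/adds per clock cycle
print(peak_sp_gflops(8, 4, 4.0))    # peak SP GFlop/s at 4 GHz
```

This reproduces both the 32 multiply/add per cycle and the 256 SP GFlop figure given in the BlueGene comparison.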
47
Game Processors as Supercomputers ?
Slide from CHEP'04, Dave McQueeney, IBM CTO US Federal
48
CPU and FPGA paradigms merge
[Diagram: three processor architectures compared.
Conventional CPU: register, control, ALU.
SIMD (single instruction multiple data) CPU: wide register, control, several ALUs.
Configurable Instruction Set CPU: wide register, control, and arithmetic resources (ALUs) linked by a configurable connection fabric of PSMs.]
49
Configurable Instruction Set Processor

Example Stretch S5xxx
Hybrid design
conventional fixed instruction set part
plus configurable instruction set part
C/C++ compiler analyses the kernel of algorithms
generates custom instruction set
generates code to use it
The promise
ease of use of C/C++
performance of an FPGA

Stretch S5 engine
Fabric is the keyword
interconnected resources
from Stretch Inc. product brief
50
CPU and FPGA paradigms merge
CPU
Processor industry world view
A lot of innovation in the years to come. Essential will be availability of efficient development tools
configurable logic
configurable logic
FPGA industry world view
Moore will go on! There are the technologies. There are the markets. Architectural changes ahead
CPU
CPU
51
Summary
Substantial R&D needed

Self-triggered FEE
autonomous hit detection, time-stamping with ns precision
sparsification, hit buffering, high output bandwidth
High bandwidth event building network
handle 10 MHz interaction rate in Au-Au
also cope with few 100 MHz interaction rate in p-p, p-A
likely be done in time slices or event slices
L1 processor farm
feasible with PC + FPGA + Moore (needed 2014)
but look beyond today's PCs and FPGAs
Efficient algorithms (10^9 tracks/sec)
co-design of critical detectors and tracking software

Quite different from the current LHC style electronics
RII3-CT-2004-506078
52
The End
Thanks for your attention
53
CBM Collaboration: 41 institutions, 15 countries
China: Hua-Zhong Univ., Wuhan
Croatia: RBI, Zagreb
Cyprus: Nikosia Univ.
Czech Republic: Czech Acad. Science, Rez; Techn. Univ. Prague
France: IReS Strasbourg
Germany: Univ. Heidelberg, Phys. Inst.; Univ. HD, Kirchhoff Inst.; Univ. Frankfurt; Univ. Kaiserslautern; Univ. Mannheim; Univ. Marburg; Univ. Münster; FZ Rossendorf; GSI Darmstadt
Hungary: KFKI Budapest; Eötvös Univ. Budapest
Korea: Korea Univ. Seoul; Pusan National Univ.
Norway: Univ. Bergen
Poland: Krakow Univ.; Warsaw Univ.; Silesia Univ. Katowice
Portugal: LIP Coimbra
Romania: NIPNE Bucharest
Russia: CKBM, St. Petersburg; IHEP Protvino; INR Troitzk; ITEP Moscow; KRI, St. Petersburg; Kurchatov Inst., Moscow; LHE, JINR Dubna; LPP, JINR Dubna; LIT, JINR Dubna; MEPhi, Moscow; Obninsk State Univ.; PNPI Gatchina; SINP, Moscow State Univ.; St. Petersburg Polytec. U.
Spain: Santiago de Compostela Uni.
Ukraine: Shevshenko Univ., Kiev
54
FPGA Basic Building Block
CLB: Configurable Logic Block
[Diagram: a CLB with inputs F0 to F3 feeding a LUT, outputs X and XQ, and a D flip-flop (D, Q, C) clocked by CLK.]
Look-up table (LUT): universal logic gate, just a 4x1 RAM
D flip-flop: elementary storage unit

55
FPGA Putting it together
[Diagram: grid of CLBs (configurable logic blocks) connected by wiring and PSMs (programmable switch matrices), surrounded by I/O blocks.]
Modern FPGAs: > 100.000 LUT, 500 MHz
