Title: A TCP/IP transport layer for the DAQ of the CMS Experiment
1. A TCP/IP transport layer for the DAQ of the CMS Experiment
Miklos Kozlovszky, for the CMS TriDAS collaboration
CERN, European Organization for Nuclear Research
ACAT03 - December 2003
2. CMS Data Acquisition
(diagram: CMS detector and data flow)
3. Building the events
- Event builder: the physical system interconnecting data sources with data destinations. It has to move all data fragments of a given event to the same destination.
- Event fragments: event data fragments are stored in separate physical memory systems.
- Full events: full event data are stored in one physical memory system associated with a processing unit.
- Scale: 512 data sources for 1 MByte events; 1000s of HLT processing nodes.
4. XDAQ Framework
- Distributed DAQ framework developed within CMS.
- Constructs homogeneous applications for heterogeneous processing clusters.
- Multi-threaded (important to take advantage of SMP efficiently).
- Zero-copy message passing for the event data.
- Peer-to-peer communication between the applications.
- I2O for data transport, and SOAP for configuration and control.
- Hardware and transport independence.
Subject of presentation
5. TCP/IP Peer Transport Requirements
- Reuse old, cheap Ethernet for the DAQ
- Transport layer requirements:
  - Reliable communication
  - Hide the complexity of TCP
  - Efficient implementation
  - Simplex communication via sockets
  - Configurable
  - Support of blocking and non-blocking I/O
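The last requirement comes down to a per-socket flag. A minimal POSIX sketch of switching a socket to non-blocking mode with `fcntl` (the helper name is ours for illustration, not part of the XDAQ API):

```cpp
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Put a socket (or any file descriptor) into non-blocking mode.
// After this, send()/recv() return immediately with EAGAIN/EWOULDBLOCK
// instead of blocking when the socket buffer is full or empty.
bool setNonBlocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}
```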
6. Implementation of the non-blocking mode
- Pending queues (PQs)
- Thread-safe PQ management
- One PQ for each destination
- Independent sending through sockets
- A single select() call is used both to receive packets and to send previously blocked data.
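The pending-queue idea above can be sketched roughly as follows, assuming plain POSIX sockets. The class, method names and buffer handling are our illustration of the scheme, not the actual XDAQ peer transport code:

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <deque>
#include <map>
#include <mutex>
#include <vector>

// One thread-safe queue of unsent buffers per destination socket; a
// single select() call both detects incoming packets and flushes data
// that previously blocked on send().
class PendingQueues {
public:
    // Queue an outgoing buffer for one destination socket.
    void enqueue(int fd, std::vector<char> data) {
        std::lock_guard<std::mutex> lock(mutex_);      // thread-safe PQ management
        queues_[fd].push_back(std::move(data));
    }

    // One select() servicing both directions; returns bytes received (0 if none).
    ssize_t pollOnce(int recvFd, char* buf, size_t bufLen) {
        fd_set rset, wset;
        FD_ZERO(&rset);
        FD_ZERO(&wset);
        FD_SET(recvFd, &rset);
        int maxFd = recvFd;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            for (auto& kv : queues_)
                if (!kv.second.empty()) {              // only watch sockets with pending data
                    FD_SET(kv.first, &wset);
                    if (kv.first > maxFd) maxFd = kv.first;
                }
        }
        timeval tv{0, 10000};                          // 10 ms timeout
        if (select(maxFd + 1, &rset, &wset, nullptr, &tv) <= 0) return 0;

        ssize_t received = 0;
        if (FD_ISSET(recvFd, &rset))
            received = recv(recvFd, buf, bufLen, 0);

        std::lock_guard<std::mutex> lock(mutex_);
        for (auto& kv : queues_) {
            if (!FD_ISSET(kv.first, &wset)) continue;
            while (!kv.second.empty()) {
                std::vector<char>& front = kv.second.front();
                ssize_t n = send(kv.first, front.data(), front.size(), 0);
                if (n < 0) break;                      // would block: retry on next poll
                if (static_cast<size_t>(n) < front.size()) {
                    front.erase(front.begin(), front.begin() + n);
                    break;                             // partial send: socket buffer full
                }
                kv.second.pop_front();
            }
        }
        return received > 0 ? received : 0;
    }

private:
    std::mutex mutex_;
    std::map<int, std::deque<std::vector<char>>> queues_;
};
```

One select() per iteration keeps the event loop simple: the write set only contains sockets that actually have blocked data, so an empty queue costs nothing.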
7. Communication via the transport layer
8. Throughput optimisation
- Operating system tuning (kernel options, buffers)
- Jumbo frames
- Transport protocol options
- Communication techniques:
  - Blocking vs. non-blocking I/O
  - Single- vs. multi-rail
  - Single- vs. multi-threaded
- TCP options (e.g. the Nagle algorithm)
(diagram: single-rail vs. multi-rail connections between App 1 and App 2)
9. Test network
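Among the TCP options mentioned above, the classic tweak for small event fragments is disabling the Nagle algorithm, which otherwise delays small packets to coalesce them. A minimal sketch, with a helper name of our own choosing:

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Disable the Nagle algorithm on a TCP socket so small writes are
// sent immediately instead of being buffered for coalescing.
bool disableNagle(int fd) {
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == 0;
}
```

Whether this helps depends on the fragment size and traffic pattern; for large fragments the extra segments can instead cost throughput, which is why it is listed here as an option to measure rather than a default.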
9Test network
Cluster size 8x8 CPU 2x Intel Xeon (2.4
GHz), 512KB Cache I/O system PCI-X 4 buses (max
6) . Memory Two-way interleaved DDR 3.2 GB/s
(512 MB) NICs 1 Intel 82540EM GE 1
Broadcom NeXtreme BCM 5703x GE 1 Intel Pro
2546EB GE (2port) OS Linux RedHat 2.4.18-27.7
(SMP) Switches 1 BATM- T6 Multi Layer Gigabit
Switch (medium range) 2 Dell Power Connect 5224
(medium range)
10. Event Building on the cluster
- Conditions:
  - XDAQ Event Builder
  - No Readout Unit (RU) inputs
  - No Builder Unit (BU) outputs
  - No Event Manager
  - PCs: dual P4 Xeon
  - Linux 2.4.19
  - NIC: e1000
  - Switch: PowerConnect 5224
  - Standard MTU (1500 bytes)
  - Each BU builds 128 events
  - Fixed fragment sizes
- Result:
  - For fragment sizes > 4 kB, throughput per node is about 100 MB/s, i.e. 80% utilisation of the Gigabit Ethernet link (125 MB/s).
11. Two-Rail Event Builder measurements
- Test case:
  - Bare Event Builder (2x2)
  - No RU inputs
  - No BU outputs
  - No Event Manager
- Options:
  - Non-blocking TCP
  - Jumbo frames (MTU 8000)
  - Two rails
  - One thread
- At the RU working point (16 kB fragments):
  - Throughput per node: 240 MB/s, i.e. 95% of the two-rail bandwidth
12. Conclusions
- Achieved 100 MB/s per node in an 8x8 configuration (single rail).
- Improvements seen with two rails, non-blocking I/O, and jumbo frames: over 230 MB/s per node obtained in a 2x2 configuration.
- CPU load is high.
- We are also studying other networking and traffic-shaping options.