Title: PC Farms at CERN
1PC Farms at CERN
- Frédéric Hemmer
- CERN-IT/PDP
2Disclaimer
- This talk covers farms which involve CERN's computer center.
- There are other farms in strictly online environments, and private farms in buildings.
3Overview
- Off line farms
- Linux farms
- NT farms
- Issues
- PC technology and performance
- Online farms and quasi-online farms
- Cost of ownership
- Conclusions
4Linux Farms - Nomad
- Proof of concept in Summer 97
- Straight NQS port
- SHIFT SW client port
- CERNLIB port
- NOMAD observed quasi-linear scaling with clock frequency compared to Alphas!
- I.e. an Alpha @ 266 MHz performs about like a PII @ 266 MHz
- Now 17 dual PCs, 3 types of motherboard (MB)
5Linux Farms - NA49
- NA49 had already deployed a private PC farm on their premises
- Requested a new farm in 1H98, in order to benefit from the computer center infrastructure (people and equipment)
- Trivial deployment, running with NQS
- Most PCs are branded PCs (HP)
- Now completely off RISC for CPU
- 18 duals @ 300 -> 400 MHz
6NA49 Analysis - data access
[Diagram labels: From experiment, 10-12 TB / month, 1 month / year; Manual feed, 100 GB cartridges (SONY DMS); 600 GB, 1 run; HiPPI; 100BT]
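As a rough cross-check of the figures on this slide, the sketch below converts the quoted 10-12 TB per month into a count of 100 GB cartridges to be fed by hand. It assumes decimal units (1 TB = 1000 GB) and completely full cartridges; the function name is illustrative only.

```python
GB = 10**9
TB = 10**12


def cartridges_needed(volume_bytes: int, cartridge_bytes: int = 100 * GB) -> int:
    """Number of cartridges needed to hold volume_bytes, rounded up."""
    return (volume_bytes + cartridge_bytes - 1) // cartridge_bytes


low = cartridges_needed(10 * TB)
high = cartridges_needed(12 * TB)
print(f"{low}-{high} cartridges per month of data taking")
```

Under these assumptions a month of running means handling on the order of a hundred cartridges.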
7Linux Farms (NA48)
- NA48 was using the QSW CS/2 (128 proc.)
- CS/2 overload -> investigated PCs in late 97
- Installation of 12 dual machines in 1Q98 and more ...
8Linux Issues
- EEPRO 100 B MP crashes
- AFS support (MP)
- NFS support (MP)
- Commercial software
- Manufacturer support for Linux
- Very few Linux experts
9NT offline Farms
- PCSF
- Simulation facility but
- COMPASS
- Evaluating and benchmarking technology
10PCSF - Overview
- Configuration
- Applications
- Data access
- Specific work solutions
- Key issues
- Conclusions
11PCSF - Goals
- Make PCNT a standard option for Physics Data Processing, starting with simulation
- Establish a minimum management model for NT farm management
- Address scalability issues
- Gain Windows NT experience
12PCSF Milestones
- Joined RD47 in Autumn 96
- Price inquiry issued in 12/96
- Hardware delivered 4/97
- Ready to use 6/97
- RD47 report 10/97
- Expansion 5/98
13PCSF Configuration (1)
- Server running NT 4.0 Server SP3
- 1 dual-capable PPro @ 200 MHz, 96 MB, with 9 GB data disk (with mirroring). LSF central queues.
- Server running NT Terminal Server Beta 2
- 1 dual PPro @ 200 MHz, 128 MB, with 4 GB data disk. Runs IIS 3.0 and is accessible from outside CERN. It also hosts the ASPs for Web access.
- Servers running NT 4.0 Workstation SP3
- 9 dual PPros @ 200 MHz, 64 MB, 24 GB
- 25 dual PIIs @ 300 MHz, 128 MB, 24 GB
- All equipped with boot PROMs
14PCSF Configuration (2)
- Machines interconnected with 4 3Com 3000 100BaseT switches
- Display/Keyboard/Mouse connected to a Raritan multiplexor
- PC Duo for remote admin access
- => There were problems with other products
- All running LSF 3.0
- => LSF 3.2 does not work, support weak
- Completely integrated with NICE
15Applications on PCSF
- ATLAS Dice simulation
- NA45 1996 reconstruction
- CMS reconstruction with Objectivity being tested
- LHCB simulation code ready
- ATLAS reconstruction being ported
- ATLAS/Marseille event filter prototype scalability tests
16Data access
[Diagram labels: RFIO; Unix Tape Server; stagexxx commands]
17ATLAS Level 3 DAQ
[Diagram labels: Readout Buffers; 1 GB/s; Processor Farm; Storage (100 MB/s)]
18ATLAS Event Filter
- Testbed for evaluating algorithms and sizing
- Architecture simulation studies
- Monitoring, system management, feedback, etc.
- Interface prototypes (SFI, SFO)
- Timescale: prototype -1 (i.e. end 98)
- Status: sizing of an initial farm
19PCSF Usage
21Specific work so far
- Installation (Remote Boot, Winstall, NICE replicas, Install Server)
- User codes, CERNLIB, SHIFT
- Job Starter
- PC MGR
- WNTS
- Web Interface
22Installation
- Disk cloning, then change SID
- => Fastest method, but not very automated
- Remote boot
- Remote boot install procedures with virtual disk
- Use unattended setup, which installs Winstall and other things
- Third party packages installed through Winstall
- => Boot PROM support on some hardware
23Porting
- Porting code from Unix to NT is usually easy (NA45 code ported in 1 week)
- Porting a production environment from Unix to NT is usually difficult (shell scripts)
- Porting the build environment is difficult; better to use native tools (Dev Studio)
- => Mixing Unix and NT build environments, revision control, etc.
24Jobstarter
- Initially inherited from Unix LSF CERN JobStarter
- Rewritten in C, using PcMgrSvc for drive mapping
- Check execution preconditions
- Clean up at normal and abnormal job end
- Kill popup dialog windows
- => Excel, Winzip in batch
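The precondition check and the clean-up at normal and abnormal job end can be sketched as a small wrapper. This is a minimal illustration in Python, not the CERN JobStarter (which the slide says was written in C): the name `run_job`, its parameters, and the use of a free-space test as the precondition are all assumptions, and the Windows-specific steps (drive mapping, killing popup dialogs) are left out.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def run_job(command: Sequence[str], min_free_bytes: int = 0,
            scratch_root: Optional[str] = None) -> int:
    """Run one batch job in a private scratch directory and always clean up."""
    scratch = Path(tempfile.mkdtemp(prefix="job_", dir=scratch_root))
    try:
        # Precondition (hypothetical): enough free scratch space to start.
        if shutil.disk_usage(scratch).free < min_free_bytes:
            raise RuntimeError("not enough scratch space to start the job")
        return subprocess.run(list(command), cwd=scratch).returncode
    finally:
        # Executed on normal exit, non-zero exit, and exceptions alike.
        shutil.rmtree(scratch, ignore_errors=True)


exit_code = run_job([sys.executable, "-c", "print('job ran')"])
print("exit code:", exit_code)
```

Putting the clean-up in a `finally` clause is what keeps a crashed job from leaving scratch files behind on the node.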
25PcMgrSvc/Ctl
- Checks
- Status of monitored processes/services
- Amount of scratch space
- Drive mapping(s)
- Map/Unmap drives
- Sync. with time servers
- Generate alarms on request
- Gets all parameters from registry
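One of the checks listed above, the amount of scratch space, together with alarm generation, might look like the following sketch. It is not the PcMgrSvc code: the function names and parameter keys are hypothetical, and a plain dict stands in for the NT registry from which the real service reads its parameters.

```python
import shutil
from typing import Dict, List

# Stand-in for registry values (assumption: the real service reads these
# from the NT registry; the key names here are invented for illustration).
DEFAULT_PARAMETERS = {"scratch_path": ".", "min_free_bytes": 100 * 1024 * 1024}


def check_scratch_space(path: str, min_free_bytes: int) -> bool:
    """Return True when the volume holding path has enough free space."""
    return shutil.disk_usage(path).free >= min_free_bytes


def collect_alarms(parameters: Dict[str, object]) -> List[str]:
    """Run the configured checks and return one message per failed check."""
    alarms = []
    path = parameters["scratch_path"]
    if not check_scratch_space(path, parameters["min_free_bytes"]):
        alarms.append(f"low scratch space on {path}")
    return alarms


print(collect_alarms(DEFAULT_PARAMETERS) or "no alarms")
```

Further checks (monitored processes, drive mappings, clock synchronisation) would add entries to the same alarm list.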
26Web Interface
- As a solution to
- Remote access from outside CERN
- Access from non NT hosts
- Implemented as ASPs with VB
- Requires IIS on the server
27Web Interface - authentication
28Web Interface - Overview
29Web Interface - bjobs
30Web interface - bjobs result
31Windows NT Terminal Server
32Next Steps
- Finish and understand remote boot issues
- Complete remote boot - remote install
- AFS Integration
- Build up resilience
- Investigate how to use the new WfM, DMI, PXE, ACPI, etc. initiatives
- Investigate whether WSH is an alternative
- Investigate NT's I/O capabilities
33Key Issues
- AFS access
- LSF support
- Boot proms, equipment interoperability
- CODE reintegration (Physics CERNLIB)
- Think Windows
- Scalability and management (home grown solution vs. commercial apps.)
- Remote and external access
34PC with NT
- PCNT has proven to work in a batch environment, and is now an option for Physics Data Processing
- Farm management is less of a concern after having built a few tools (alternatives would be to use SMS or TNG), but some work is still needed
- Scalability has started to be addressed, but the relatively small number of nodes does not help here
- Considerable NT experience has been gained
35Issues so far
- Linux
- EEPRO 100 B MP support
- Commercial software
- Manufacturer support
- Very few local Linux experts
- NT
- AFS access
- LSF support
- Think Windows
- Remote and external access
- PC
- Interoperability (cards/MB combination)
- Remote Boot support
36PC Technology evolution in 97
- Pentium Pro -> Pentium II
- 50% raw performance increase
- but 50% cache performance reduction
- SEC -> new motherboards
- 440 FX -> 440 LX (SDRAM, AGP)
- Recent MBs -> embedded SCSI, Enet, VGA
- 100 Mbit Enet switches standard, 1000 Mbit arriving
37PC Technology evolution in 98
- Pentium II @ 300 MHz -> Pentium Xeon @ 450 MHz
- MP support
- 50% cache performance increase
- Slot 2 -> new motherboards
- 440 LX -> 440 BX, 440 NX (100 MHz, EDO)
- Recent MBs -> No more available through Intel, TYAN
- 1000 Mbit/s Enet switches standard, >> 1000 Mbit/s arriving
38Racking evolution
[Photos: racking in 1998 and in 1997]
39At the back ...
40Console multiplexors
41Fast Ethernet switches (Sep. 98)
42Fast Ethernet Switches (Oct. 98)
43At the back of Fast Ethernet Switches (Oct. 98)
44Gigabit Ethernet Switches
45Network performance Results
- PCs interconnected through a 100BaseT 3Com 3000 switch
- Repeated with other H/W
- Half duplex behavior
- Block size does not matter
- Linux uses less CPU than NT
- => Good unidirectional performance
- => Disappointing CPU consumption on NT
- => Disappointing bi-directional performance
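A block-size sweep of the kind summarised above can be sketched in a few lines. This is only an illustration of the measurement method, not the tool used for the results on this slide: it pushes data through a local socket pair, so it exercises the software path on one machine and says nothing about a real 100BaseT link. The function name and the default transfer size are assumptions.

```python
import socket
import threading
import time


def measure_throughput(block_size: int, total_bytes: int = 8 * 1024 * 1024) -> float:
    """Send total_bytes through a local socket pair in block_size writes; return MB/s."""
    sender, receiver = socket.socketpair()
    received = 0

    def drain() -> None:
        nonlocal received
        while received < total_bytes:
            chunk = receiver.recv(65536)
            if not chunk:  # sender closed early
                break
            received += len(chunk)

    worker = threading.Thread(target=drain)
    worker.start()
    payload = b"\0" * block_size
    start = time.perf_counter()
    sent = 0
    while sent < total_bytes:
        n = min(block_size, total_bytes - sent)
        sender.sendall(payload[:n])
        sent += n
    sender.close()
    worker.join()
    elapsed = time.perf_counter() - start
    receiver.close()
    return received / elapsed / 1e6


for size in (1024, 8192, 65536):
    print(f"block size {size:6d} B: {measure_throughput(size):8.1f} MB/s")
```

On a real network test the receiver would run on a second host, and CPU consumption would be recorded alongside the throughput.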
46PC to PC Network performance
47Network performance issues
- Unexplained 0.5 MB/s observed with some eepro100 versions on PCRD hardware, but OK on PCSF
- Recent DEC E'net boards with chipset > 21140 give poor performance on Linux
- Surprising results PC/Alpha
48PC/Alpha Network performance
49PC High Performance Networking
- HiPPI (5/98)
- PII, 300 MHz, 440LX, SDRAM, Roadrunner to SGI O2000, 4 CPU, IRIX 6.4
- Transmit 50 MB/s
- Receive 50 MB/s (53 MB/s with SMP)
- Gigabit Ethernet (10/98)
- PII, 400 MHz, 440 BX, 100 MHz SDRAM, PCI 32/33, Tigon I
- 1500 bytes/packet, 28 MB/s, 40% CPU
- 9000 bytes/packet, 90 MB/s, 90% CPU
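The two Gigabit Ethernet measurements can be put in perspective with a little arithmetic: the packet rate each one implies, and the fraction of the nominal 1 Gbit/s line rate it reaches. The sketch assumes 1 MB = 10^6 bytes and ignores protocol header overhead; the helper names are illustrative.

```python
MB = 10**6
GIGABIT_LINE_RATE = 125 * MB  # 1 Gbit/s expressed in bytes per second


def packets_per_second(throughput_bytes: float, packet_bytes: int) -> float:
    """Packet rate implied by a throughput and a packet size."""
    return throughput_bytes / packet_bytes


def link_utilisation(throughput_bytes: float) -> float:
    """Fraction of the nominal Gigabit Ethernet line rate."""
    return throughput_bytes / GIGABIT_LINE_RATE


for packet_bytes, mb_per_s in [(1500, 28), (9000, 90)]:
    rate = packets_per_second(mb_per_s * MB, packet_bytes)
    used = link_utilisation(mb_per_s * MB)
    print(f"{packet_bytes} B packets: {rate:,.0f} packets/s, {used:.0%} of line rate")
```

With 9000-byte packets the host moves more than three times the data while handling roughly half as many packets per second.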
50Disk performance
- PCs connected to SEAGATE ST19171W using two Adaptec 2940 UW
- NT needs a lot of tuning (default behavior is to swap data out!)
- Block size, BIOS settings, EDO/FPM do not matter
- => Poor performance
- => Windows NT even worse
- => Memory bandwidth is suspected
51Disk performance
- Striping has no effect
- 1 stream, 2 stripes: 21 MB/s (22 max)
- 1 stream, 3 stripes: 21 MB/s (33 max)
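The striping numbers are easier to read as a ratio of measured to expected throughput. The sketch below assumes that the "max" figure on each line is the ceiling expected for that number of stripes; the function name is illustrative.

```python
def striping_efficiency(measured_mb_s: float, max_mb_s: float) -> float:
    """Measured throughput as a fraction of the expected ceiling."""
    return measured_mb_s / max_mb_s


results = {2: (21, 22), 3: (21, 33)}  # stripes -> (measured, max) in MB/s
for stripes, (measured, ceiling) in results.items():
    ratio = striping_efficiency(measured, ceiling)
    print(f"{stripes} stripes: {measured} of {ceiling} MB/s ({ratio:.0%})")
```

The measured rate stays at 21 MB/s while the ceiling grows, which is consistent with a bottleneck outside the disks.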
52Disk performance issues
- Memory bandwidth suspected
- Need to test with LX/SDRAM, BX SDRAM @ 100 MHz
- RISC PCI does not support a variety of boards
- Combined disk/network performance even worse: 5-6 MB/s on Linux
53Memory bandwidth (lmbench)
54Memory bandwidth (lmbench)
55Technology issues
- Technology evolves too fast (processors, chipsets, memory, motherboards, networking, ...)
- Changing environment/interoperability issues
- Hard to maintain (obsolescence)
- New NICs, drivers
- Measurements valid only a few months
- => Difficult to establish stable environments
- Wide variety of solutions
- => Some combinations work, others do not
- Local suppliers cannot help to solve problems
56PC Performance summary
- CPU performance fine
- Network performance
- Some configurations do not work
- Some configurations can saturate Fast Ethernet
- Recent tests show excellent performance
- Memory performance
- Now better than low-end RISC
- Disk Performance disappointing
- Linux better than NT
57Online and quasi online farms
- NA48 Data Recording
- NA45 Data Recording in Objectivity
58NA48 Central Data Recording
[Diagram labels: Sub detector VME crates; Event Builder; Online PC Farm; FDDI; Fast Ethernet; SUN E450, 500 GB disk space; XLNT Gbit; 7 km Gigabit Ethernet; 3Com 9300; GigaRouter; HiPPI; Offline PC Farm; CS/2, 2.5 TB disk space]
59NA 48 Data Recording in 98
- May -> September 1998
- Raw Data on Tape
- 68 TB (1450 tapes, mainly 50 GB tapes)
- 12.5 TB Selected Reconstructed Data
- Total with 97 data: 96 TB
- Average Data Rate 18 MB/s (peaks @ 23 MB/s)
- CDR system can do 40-50 MB/s; limitation is CPU time available
- Data recorded as files (4 million)
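A few derived quantities help to sanity-check these figures. The sketch assumes decimal units, assumes that the 4 million files hold the 68 TB of raw data (the slide does not say which data the file count covers), and treats the 18 MB/s average as if it were sustained without interruption.

```python
MB = 10**6
GB = 10**9
TB = 10**12

RAW_BYTES = 68 * TB
TAPES = 1450
FILES = 4_000_000
AVERAGE_RATE = 18 * MB  # bytes per second


def average_tape_fill_gb() -> float:
    return RAW_BYTES / TAPES / GB


def days_at_average_rate() -> float:
    return RAW_BYTES / AVERAGE_RATE / 86400


def average_file_mb() -> float:
    return RAW_BYTES / FILES / MB


print(f"average tape fill: {average_tape_fill_gb():.1f} GB")
print(f"recording time at 18 MB/s: {days_at_average_rate():.1f} days")
print(f"average file size: {average_file_mb():.1f} MB")
```

The average fill per tape comes out just under the 50 GB tape size, in line with "mainly 50 GB tapes".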
60NA48 On Line Farm
- 11 Subdetector PCs (dual PII-266, 128 MB)
- 8 Event Building PCs (dual PII-266, 128 MB, 18 GB SCSI)
- 4 CDR routing PCs (dual PII-266, 64 MB, FDDI)
- All running Linux
- Software event building in the interburst gap
- Optional Software Filter (tags data)
- Send data to computer center (local disk buffers 144 GB, 2 hours)
- On CS/2: L3 Filtering and tape writing
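The "144 GB, 2 hours" figure for the local disk buffers implies a sustained data rate, which can be compared with the 18 MB/s average and 23 MB/s peak quoted for the 1998 run. The sketch assumes decimal units; the function names are illustrative.

```python
MB = 10**6
GB = 10**9


def sustained_rate_mb_s(buffer_bytes: int, hours: float) -> float:
    """Data rate that fills the buffer in the given time."""
    return buffer_bytes / (hours * 3600) / MB


def buffer_hours(buffer_bytes: int, rate_mb_s: float) -> float:
    """How long the buffer lasts at a given incoming rate."""
    return buffer_bytes / (rate_mb_s * MB) / 3600


print(f"implied rate: {sustained_rate_mb_s(144 * GB, 2):.1f} MB/s")
print(f"at 18 MB/s the buffer lasts {buffer_hours(144 * GB, 18):.2f} h")
print(f"at 23 MB/s the buffer lasts {buffer_hours(144 * GB, 23):.2f} h")
```

The implied 20 MB/s sits between the average and peak rates, so the two hours is a reasonable estimate of how long the link to the computer center can be down.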
61NA48 Plans for 1999
[Diagram labels: Sub detector VME crates; Event Builder; 4 SUN E450, 4.5 TB disk space; 7 km Gigabit Ethernet; Fast Ethernet; 3Com 9300; HiPPI; On/Offline PC Farm]
62NA45 Data Recording
[Diagram labels: Sub detector VME crates; NA48; SCI; Event Builder; On Line PC Farm; Fast Ethernet; 3Com 3900; PCSF; 7 km Gigabit Ethernet; 2 SUN E450, 500 GB disk space; 3Com 9300; HiPPI]
63NA45 Raw Data recording in Objectivity
- October 98 - November 98
- Estimated bandwidth 15 MB/s
- Processes translate Raw Data format to Objectivity
- Database files (1.5 GB) are closed, then written on tape
- Steering done using a set of perl scripts on the disk servers
- On line filtering/reconstruction/calibration possible
- Farm is running Windows NT
- Reconstruction can use PCSF
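The estimated bandwidth and the database file size together set the pace at which files are closed and handed to tape. A small sketch of that arithmetic follows, assuming decimal units and a steady 15 MB/s; the function names are illustrative.

```python
MB = 10**6
GB = 10**9


def seconds_per_file(file_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to fill one database file at the given rate."""
    return file_bytes / rate_bytes_per_s


def files_per_hour(file_bytes: float, rate_bytes_per_s: float) -> float:
    return 3600 / seconds_per_file(file_bytes, rate_bytes_per_s)


fill_time = seconds_per_file(1.5 * GB, 15 * MB)
print(f"one 1.5 GB file every {fill_time:.0f} s")
print(f"{files_per_hour(1.5 * GB, 15 * MB):.0f} files per hour")
```

At that pace the steering scripts on the disk servers have to close a file and queue it for tape every couple of minutes.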
64Current and Future Data rates at CERN
65Summary
- On line PC farms are being used to record data at sensible rates (Linux)
- Off line PC farms are being used for reconstruction/filtering/analysis (Linux/NT)
- Still a lot to do on scalable farm management, global steering, CDR monitoring, etc.
66PC Total Cost of Ownership
- Software not included
- Install labor not included
- Assumes 3 years lifetime
67DEC 8400 (12-Way) Cost of Ownership
- Software and SW maintenance not included
- Assumes 5 years lifetime
68General Conclusions (1)
- PCs are now used for online, quasi online and offline environments
- The offline is now part of the online
- The I/O is still done using RISC/Unix but recent MP Xeon may change this
69General Conclusions (2)
- PC technology is moving very fast
- Good for performance
- Not so for stability, interoperability
- Not so for understanding issues
- The general management of large farms is not solved, but
- A number of initiatives/standards/tools may help us here: WfM, DMI, PXE, ACPI, SMS, TNG, etc.
70General Conclusions (3)
- Linux vs. NT: the battle is over
- Choose the one suitable to your application
- NT can be used
- Linux is usable (and offers more performance).
- PC real costs are usually not well understood