Major Systems at ANL - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
Major Systems at ANL
  • Bill Gropp
  • www.mcs.anl.gov/~gropp (virtual Remy Evard)

2
Current User Facilities
  • Chiba City Linux Cluster for Scalability
  • OASCR funded. Installed in 1999.
  • 512 CPUs, 256 nodes, Myrinet, 2TB storage.
  • Mission: address scalability issues in system
    software, open source software, and applications
    code.
  • Jazz Linux Cluster for ANL Apps
  • ANL funded. Installed in 2002. Achieved 1.1 TF
    sustained.
  • 350 CPUs, Myrinet, 20TB storage.
  • Mission: support and enhance the ANL
    application community.
  • 50 projects.
  • On the DOE Science Grid.
  • TeraGrid Linux Cluster for NSF Grid Users
  • NSF funded as part of DTF and ETF.
  • 128 IA-64 CPUs for computing.
  • 192 IA-32 CPUs for visualization.

3
Current Testbeds
  • Advanced Architectures Testbed
  • ANL LDRD Funding. Established in 2002.
  • Experimental systems: FPGAs, Hierarchical
    Architectures, ...
  • Mission: explore programming models and hardware
    architectures for future systems.
  • Grid and Networking Testbeds
  • I-WIRE: Illinois-funded dark fiber.
  • Participation in a large number of Grid projects.
  • Facilities at ANL include DataGrid, Distributed
    Optical Testbed, and others.
  • Mission: Grids and networks as an enabling
    technology for petascale science.
  • Visualization and Collaboration Facilities
  • AccessGrid, ActiveMural, Linux CAVE, others

4
Chiba City - the Argonne Scalable Cluster
1 of 2 rows of Chiba City
256 computing nodes. 512 PIII CPUs. 32
visualization nodes. 8 storage nodes. 4 TB of
disk. Myrinet interconnect. Mission: scalability
and open source software testbed.
http://www.mcs.anl.gov/chiba/
5
Systems Software Challenges
  • Scale Invariance
  • Systems services need to scale to arbitrarily
    large-scale systems (e.g., I/O, scheduling,
    monitoring, process management, error reporting,
    diagnostics, etc.)
  • Self-organizing services provide one path to
    scale invariance
  • Fault Tolerance
  • Systems services need to provide sustained
    performance in spite of hardware failures
  • No single point of control; peer-to-peer
    redundancy (see the sketch below)
  • Autonomy
  • Systems services should be self-configuring,
    auto-updating, and self-monitoring
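A minimal sketch of the peer-to-peer idea above (not the actual Chiba City software): each node both announces its own liveness and watches its peers, so failure detection itself has no single point of control. The peer list, port, and timeout values are illustrative assumptions; Python is used only for brevity.

    import socket
    import threading
    import time

    # Illustrative peer table; a real deployment would discover peers through
    # a service directory rather than hard-coding them.
    PEERS = [("node01", 5500), ("node02", 5500), ("node03", 5500)]
    HEARTBEAT_INTERVAL = 5    # seconds between liveness announcements
    FAILURE_TIMEOUT = 15      # silence longer than this marks a peer as failed

    last_seen = {host: time.time() for host, _ in PEERS}

    def send_heartbeats(my_name, port=5500):
        """Periodically announce liveness to every peer (UDP, best effort)."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        while True:
            for host, peer_port in PEERS:
                sock.sendto(my_name.encode(), (host, peer_port))
            time.sleep(HEARTBEAT_INTERVAL)

    def watch_peers(port=5500):
        """Record incoming heartbeats and flag peers that have gone silent."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", port))
        sock.settimeout(1.0)
        while True:
            try:
                data, _ = sock.recvfrom(256)
                last_seen[data.decode()] = time.time()
            except socket.timeout:
                pass
            for host, stamp in list(last_seen.items()):
                if time.time() - stamp > FAILURE_TIMEOUT:
                    print(f"peer {host} missed heartbeats; assuming it failed")
                    del last_seen[host]  # a real service would redistribute its work

    if __name__ == "__main__":
        threading.Thread(target=watch_peers, daemon=True).start()
        send_heartbeats(socket.gethostname())

Because every node runs the same two loops, losing any single node costs nothing beyond that node's own work.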

6
Testbed Uses
  • System Software
  • MPI Process Management
  • Parallel Filesystems
  • Cluster Distribution Testing
  • Network Research
  • Virtual Node Tests

7
Testbed Software Development
  • Largely based on the SSS component architecture and
    interfaces
  • Existing resource management software didn't meet
    our needs
  • The SSS component architecture allowed easy
    substitution of system software where required
  • Simple interfaces allow fast implementation of
    custom components (resource manager), as sketched
    below
  • Open architecture allows implementation of extra
    components based on local requirements (file
    staging)
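A minimal sketch of that pattern, not the actual SSS interfaces: a small custom component (here a hypothetical file stager) registers its location with a service directory and then answers simple XML queries over a socket. The directory address, port, and message schema are assumptions made for illustration.

    import socket
    import xml.etree.ElementTree as ET

    # Hypothetical endpoints and message formats; the real SSS suite defines
    # its own wire protocols and schemas.
    SERVICE_DIRECTORY = ("sss-directory.example.anl.gov", 8256)
    LISTEN_PORT = 9001

    def register(component_name, port):
        """Announce this component's host and port to the service directory."""
        msg = ET.Element("register", component=component_name,
                         host=socket.gethostname(), port=str(port))
        with socket.create_connection(SERVICE_DIRECTORY) as conn:
            conn.sendall(ET.tostring(msg))

    def serve_queries(port):
        """Answer simple XML queries such as <get-file-status file='...'/>."""
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("", port))
        srv.listen(5)
        while True:
            conn, _ = srv.accept()
            with conn:
                query = ET.fromstring(conn.recv(4096))
                reply = ET.Element("file-status",
                                   file=query.get("file", "unknown"),
                                   state="staged")  # stub answer
                conn.sendall(ET.tostring(reply))

    if __name__ == "__main__":
        register("file-stager", LISTEN_PORT)
        serve_queries(LISTEN_PORT)

Because each component exposes only a narrow interface like this, a site-specific piece such as the file stager can be swapped in without touching the scheduler or queue manager.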

8
Chiba City Implementation
(Architecture diagram: SSS components deployed on Chiba City)
  • Meta Scheduler, Meta Monitor, Meta Manager, Meta Services
  • Accounting, Scheduler, System Job Monitor, Node State Manager
  • Service Directory, Node Configuration & Build Manager,
    Communication Library, Event Manager
  • Allocation Management, Usage Reports, Validation Testing
  • Process Manager, Job Queue Manager, Hardware Infrastructure
    Manager, Checkpoint / Restart
9
Software Deployment Testing
  • Beta software run in production
  • Testbed software stack
  • Configuration management tools
  • Global process manager
  • Cluster distribution installation testing
  • Friendly users provide useful feedback during the
    development process

10
The ANL LCRC Computing Cluster (http://www.lcrc.anl.gov)
  • Computing: 350 nodes, 2.4 GHz Pentium IV; 50% w/ 2
    GB RAM, 50% w/ 1 GB RAM; 80 GB local scratch disk;
    Linux.
  • Global working disk: 10 TB; 8 dual 2.4 GHz Pentium
    IV servers; 10 TB SCSI JBOD disks; PVFS file system.
  • Home disk: 10 TB; 8 dual 2.4 GHz Pentium IV servers;
    10 TB Fibre Channel disks; GFS between servers, NFS
    to the nodes.
  • Network: Myrinet 2000 to all systems; Fast Ethernet
    to the nodes; GigE aggregation; 1 Gb to ANL.
  • Support: 4 front-end nodes (2x 2.4 GHz PIV); 8
    management systems.
11
LCRC enables analysis of complex systems
12
LCRC enables studies of system dynamics
13
Jazz Usage: Capacity and Load
  • We've reached the practical capacity limit given
    the job mix.
  • There are always jobs in the queue. Wait time
    varies enormously, averaging 1 hr.

14
Jazz Usage: Accounts
  • Constant growth of 15 users a month.

15
Jazz Usage: Projects
  • Steady addition of 6 new projects a month.

16
FY2003 LCRC Usage by Domain: A wide range of lab
missions
17
Jazz Usage by Domain over time
18
Jazz Usage: Large Projects (>5000 hrs)
(Chart legend, project - division): PETSc; Startup Projects; Ptools;
QMC - PHY; Nanocatalysis - CNM; Sediment - MCS; Protein - NE;
Neocortex Sim - MCS; Heights EUV - ET; Numerical Reactor - NE;
Lattice QCD - HEP; Compnano - CNM; Foam - MCS; COLUMBUS - CHM;
Climate - MCS; Chaos - MCS; Aerosols - ER
19
ETF Hardware Deployment, Fall 2003 (http://www.teragrid.org)
(Site diagram: Caltech, ANL, NCSA, SDSC, and PSC hardware - 100 TB DataWulf;
96 GeForce4 graphics pipes; 96 Pentium4 + 64 2p Madison on Myrinet; 32
Pentium4 + 52 2p Itanium2 + 20 2p Madison on Myrinet; 20 TB; 256 2p Itanium2
+ 670 2p Madison on Myrinet; 128 2p Itanium2 + 256 2p Madison on Myrinet;
1.1 TF Power4 Federation; 500 TB FCS SAN; 230 TB FCS SAN)
20
ETF ANL: 1.4 TF Madison/Pentium IV, 20 TB, Viz
(System diagram)
  • 30 Gbps to TeraGrid network.
  • Compute: 0.5 TF Madison, 64 nodes (2p Madison, 4 GB
    memory, 2x 73 GB disk); 250 MB/s/node.
  • Visualization: 0.9 TF Pentium IV, 96 nodes (2p 2.4
    GHz, 4 GB RAM, 73 GB disk, Radeon 9000); 250
    MB/s/node; 96 visualization streams to viz devices
    and network viz.
  • Fabrics: Myrinet and GbE; storage I/O and viz I/O
    over Myrinet and/or GbE to the TG network.
  • Interactive nodes (login, FTP): 4 2p PIV nodes (2.4
    GHz, 4 GB RAM) and 4 4p Madison nodes.
  • Storage nodes: 8 nodes, 2x FC each; 20 TB.