1
Cluster Computing Overview
Summer Institute for Advanced Computing
August 22, 2000
Doug Johnson, OSC
2
Overview
  • What is Cluster Computing
  • Why Cluster Computing
  • How Clusters Fit with OSC Mission
  • When Did It All Start
  • OSC 128 Processor SGI/Linux Cluster
  • Clusters for Production HPC Environments

3
What is Cluster Computing?
[Diagram: several nodes, each with common resources (CPU(s), memory, hard drive, network card), connected by a network]
  • A Cluster is a collection of interconnected whole computers used as a single, unified computer
  • Cluster Computing is many things...
    • High performance computing
      • Run programs with parallel algorithms (a minimal MPI sketch follows below)
    • High throughput computing
      • Parametric studies (same program run many times with different parameters)
    • High availability computing
      • Fail-over redundancy
  • Both scientific and commercial applications!

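As a sketch of the "parallel algorithms" case above, the following minimal MPI program is illustrative only (it is not taken from the presentation); it is the kind of code that could be compiled against an MPI library such as MPICH and run across the nodes of a cluster.

  /* hello_mpi.c - minimal MPI sketch: each process reports its rank. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
      int rank, size;

      MPI_Init(&argc, &argv);                /* start the MPI runtime     */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id         */
      MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

      printf("Hello from rank %d of %d\n", rank, size);

      MPI_Finalize();                        /* shut down the MPI runtime */
      return 0;
  }

Each copy of the program runs on a (possibly different) node, and the MPI library handles communication over the cluster interconnect.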
4
Brief History of Cluster Computing at OSC
  • OSC SGI/Linux 128-processor cluster: Pentium III Xeon 550 MHz processors, 66 Gbytes RAM, Myrinet and 100 Mbit Ethernet interconnects
  • OSC 10-processor IA32 Linux cluster: Pentium II 400 MHz processors, Myrinet interconnect, 4.5 Gbytes RAM
  • OSC installs "Trout" system, a dual-purpose workstation cluster: 14 SGI O2 workstations, R10000 processors @ 150 MHz, ATM interconnect
  • OSC installs "Beaker" system, a dual-purpose workstation cluster: 12 DEC Alpha EV4 processors with full-duplex FDDI interconnect
  • Beowulf project at the Center of Excellence in Space Data and Information Sciences (CESDIS) installs its first cluster: 16 Intel 486 DX4 processors @ 100 MHz, 16 Mbytes RAM per processor, 10 Mbit Ethernet interconnect (3 per node)
5
Why Parallel Computing
OSC Mission Statement: "OSC provides a reliable high performance computing and communications infrastructure for a diverse, statewide/regional community including education, academic research, industry, and state government."
  • Parallel computing is a strong presence at the national level and is the future of High Performance Computing (HPC)
  • Parallel computing platforms are a vital element in our infrastructure
  • Parallel systems have traditionally not been an accessible resource compared to single-processor systems
    • Higher cost (due mostly to the high performance interconnect)
    • Less refined user interface
    • Non-traditional programming techniques with little training available

6
Why Cluster Computing
OSC Mission Statement: "... In collaboration with this community, OSC evaluates, implements, and supports new and emerging information technologies. ..."
  • OSC evaluates new and emerging information technologies
  • Cluster computing is one of the hottest fields in high performance computing
  • Potential benefits of clusters over traditional parallel systems
    • High performance interconnect technology is approaching commodity availability
    • Performance of commodity systems is increasing at an aggressive rate, driven by the commercial market for home/office workstations

7
Why Cluster Computing
  • Potential benefits of clusters over traditional parallel systems (cont.)
    • The operating system gives users the same environment on their desk that they have on the parallel system
  • Other differences
    • System administration implications
      • No single system image: OS and software upgrades must be applied to all nodes
      • Cluster design lends itself to more frequent hardware upgrades
    • Performance implications
    • Accounting/funding implications

8
How Clusters Fit With OSC Mission
OSC Mission Statement: "... In collaboration with this community, OSC evaluates, implements, and supports new and emerging information technologies. ..."
  • OSC evaluates new and emerging information technologies
    • Multiple software packages have been evaluated to provide the most robust system
    • Four different network interconnects have been installed to evaluate performance
    • Three different processors and operating systems were investigated
  • OSC implements new and emerging information technologies
    • A cluster under OSC administration has been available to users since March 1999
    • OSC partnered with the Portland Group to bring the Cluster Development Kit to OSC users
  • OSC supports new and emerging information technologies
    • OSC 128-processor cluster is in production status
    • Training classes on how to build and use a cluster
    • Staff available to Ohio faculty to help answer questions and troubleshoot problems

9
To Summarize
  • OSC wants to encourage parallel programming
    • Parallel programming is the future of high performance computing
    • Clusters provide increased access to parallel systems
  • Develop cluster technology so that it can be rolled out to university research labs
    • Provide a hardware and software configuration that will allow labs to construct a working cluster with minimal effort
    • Experienced OSC staff can provide technical assistance
  • Evaluate software and hardware configurations to assist researchers in defining a system that will best suit their needs
  • Let the researchers focus on science
    • Based on user applications, provide performance analysis showing the optimal hardware and software configuration

10
When Did It All Start?
  • December 1998: OSC management authorizes a dedicated 10-processor cluster for technology evaluation
  • April 1999: Performance evaluation yields promising results and the machine is opened to users

Evaluation cluster configuration:
  • 1 front-end node: 2 Intel Pentium II 400 MHz processors, 512 Mbytes RAM, 18 Gbyte disk
  • 4 compute nodes: 2 Intel Pentium II 400 MHz processors, 1 Gbyte RAM, 9 Gbyte disk each
  • Interconnects: 100 Mbit Ethernet, Dolphinics SCI, Myricom Myrinet
  • Software: Linux OS, PBS batch system, PGI compiler suite
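These numbers are consistent with the history slide: the front-end's 2 processors plus 4 compute nodes × 2 processors give the 10 processors, and 512 Mbytes + 4 × 1 Gbyte give the 4.5 Gbytes of total RAM cited there.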
11
OSC/SGI Cluster
  • September 1999: Agreement signed between OSC and SGI
  • October 1999: System powered on
  • November 1999: Machine configured and running applications on the floor of Supercomputing 99
  • December 1999: Machine installed at OSC
  • February 2000: Machine opened to friendly users
12
Hardware
All nodes are SGI 1400L servers
  • 1 front-end node configured with
    • Two Gigabytes of RAM
    • Four 550 MHz Intel Pentium III Xeon processors, each with 512 kB of secondary cache
    • 48 Gigabytes of ultra-wide SCSI hard drives
    • Two 100Base-T Ethernet interfaces
    • One HIPPI interface
  • 32 compute nodes, each configured with
    • Two Gigabytes of RAM
    • Four 550 MHz Intel Pentium III Xeon processors, each with 512 kB of secondary cache
    • 18 Gigabytes of ultra-wide SCSI hard drives
    • Two Myrinet interfaces
    • One 100Base-T Ethernet interface
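These per-node figures match the totals on the history slide: 32 compute nodes × 4 processors give the 128 compute processors, and 33 nodes × 2 Gbytes give the 66 Gbytes of total RAM.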

13
Software and Configuration
  • Hardware originally assembled in Mountain View, CA by SGI Professional Services
  • OS and software environment installed and configured by OSC staff
    • Linux operating system
    • Portable Batch System (PBS) (a sample batch script sketch follows below)
    • Portland Group compiler suite
    • Myrinet MPICH-GM interface
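The presentation does not show an actual job script, but a typical workflow on this software stack would be to submit work through PBS. The sketch below is a hedged illustration: the resource directives, the mpicc wrapper, and the use of $PBS_NODEFILE with mpirun are assumptions about the local setup rather than details from the slides.

  # submit.pbs - illustrative PBS batch script (assumed directives and paths)
  #PBS -N hello_mpi          # job name
  #PBS -l nodes=4:ppn=4      # request 4 nodes with 4 processors per node
  #PBS -l walltime=00:10:00  # wall-clock limit
  #PBS -j oe                 # merge stdout and stderr into one file

  cd $PBS_O_WORKDIR                                      # run from the submission directory
  mpicc -O2 -o hello_mpi hello_mpi.c                     # compile with the MPI wrapper compiler
  mpirun -np 16 -machinefile $PBS_NODEFILE ./hello_mpi   # launch across the allocated nodes

Such a script would typically be submitted with "qsub submit.pbs" and monitored with "qstat".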

14
Clusters for Production HPC Environment
  • There are two significant efforts in building clusters
    • Building a cluster and making it operational
    • Making the cluster a production system
      • Ability to host multiple users simultaneously
      • Ability to schedule system resources
      • Ability to function without constant intervention
  • The OSC cluster has the following attributes that make it a true HPC production system
    • Connection to a Mass Storage System (MSS)
    • Integrated into the OSC account database system
    • Job accounting
    • Good utilization
    • High availability

15
Mass Storage Support
[Diagram: the cluster connects over HIPPI and a private 100 Mbit switched Ethernet to an Origin 2000 running the Data Migration Facility (DMF) with 1 Terabyte of disk storage, backed by an IBM 3494 tape library holding 30 Terabytes of tape storage]
16
User Accounts and Accounting
  • User Accounts
    • The cluster is integrated into the Center's database system for automatic account generation and maintenance
  • Job Accounting
    • Accounting has been configured into the environment to track users' CPU usage
    • CPU usage is converted with a charging algorithm and deducted from a Principal Investigator's account (a hypothetical sketch follows below)
    • Users can view their accounting history with a text command from the Linux command prompt
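The slides do not describe the actual charging algorithm. The following is a hypothetical sketch, assuming a flat rate of resource units per CPU-hour, purely to illustrate the kind of conversion and deduction described above; the rate and balances are invented.

  /* charge_sketch.c - hypothetical charging calculation (NOT OSC's actual
     algorithm): convert CPU-seconds to resource units at an assumed rate
     and deduct them from a Principal Investigator's balance. */
  #include <stdio.h>

  int main(void)
  {
      const double units_per_cpu_hour = 1.0;  /* assumed rate, for illustration */
      double cpu_seconds = 7200.0;            /* example: a 2 CPU-hour job      */
      double pi_balance  = 500.0;             /* example starting balance       */

      double charge = (cpu_seconds / 3600.0) * units_per_cpu_hour;
      pi_balance -= charge;

      printf("charge: %.2f units, remaining balance: %.2f units\n",
             charge, pi_balance);
      return 0;
  }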

17
Utilization and Availability
  • Utilization
    • System utilization is recorded and accessible via a web link
    • For parallel systems, utilization is typically expected to be around 50 to 70 percent
    • Current utilization is about 70 percent parallel and 30 percent serial
  • Availability
    • Good availability has been achieved through significant uptime and minimal system problems
    • Downtime is scheduled every 4 weeks for software upgrades, hardware modifications, and general system maintenance

18
TCP Stream Performance
19
TCP Stream Performance
20
UDP Stream Performance
./netperf -l 60 -H fe.ovl.osc.edu -i 10,2 -I 99,10 -t UDP_STREAM -- -m 1472 -s 32768 -S 32768

UDP UNIDIRECTIONAL SEND TEST to fe.ovl.osc.edu : +/-5.0% @ 99% conf.

Socket   Message   Elapsed   Messages
Size     Size      Time      Okay      Errors   Throughput
bytes    bytes     secs                         10^6 bits/sec

131070   1472      59.99     3229909   0        634.03   (local send side)
524288             59.99     2169706            425.91   (remote receive side)
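As a sanity check, the send-side throughput can be reproduced from the message count, message size, and elapsed time in the table: 3,229,909 messages × 1472 bytes × 8 bits ÷ 59.99 s ≈ 634.03 × 10^6 bits/sec. The short program below is an illustrative calculation, not part of the original slides.

  /* throughput_check.c - recompute the netperf UDP send-side throughput
     from the reported message count, message size, and elapsed time. */
  #include <stdio.h>

  int main(void)
  {
      const double messages  = 3229909.0;  /* messages sent OK (from the table) */
      const double msg_bytes = 1472.0;     /* message size in bytes             */
      const double seconds   = 59.99;      /* elapsed time in seconds           */

      double mbits_per_sec = messages * msg_bytes * 8.0 / seconds / 1e6;
      printf("send-side throughput: %.2f x 10^6 bits/sec\n", mbits_per_sec);  /* ~634.03 */
      return 0;
  }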