The IEEE CS Task Force on Cluster Computing (TFCC)

Transcript and Presenter's Notes

1
The IEEE CS Task Force on Cluster Computing (TFCC)
William Gropp
Mathematics and Computer Science
Argonne National Lab
www.mcs.anl.gov/~gropp
Thanks to Mark Baker
University of Portsmouth, UK
http://www.dcs.port.ac.uk/~mab
2
A Little History
  • In 1998 there was obviously huge interest in
    clusters, so it seemed natural to set up a
    focused group in this area.
  • A Cluster Computing Task Force was proposed to
    the IEEE CS.
  • The TFCC was approved and started operating in
    February 1999 - it has been going for just over
    2 years.

3
Proposed Activities
  • Act as an international forum to promote cluster
    computing research and education, and participate
    in setting up technical standards in this area.
  • Be involved with issues related to the design,
    analysis and development of cluster systems as
    well as the applications that use them.
  • Sponsor professional meetings, produce
    publications, set guidelines for educational
    programs, and help co-ordinate academic, funding
    agency, and industry activities.
  • Organize events and hold a number of workshops
    that would span the range of activities sponsored
    by the Task Force.
  • Publish a bi-annual newsletter to help the
    community keep abreast of activities in the field.

4
IEEE CS Task Forces
  • A TF is expected to have a finite term of
    existence, normally a period of 2-3 years -
    continued existence beyond that point is
    generally not appropriate.
  • A TF is expected to either increase its scope
    of activities such that establishment of a
    Technical Committee (TC) is warranted, or the
    task force will be merged into existing TCs.
  • TFCC will submit an application to the CS to
    become a TC later this year.

5
Why a separate TFCC?
  • It brings together all the activities/technologies
    used with Cluster Computing into one area - so
    instead of tracking four or five IEEE TCs there
    is one...
  • Cluster Computing is NOT just Parallel or
    Distributed computing, OSs, or the Internet; it is
    a mix of them all, and consequently different.
  • The TFCC is an appropriate body for focusing
    activities and publications associated with
    Cluster Computing.

6
http://www.ieeetfcc.org
7
TFCC Mailing Lists
  • Currently three email lists have been set up
  • tfcc-l@bucknell.edu - a discussion list open to
    anyone interested in the TFCC - see the TFCC page
    for info. on how to subscribe.
  • tfcc-exe@port.ac.uk - a closed executive
    committee mailing reflector.
  • tfcc-adv@port.ac.uk - a closed advisory
    committee mailing reflector.

8
Annual Conference ClusterXY
  • 1st IEEE International Workshop on Cluster
    Computing (Cluster 1999), Melbourne, Australia,
    December 1999, about 105 attendees from 16
    countries.
  • http://www.clustercomp.org
  • 2nd IEEE International Conference on Cluster
    Computing (Cluster 2000), Chemnitz, Germany,
    November, 2000, anticipate 160 attendees.
  • http://www.tu-chemnitz.de/cluster2000
  • 3rd IEEE International Conference on Cluster
    Computing (Cluster 2001), Newport Beach,
    California, October 8-11, 2001, expect 250-300
    attendees.
  • http://andy.usc.edu/cluster2001

9
Associated Events - GRIDXY
  • 1st IEEE/ACM International Workshop on Grid
    Computing (Grid2000), Bangalore, India, December
    17, 2000 (attendees from 15 countries).
  • http://www.gridcomputing.org
  • 2nd IEEE/ACM International Workshop on Grid
    Computing (Grid2001), at SC2001, November 2001

10
Supercomputing
  • Birds of A Feather at SC99 and SC2000.
  • The aim of these meetings is to gather together
    interested parties and bring them up to date, but
    also to put together a set of short talks and
    start a discussion on a variety of topics.
  • There will probably be another at SC01, depending
    on community interest.

11
Other Activities
  • Book donation program
  • Cluster Computing Archive
  • www.ieeetfcc.org/ClusterArchive.html
  • TopClusters Project
  • www.TopClusters.org
  • TFCC Whitepaper
  • www.dcs.port.ac.uk/~mab/tfcc/WhitePaper
  • TFCC Newsletter
  • www.eg.bucknell.edu/~hyde/tfcc

12
TopClusters Project
  • http://www.TopClusters.org
  • TFCC collaboration with Top500 project.
  • Numeric, I/O, Web, Database, and Application
    level benchmarking of clusters.
  • Joint BOF with Top500 at SC2000 on Cluster-based
    benchmarking.
  • Ongoing effort

13
TFCC Whitepaper
  • A Whitepaper on Cluster Computing, submitted to
    the International Journal of High-Performance
    Applications and Supercomputing, November 2000
  • A snapshot of the state of the art in Cluster
    Computing.
  • Preprint: www.dcs.port.ac.uk/~mab/tfcc/WhitePaper/

14
TFCC Membership
  • Over 300 registered members
  • Free membership open to all, but a few benefits
    may be restricted (e.g. reduced registration fee
    for IEEE members)
  • Over 450 on the TFCC mailing list
    <tfcc-l@bucknell.edu>

15
Future Plans
  • We plan to submit an application to the IEEE CS
    Technical Activities Board (TAB) to attain full
    Technical Committee status.
  • The TAB sees the TFCC as a success and we hope
    that our application will be successful.
  • Obviously, if we achieve TC status, we will need
    the continuing assistance and help of the TFCC's
    current volunteers, plus we will need to encourage
    new ones.

16
Summary
  • A successful conference series has been started,
    with commercial sponsorship.
  • Promoting Cluster-based technologies through TFCC
    sponsorship.
  • Helping the community with our book donation
    program.
  • Engendering debate and discussion through our
    mailing list.
  • Keeping the community informed with our
    information-rich TFCC Web site.

17
Scalable Clusters
  • TopClusters.org list
  • 26 clusters with 128 or more nodes
  • 8 with 500 or more nodes
  • 34 with 64-127 nodes
  • Most run Linux
  • Most dedicated to applications
  • Where are scalable tools developed and tested?
  • Caveats
  • Does not include MPP-like systems (IBM SP, SGI
    Origin, Compaq, Intel TFLOPs, etc.)
  • Not a complete list
  • Only clusters explicitly contributed to
    TopClusters.org

18
What is Scalability?
  • Most common definition in use
  • Works for n+1 nodes if it works for n, for small
    n
  • Practical definition
  • Operations complete fast enough
  • 0.5 to 3 seconds for interactive
  • Operations are reliable
  • Approach to scalability must not be fragile

19
Issues in Clusters and Scalability
  • Developing and Testing Tools
  • Requires convenient access to a large-scale system
  • Can this co-exist with production computing?
  • Too many different tools
  • Why not adopt Unix philosophy?
  • Example solution: Scalable Unix Tools
  • Following slides thanks to Rusty Lusk and Emil Ong

20
What Are the Scalable Unix Tools?
  • Parallel versions of common Unix commands like
    ps, ls, cp, ..., with appropriate semantics
  • A few new commands in the same spirit but without
    a serial counterpart
  • Designed for users
  • New this spring: release of a high-performance
    implementation based on MPI
  • One of the original official Ptools projects
  • Original definition published
  • Proceedings of the Scalable High Performance
    Computing Conference
  • http://www.mcs.anl.gov/~gropp/papers/1994/shpcc-paper.ps

21
Motivation
  • Basic Unix commands (ls, grep, find, ...) are
    quintessential tools.
  • Simple syntax and semantics (except maybe find
    syntax)
  • Have same component interface (lines of text,
    stdin, stdout)
  • Unix redirection (<, >, and especially |) allows
    tools to be easily combined into powerful command
    lines
  • Old-fashioned: no GUI, little interactivity

22
Motivation, continued
  • Many parallel machines have Unix and at least
    partially distinct file systems on each node.
  • A user needs simple and familiar ways to do the
    following (examples below)
  • Copy a file to local file space on each node
  • Find all processes running on all nodes
  • Test for conditions on all nodes
  • Avoid getting swamped with output
  • On large machines these commands are not useful
    unless they take advantage of parallelism in
    their execution.
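
A minimal, hedged sketch of how these everyday tasks might look with the parallel tools introduced on the following slides; the -all and -M options and the command names are taken from later slides, while the file names, paths, and hostlist here are made up for illustration.

    # Copy a data file to local file space on every node
    ptcp -all inputdata /tmp/inputdata

    # Run ps everywhere to see all processes running on all nodes
    ptexec -all 'ps aux'

    # List a directory on a chosen subset of nodes only
    ptls -M node%d@1-3 /tmp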

23
Design Goals
  • Familiar to Unix users
  • Similar names (we chose pt<Unix-name>)
  • Same arguments, similar semantics
  • Interact well with traditional Unix commands,
    facilitating construction of powerful command
    lines (example below)
  • Run at interactive speeds (requires scalability
    in parallel process manager startup and handling
    of I/O)
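
To illustrate the point about command lines, a small hedged example of combining a parallel command with ordinary Unix filters; the process name myapp is hypothetical.

    # Report processes on every node, then filter and sort the combined
    # output with plain grep and sort running locally
    ptexec -all 'ps aux' | grep myapp | sort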

24
Part I: Parallel Versions of Traditional Commands
  • ptcp
  • ptmv
  • ptrm
  • ptln
  • ptmkdir
  • ptrmdir
  • ptchmod
  • ptchgrp
  • ptchown
  • pttest
  • Select nodes to run on by (examples below)
  • -all
  • -m <file of hostnames>
  • -M <hostlist>
  • donner dasher blitzen
  • ccn%d@1-32,42,65-96
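
A brief, hedged sketch of the node-selection options in use; nodes.txt, the paths, and the exact hostlist spelling are illustrative assumptions following the forms listed above.

    # Run on every node of the machine
    ptrm -all /tmp/scratchfile

    # Run only on the nodes named in a file of hostnames (nodes.txt is hypothetical)
    ptmkdir -m nodes.txt /tmp/rundir

    # Run on an explicit hostlist of the form shown above
    ptcp -M ccn%d@1-32,42,65-96 app.conf /tmp/app.conf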

25
Part II: Traditional Commands Producing Lots of Output
  • ptcat, ptls, ptfind
  • Have potential to produce lots of output, and the
    source is also of interest
  • With the -h option: ptls -M node%d@1-3 -h
  • node1
  • myfile1
  • node2
  • node3
  • myfile1
  • myfile2

26
Performance of ptcp
  • Copying a single 10 MB file
  • to 241 nodes in 14 seconds

(Charts: Time to Copy 10MB File; Total Bandwidth)
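
As a rough cross-check of these numbers: delivering one 10 MB file to 241 nodes moves about 241 × 10 MB ≈ 2.4 GB in total, so finishing in 14 seconds corresponds to an aggregate rate on the order of 2410 MB / 14 s ≈ 170 MB/s.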
27
Watching ptcp
  • ptcp -all bigfile BIGFILE &
  • X=1
  • while true ; do \
  •   ptexec -all 'echo "`hostname` `ls -s BIGFILE | \
  •     awk "{ print \"percentage\" \$1/98 \" blue red\" }"`"' | ptdisp -h ; \
  • done
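
Roughly, the loop asks every node (via ptexec) to report the size of its local copy of BIGFILE with ls -s, awk turns that into an approximate completion percentage (presumably 98 blocks is the size of the complete file) together with colour hints, and the combined output is piped into ptdisp, which draws the per-node progress display shown on the next two slides.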

28
Percentage of Completion
29
Percentage of Completion
30
Availability
  • Open source
  • Get from http://www.mcs.anl.gov/sut
  • All source, man pages
  • Configure, make on Linux, Solaris, Irix, AIX (see
    the sketch below)
  • Needs an MPI implementation with mpirun
  • Developed with Linux, MPICH, MPD, on Chiba City
    at Argonne
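
A minimal sketch of the build-and-try flow this slide implies; the test filename is an assumption, and an MPI implementation providing mpirun (e.g. MPICH) must already be available.

    # After downloading and unpacking the source from the URL above:
    ./configure        # assumes mpirun from an installed MPI implementation
    make

    # Quick smoke test, reusing a command shown earlier in the talk
    ptcp -all testfile /tmp/testfile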

31
Chiba City Scalability Testbed
  • http://www-unix.mcs.anl.gov/chiba/

32
Some Other Efforts in Scalable Clusters
  • Large Programs
  • DOE Scientific Discovery through Advanced
    Computing (SciDAC)
  • NSF Distributed Terascale Facility (DTF)
  • OSCAR
  • Goal is a 'cluster in a box' CD
  • PVFS (Parallel Virtual File System)
  • Many Smaller Efforts
  • www.beowulf.org, etc.
  • Commercial Efforts
  • Scyld, etc.