1
The flight of the Condor - a decade of High
Throughput Computing
  • Miron Livny
  • Computer Sciences Department
  • University of Wisconsin-Madison
  • miron@cs.wisc.edu

2
Remember!
  • There are no silver bullets.
  • Response time = Queuing Time + Execution Time.
  • If you believe in parallel computing you need a
    very good reason for not using an idle resource.
  • Debugging complex parallel applications is not
    fun.

3
Background and motivation
4
  • Since the early days of mankind the primary
    motivation for the establishment of communities
    has been the idea that by being part of an
    organized group the capabilities of an individual
    are improved. The great progress in the area of
    inter-computer communication led to the
    development of means by which stand-alone
    processing sub-systems can be integrated into
    multi-computer communities.

M. Livny, Study of Load Balancing Algorithms
for Decentralized Distributed Processing
Systems, Ph.D. thesis, July 1983.
5
The growing gap between what we own and what
each of us can access
6
Distributed Ownership
  • Due to the dramatic decrease in the
    cost-performance ratio of hardware, powerful
    computing resources are owned today by
    individuals, groups, departments, universities ...
  • Huge increase in the computing capacity owned by
    the scientific community
  • Moderate increase in the computing capacity
    accessible by a scientist

7
What kind of Computing?
  • High Performance Computing
  • Other
8
How about High Throughput Computing (HTC)?
  • I introduced the term HTC in a seminar at the
    NASA Goddard Space Flight Center in July of 96 and
    a month later at the European Laboratory for
    Particle Physics (CERN).
  • HTC paper in HPCU News 1(2), June 97.
  • HTC interview in HPCWire, July 97.
  • HTC part of the NCSA PACI proposal, Sept. 97.
  • HTC chapter in the Grid book, July 98.

9
High Throughput Computing is a 24-7-365 activity
FLOPY = (60 × 60 × 24 × 7 × 52) FLOPS
(31,449,600 seconds in a year of 52 weeks)
10
A simple scenario of a High Throughput Computing
(HTC) user with a very simple application and
one workstation on his/her desk
11
The HTC Application
  • Study the behavior of F(x,y,z) for 20 values of
    x, 10 values of y and 3 values of z (20 × 10 × 3 =
    600 combinations; see the sketch below)
  • F takes on the average 3 hours to compute on a
    typical workstation (total 1800 hours)
  • F requires a moderate (128MB) amount of memory
  • F performs little I/O - (x,y,z) is 15 MB and
    F(x,y,z) is 40 MB
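
A minimal sketch (not from the slides) of enumerating this parameter space in Python; the value lists are placeholders:

    # Enumerate the 600 (x, y, z) combinations and estimate the total work.
    from itertools import product

    xs, ys, zs = range(20), range(10), range(3)  # placeholder parameter values
    combos = list(product(xs, ys, zs))

    print(len(combos))            # 600 evaluations of F
    print(len(combos) * 3 / 24)   # about 75 days at 3 hours each on one workstation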

12
What we have here is a Master-Worker Application!
13
Master-Worker Paradigm
  • Many scientific, engineering and commercial
    applications (Software builds and testing,
    sensitivity analysis, parameter space
    exploration, image and movie rendering, High
    Energy Physics event reconstruction, processing
    of optical DNA sequencing data, training of
    neural networks, stochastic optimization, Monte
    Carlo...) follow the Master-Worker (MW) paradigm
    where ...

14
Master-Worker Paradigm
  • a heap or a Directed Acyclic Graph (DAG) of
    tasks is assigned to a master. The master looks
    for workers who can perform tasks that are ready
    to go and passes them a description (input) of
    the task. Upon the completion of a task, the
    worker passes the result (output) of the task
    back to the master (see the sketch after this
    list).
  • Master may execute some of the tasks.
  • Master may be a worker of another master.
  • Worker may require initialization data.
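
A minimal single-process sketch (not from the slides) of this loop, using a Python process pool as a stand-in for remote workers; F is a placeholder computation:

    # Master hands ready tasks to workers and collects their results.
    from concurrent.futures import ProcessPoolExecutor, as_completed

    def F(task):
        # Placeholder worker computation; the real F is application-specific.
        return sum(task)

    def master(tasks, n_workers=4):
        results = {}
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            futures = {pool.submit(F, t): t for t in tasks}  # pass task input
            for done in as_completed(futures):               # collect task output
                results[futures[done]] = done.result()
        return results

    if __name__ == "__main__":
        print(master([(1, 2), (3, 4), (5, 6)]))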

15
Master-Worker computing is Naturally Parallel. It
is by no means Embarrassingly Parallel. As you
will see, doing it right is by no means
trivial. Here are a few challenges ...
16
Dynamic or Static?
  • This is the key question one faces when building
    a MW application. How this question is answered
    has an impact on:
  • The algorithm
  • Target architecture
  • Resource availability
  • Quality of results
  • Complexity of implementation

17
How do the Master and Worker Communicate?
  • Via a shared/distributed file/disk system using
    reads and writes, or
  • Via a message passing system (PVM, MPI) using
    sends and receives, or
  • Via a shared memory using loads, stores and
    semaphores.

18
How many workers?
  • One per task?
  • One per CPU allocated to the master?
  • N(t), depending on the dynamic properties of the
    ready-to-go set of tasks?

19
Job Parallel MW
  • Master and workers communicate via the file
    system.
  • Workers are independent jobs that are
    submitted/started, suspended, resumed and
    cancelled by the master.
  • Master may monitor progress of jobs and
    availability of resources or just collect results
    at the end.

20
Building a basic Job Parallel Application
  • 1. Create n directories.
  • 2. Write an input file in each directory.
  • 3. Submit a cluster of n jobs.
  • 4. Wait for the cluster to finish.
  • 5. Read an output file from each directory.
    (These steps are sketched below.)
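
A minimal sketch (not from the slides) of these five steps in Python. The directory layout, file names, and the submit file name worker.sub are illustrative; condor_submit and condor_wait are the standard Condor tools and require a Condor installation:

    # Basic job-parallel MW driver that talks to the workers via the file system.
    import pathlib
    import subprocess

    n = 10
    for i in range(n):
        d = pathlib.Path(f"worker_dir.{i}")              # 1. create n directories
        d.mkdir(exist_ok=True)
        (d / "in").write_text(f"input for job {i}\n")    # 2. write an input file

    subprocess.run(["condor_submit", "worker.sub"], check=True)  # 3. submit cluster
    subprocess.run(["condor_wait", "log"], check=True)           # 4. wait for finish

    outputs = [(pathlib.Path(f"worker_dir.{i}") / "out").read_text()
               for i in range(n)]                        # 5. read an output file each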

21
Task Parallel MW
  • Master and workers exchange data via messages
    delivered by a message passing system like PVM or
    MPI.
  • Master monitors availability of resources and
    expands or shrinks the resource pool of the
    application accordingly.
  • Master monitors the health of workers and
    redistributes tasks accordingly (see the sketch
    after this list).
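
A minimal task-parallel sketch (not from the slides) in Python, using mpi4py as a stand-in for the PVM/MPI layer; the task list and the t * t computation are placeholders. Run with, e.g., mpiexec -n 4 python mw.py:

    # Rank 0 is the master: it hands out tasks and collects results.
    # All other ranks are workers: they compute until told to stop.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    if rank == 0:                                    # master
        tasks = list(range(10))                      # placeholder task descriptions
        status, results, active = MPI.Status(), [], 0
        for w in range(1, size):                     # seed one task per worker
            if tasks:
                comm.send(tasks.pop(), dest=w)
                active += 1
            else:
                comm.send(None, dest=w)              # nothing to do: release worker
        while active:
            results.append(comm.recv(source=MPI.ANY_SOURCE, status=status))
            src = status.Get_source()
            if tasks:
                comm.send(tasks.pop(), dest=src)     # hand out the next ready task
            else:
                comm.send(None, dest=src)            # no more work for this worker
                active -= 1
        print(results)
    else:                                            # worker
        while True:
            t = comm.recv(source=0)
            if t is None:
                break
            comm.send(t * t, dest=0)                 # placeholder computation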

22
Our Answer to High Throughput MW Computing
23
  • Modern processing environments that consist
    of large collections of workstations
    interconnected by high capacity networks raise the
    following challenging question: can we satisfy
    the needs of users who need extra capacity
    without lowering the quality of service
    experienced by the owners of under-utilized
    workstations? The Condor scheduling system is
    our answer to this question.

M. Litzkow, M. Livny and M. Mutka, Condor - A
Hunter of Idle Workstations, IEEE 8th ICDCS,
June 1988.
24
The Condor System
  • A High Throughput Computing system that
    supports large dynamic MW applications on large
    collections of distributively owned resources;
    developed, maintained and supported by the Condor
    Team at the University of Wisconsin - Madison
    since 86.
  • Originally developed for UNIX workstations.
  • Fully integrated NT version in advanced testing.
  • Deployed world-wide by academia and industry.
  • A 600-CPU system at U of Wisconsin.
  • Available at www.cs.wisc.edu/condor.

25
Selected sites (18 Nov 1998 10:21:13)

  Name               Machine                       Running  IdleJobs  HostsTotal
  RNI                core.rni.helsinki.fi                9         9          17
  dali.physik.uni-l  dali.physik.uni-leipzig.de          1         0          23
  Purdue ECE         drum.ecn.purdue.edu                 4         9           4
  ICG TU-Graz        fcggsg06.icg.tu-graz.ac.at          0         0          47
  TU-Graz Physikstu  fubphpc.tu-graz.ac.at               0         8           5
  PCs                lam.ap.polyu.edu.hk                 7         5           8
  C.O.R.E. Digital   latke.coredp.com                    7        45          26
  legba              legba.unsl.edu.ar                   0         0           5
  ictp-test          mlab-42.ictp.trieste.it            18         0          26
  CGSB-NLS           nls7.nlm.nih.gov                    4         1           8
  UCB-NOW            now.cs.berkeley.edu                 3         3           5
  INFN - Italy       venus.cnaf.infn.it                 31        61          84
  NAS CONDOR POOL    win316.nas.nasa.gov                 6         0          20

26
  • Several principles have driven the design of
    Condor. First is that workstation owners should
    always have the resources of the workstation they
    own at their disposal. The second principle is
    that access to remote capacity must be easy, and
    should approximate the local execution
    environment as closely as possible. Portability
    is the third principle behind the design of
    Condor.

M. Litzkow and M. Livny, Experience With the
Condor Distributed Batch System, IEEE Workshop
on Experimental Distributed Systems, Huntsville,
AL. Oct. 1990.
27
Key Condor Mechanisms
  • Matchmaking - enables requests for services and
    offers to provide services to find each other
    (ClassAds; see the example after this list).
  • Checkpointing - enables preemptive resume
    scheduling (go ahead and use it as long as it is
    available!).
  • Remote I/O - enables remote (from execution site)
    access to local (at submission site) data.
  • Asynchronous API - enables management of dynamic
    (opportunistic) resources.
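
An illustrative pair of ClassAds (not from the slides); the attribute names follow Condor's matchmaking convention, but all values are made up:

    Job ClassAd (a request for service):
      MyType       = "Job"
      TargetType   = "Machine"
      Cmd          = "worker"
      Requirements = (OpSys == "LINUX") && (Memory > 64)
      Rank         = Memory

    Machine ClassAd (an offer of service):
      MyType       = "Machine"
      TargetType   = "Job"
      OpSys        = "LINUX"
      Memory       = 128
      Requirements = (LoadAvg < 0.3) && (KeyboardIdle > 15 * 60)

The matchmaker pairs two ads when each side's Requirements evaluates to true against the other's attributes; Rank orders the acceptable matches.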

28
Condor Layers
29
Condor MW services
  • Checkpointing of Job Parallel (JP) workers
  • Remote I/O for master-worker communication
  • Log files for JP workers
  • Management of large (10K) numbers of jobs
  • Process management for dynamic PVM applications
  • A DAGMan (Directed Acyclic Graph Manager)
  • Access to large amounts of computing power

30
Condor System Structure
[Diagram: a Central Manager running the Collector and Negotiator, a Submit Machine running a Customer Agent (CA), and an Execution Machine running a Resource Agent (RA).]
31
Advertising Protocol
[Diagram: the Customer Agent and Resource Agent advertise their ClassAds to the Collector.]
32
Advertising Protocol (continued)
[Diagram: a further step of the same advertising exchange.]
33
Matching Protocol
[Diagram: the Negotiator matches the Customer Agent's request with the Resource Agent's offer and notifies the agents.]
34
Claiming Protocol
[Diagram: the Customer Agent contacts the matched Resource Agent directly to claim the resource.]
35
Remote Execution
[Diagram: the executable, input files, output files and checkpoint travel over the network between the customer's file system (which may be distributed) and the remote workstation.]
36
[Diagram: submission side with Owner Agent, Customer Agent, request queue, object files and checkpoint files; execution side with the Execution Agent running the application process, linked back by remote I/O and checkpointing.]
37
Workstation Cluster Workshop December 1992
38
We have users that ...
  • have job parallel MW applications with more
    than 5000 jobs.
  • have task parallel MW applications with more
    than 100 tasks.
  • run their job parallel MW application for more
    than six months.
  • run their task parallel MW application for more
    than four weeks.

39
A Condor Job-Parallel Submit File
    executable   = worker
    requirements = ((OpSys == "LINUX2.2") && (Memory > 64))
    initialdir   = worker_dir.$(Process)
    input        = in
    output       = out
    error        = err
    log          = log
    queue 1000
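
For reference (not on the slide): saved as, say, worker.sub, this file would be queued with condor_submit worker.sub, and condor_q would then show the 1000-job cluster, with each job running in its own worker_dir.<N> directory.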

40
Material Sciences MW Application
    potential = start
    FOR cycle = 1 to 36
      FOR location = 1 to 31
        totalEnergy = Energy(location, potential)
      END
      potential = F(totalEnergy)
    END

Implemented as a PVM application with the Condor
MW services. Two traces (execution and
performance) visualized by DEVise.
41
(No Transcript)
42
back to the user with the 600 jobs and only
one workstation to run them ...
43
First step - get organized!
  • Turn your workstation into a single-node
    Personal Condor pool
  • Write a script that creates 600 input files, one
    for each of the (x,y,z) combinations
  • Submit a cluster of 600 jobs to your personal
    Condor pool
  • Write a script that monitors the logs and
    collects the data from the 600 output files (see
    the sketch after this list)
  • Go on a long vacation (2.5 months)
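
A minimal sketch (not from the slides) of the monitoring script, assuming all 600 jobs share the user log file named log, as in the submit file above:

    # Count finished jobs by scanning the shared Condor user log.
    from pathlib import Path

    events = Path("log").read_text()
    done = events.count("Job terminated")   # one such event per finished job
    print(f"{done} / 600 jobs finished")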

44
Your Personal Condor will ...
  • ... keep an eye on your jobs and will keep you
    posted on their progress
  • ... implement your policy on when the jobs can
    run on your workstation (see the example after
    this list)
  • ... implement your policy on the execution order
    of the jobs
  • ... add fault tolerance to your jobs
  • ... keep a log of your job activities
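
An illustrative owner policy (not from the slides) as it might appear in the local Condor configuration; the thresholds are made up:

    # condor_config: start jobs only when the owner is away, suspend on return.
    START   = (KeyboardIdle > 15 * 60) && (LoadAvg < 0.3)
    SUSPEND = (KeyboardIdle < 60)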

45
(No Transcript)
46
and what about the underutilized workstation
in the next office, or the one in the classroom
downstairs, or the Linux cluster node in the other
building, or the O2K node on the other side of
town, or ...
47
(No Transcript)
48
Second step - become a scavenger
  • Install Condor on the machine next door.
  • Install Condor on the machines in the classroom.
  • Configure these machines to be part of your
    Condor pool.
  • Go on a shorter vacation ...

49
(No Transcript)
50
Third step - Take advantage of your friends
  • Get permission from friendly Condor pools to
    access their resources.
  • Configure your personal Condor to flock to
    these pools (see the example after this list).
  • Reconsider your vacation plans ...
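
An illustrative flocking entry (not from the slides) for the Condor configuration file; the pool names are made up:

    # condor_config: try these remote pools when local resources run out.
    FLOCK_TO = condor.cs.wisc.edu, cm.friendly-pool.example.edu

Jobs that cannot be matched locally are then sent to the listed central managers, subject to those pools' own policies.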

51
(No Transcript)
52
(No Transcript)
53
Fourth step - Think big!
  • Get access (account(s), certificate(s)) to a
    Globus-managed Grid.
  • Submit 599 Condor glide-in jobs (via Globus) to
    your personal Condor.
  • When all your jobs are done, remove any pending
    glide-in jobs.
  • Take the rest of the afternoon off ...

54
(No Transcript)
55
Simple is not only beautiful, it can be very
effective