Title: The flight of the Condor - a decade of High Throughput Computing
1. The flight of the Condor - a decade of High Throughput Computing
- Miron Livny
- Computer Sciences Department
- University of Wisconsin-Madison
- miron@cs.wisc.edu
2. Remember!
- There are no silver bullets.
- Response Time = Queuing Time + Execution Time.
- If you believe in parallel computing you need a very good reason for not using an idle resource.
- Debugging complex parallel applications is not fun.
3. Background and motivation
4. "Since the early days of mankind the primary motivation for the establishment of communities has been the idea that by being part of an organized group the capabilities of an individual are improved. The great progress in the area of inter-computer communication led to the development of means by which stand-alone processing sub-systems can be integrated into multi-computer communities."
M. Livny, Study of Load Balancing Algorithms for Decentralized Distributed Processing Systems, Ph.D. thesis, July 1983.
5. The growing gap between what we own and what each of us can access
6. Distributed Ownership
- Due to the dramatic decrease in the cost-performance ratio of hardware, powerful computing resources are owned today by individuals, groups, departments and universities.
- Huge increase in the computing capacity owned by the scientific community.
- Moderate increase in the computing capacity accessible by a scientist.
7. What kind of Computing?
- High Performance Computing?
- Other?
8. How about High Throughput Computing (HTC)?
- I introduced the term HTC in a seminar at the NASA Goddard Space Flight Center in July of '96 and a month later at the European Laboratory for Particle Physics (CERN).
- HTC paper in HPCU News 1(2), June '97.
- HTC interview in HPCWire, July '97.
- HTC part of the NCSA PACI proposal, Sept. '97.
- HTC chapter in the Grid book, July '98.
9. High Throughput Computing is a 24-7-365 activity
FLOPY ≠ (60 × 60 × 24 × 7 × 52) × FLOPS
A year has 60 × 60 × 24 × 7 × 52 = 31,449,600 seconds, and a year's worth of delivered throughput (FLOPY) falls well short of peak FLOPS sustained for every one of them.
10. A simple scenario of a High Throughput Computing (HTC) user with a very simple application and one workstation on his/her desk
11. The HTC Application
- Study the behavior of F(x,y,z) for 20 values of x, 10 values of y and 3 values of z (20 × 10 × 3 = 600 combinations).
- F takes on average 3 hours to compute on a typical workstation (1800 hours total).
- F requires a moderate (128 MB) amount of memory.
- F performs little I/O: (x,y,z) is 15 MB and F(x,y,z) is 40 MB.
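As a back-of-the-envelope check on these numbers, a tiny sketch; the value grids are hypothetical placeholders, and only the 20 × 10 × 3 shape and the 3-hour estimate come from the slide:

  import itertools

  # Hypothetical parameter grids with the shape given above: 20 x 10 x 3.
  xs = [0.1 * i for i in range(20)]
  ys = [0.5 * j for j in range(10)]
  zs = [1.0, 2.0, 3.0]

  combos = list(itertools.product(xs, ys, zs))
  assert len(combos) == 600

  hours = 3 * len(combos)
  print(f"{len(combos)} runs x 3 h = {hours} CPU hours "
        f"(about {hours / 24:.0f} days on one workstation)")

On a single workstation that is 1800 hours, roughly 75 days; keep that figure in mind for the vacation plans later in the talk.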
12. What we have here is a Master-Worker Application!
13. Master-Worker Paradigm
- Many scientific, engineering and commercial applications (software builds and testing, sensitivity analysis, parameter space exploration, image and movie rendering, High Energy Physics event reconstruction, processing of optical DNA sequencing, training of neural networks, stochastic optimization, Monte Carlo...) follow the Master-Worker (MW) paradigm where ...
14. Master-Worker Paradigm
- A heap or a Directed Acyclic Graph (DAG) of tasks is assigned to a master. The master looks for workers who can perform tasks that are ready to go and passes them a description (input) of the task. Upon the completion of a task, the worker passes the result (output) of the task back to the master (see the sketch after this list).
- The master may execute some of the tasks.
- The master may be a worker of another master.
- A worker may require initialization data.
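A minimal single-machine sketch of this dispatch loop, assuming the simple heap case (no DAG dependencies) and using Python's concurrent.futures as a stand-in for a real distributed system; F and the task list are invented placeholders:

  from concurrent.futures import ProcessPoolExecutor, as_completed

  def F(task):
      return task ** 2                     # placeholder for the real work

  def main():
      tasks = [1, 2, 3, 4, 5, 6]           # the heap of ready-to-go tasks
      # The master passes each ready task to a free worker and
      # collects the result (output) as each task completes.
      with ProcessPoolExecutor(max_workers=2) as pool:
          futures = {pool.submit(F, t): t for t in tasks}
          for done in as_completed(futures):
              print(f"task {futures[done]} -> {done.result()}")

  if __name__ == "__main__":
      main()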
15. Master-Worker computing is Naturally Parallel. It is by no means Embarrassingly Parallel. As you will see, doing it right is by no means trivial. Here are a few challenges ...
16. Dynamic or Static?
- This is the key question one faces when building a MW application. How this question is answered has an impact on:
- The algorithm
- Target architecture
- Resource availability
- Quality of results
- Complexity of implementation
17. How do the Master and Worker Communicate?
- Via a shared/distributed file/disk system, using reads and writes, or
- Via a message passing system (PVM, MPI), using sends and receives, or
- Via shared memory, using loads, stores and semaphores.
18. How many workers?
- One per task?
- One per CPU allocated to the master?
- N(t), depending on the dynamic properties of the ready-to-go set of tasks?
19. Job Parallel MW
- Master and workers communicate via the file system.
- Workers are independent jobs that are submitted/started, suspended, resumed and cancelled by the master.
- Master may monitor progress of jobs and availability of resources, or just collect results at the end.
20. Building a basic Job Parallel Application
- 1. Create n directories.
- 2. Write an input file in each directory.
- 3. Submit a cluster of n jobs.
- 4. Wait for the cluster to finish.
- 5. Read an output file from each directory (see the sketch below).
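A throwaway script for steps 1, 2, 3 and 5 of this recipe, using the worker_dir.N layout of the submit file shown on slide 39; the input format and the submit-file name worker.sub are illustrative assumptions, and step 3 assumes Condor's condor_submit is on the path:

  import pathlib
  import subprocess

  n = 600

  # Steps 1-2: one directory per job, each holding an input file.
  for i in range(n):
      d = pathlib.Path(f"worker_dir.{i}")
      d.mkdir(exist_ok=True)
      (d / "in").write_text(f"parameters for job {i}\n")

  # Step 3: submit the cluster of n jobs (see the submit file on slide 39).
  subprocess.run(["condor_submit", "worker.sub"], check=True)

  # Steps 4-5, run only after the cluster finishes:
  # read one output file per directory.
  outputs = [(pathlib.Path(f"worker_dir.{i}") / "out").read_text()
             for i in range(n)]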
21. Task Parallel MW
- Master and workers exchange data via messages delivered by a message passing system like PVM or MPI.
- Master monitors availability of resources and expands or shrinks the resource pool of the application accordingly.
- Master monitors the health of workers and redistributes tasks accordingly (sketched below).
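A toy sketch of the redistribution logic in the last bullet, with message passing replaced by a direct call; the failure rate and flaky_worker are invented to make the point:

  import queue
  import random

  def flaky_worker(task):
      if random.random() < 0.3:           # simulate a worker dying mid-task
          raise RuntimeError("worker lost")
      return task * 10

  ready = queue.Queue()
  for t in range(5):
      ready.put(t)

  results = {}
  # A task whose worker disappears simply goes back on the ready queue,
  # so the application makes progress as long as some worker survives.
  while not ready.empty():
      t = ready.get()
      try:
          results[t] = flaky_worker(t)
      except RuntimeError:
          ready.put(t)                    # redistribute the task
  print(results)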
22. Our Answer to High Throughput MW Computing
23. "Modern processing environments that consist of large collections of workstations interconnected by high capacity networks raise the following challenging question: can we satisfy the needs of users who need extra capacity without lowering the quality of service experienced by the owners of under-utilized workstations? The Condor scheduling system is our answer to this question."
M. Litzkow, M. Livny and M. Mutka, Condor - A Hunter of Idle Workstations, IEEE 8th ICDCS, June 1988.
24. The Condor System
- A High Throughput Computing system that supports large dynamic MW applications on large collections of distributively owned resources; developed, maintained and supported by the Condor Team at the University of Wisconsin-Madison since '86.
- Originally developed for UNIX workstations.
- Fully integrated NT version in advanced testing.
- Deployed world-wide by academia and industry.
- A 600-CPU system at U of Wisconsin.
- Available at www.cs.wisc.edu/condor.
25. Selected sites (18 Nov 1998, 10:21:13)

Name               Machine                      Running  IdleJobs  HostsTotal
RNI                core.rni.helsinki.fi               9         9          17
dali.physik.uni-l  dali.physik.uni-leipzig.de         1         0          23
Purdue ECE         drum.ecn.purdue.edu                4         9           4
ICG TU-Graz        fcggsg06.icg.tu-graz.ac.at         0         0          47
TU-Graz Physikstu  fubphpc.tu-graz.ac.at              0         8           5
PCs                lam.ap.polyu.edu.hk                7         5           8
C.O.R.E. Digital   latke.coredp.com                   7        45          26
legba              legba.unsl.edu.ar                  0         0           5
ictp-test          mlab-42.ictp.trieste.it           18         0          26
CGSB-NLS           nls7.nlm.nih.gov                   4         1           8
UCB-NOW            now.cs.berkeley.edu                3         3           5
INFN - Italy       venus.cnaf.infn.it                31        61          84
NAS CONDOR POOL    win316.nas.nasa.gov                6         0          20
26. "Several principles have driven the design of Condor. First is that workstation owners should always have the resources of the workstation they own at their disposal. The second principle is that access to remote capacity must be easy, and should approximate the local execution environment as closely as possible. Portability is the third principle behind the design of Condor."
M. Litzkow and M. Livny, Experience With the Condor Distributed Batch System, IEEE Workshop on Experimental Distributed Systems, Huntsville, AL, Oct. 1990.
27. Key Condor Mechanisms
- Matchmaking - enables requests for services and offers to provide services to find each other (ClassAds).
- Checkpointing - enables preemptive-resume scheduling (go ahead and use it as long as it is available!).
- Remote I/O - enables remote (from execution site) access to local (at submission site) data.
- Asynchronous API - enables management of dynamic (opportunistic) resources.
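A toy illustration of two-sided matchmaking, with ClassAds reduced to Python dicts and the Requirements expressions to plain predicates; the attribute names echo Condor's, but everything else here is a simplification:

  # A request ad (job) and an offer ad (machine), each with attributes
  # plus a Requirements predicate over the other side's attributes.
  job = {
      "ImageSize": 128,
      "Requirements": lambda m: m["OpSys"] == "LINUX" and m["Memory"] > 64,
  }
  machine = {
      "OpSys": "LINUX",
      "Memory": 256,
      "Requirements": lambda j: j["ImageSize"] <= 256,   # owner's policy
  }

  # The matchmaker pairs the two ads only if each side's Requirements
  # is satisfied by the other side; it then notifies both parties.
  if job["Requirements"](machine) and machine["Requirements"](job):
      print("match: the customer agent may now claim the resource")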
28. Condor Layers
29. Condor MW services
- Checkpointing of Job Parallel (JP) workers
- Remote I/O for master-worker communication
- Log files for JP workers
- Management of large (10K) numbers of jobs
- Process management for dynamic PVM applications
- A DAGMan (Directed Acyclic Graph Manager)
- Access to large amounts of computing power
30. Condor System Structure
[Diagram: a Central Manager running the Collector and the Negotiator; a Submit Machine running a Customer Agent (CA); an Execution Machine running a Resource Agent (RA).]
31. Advertising Protocol
[Diagram: the Customer Agent and the Resource Agent send their ads to the Collector at the Central Manager.]
32. Advertising Protocol
[Diagram: a later frame of the same advertising exchange.]
33. Matching Protocol
[Diagram: the Negotiator matches a Customer Agent's request ad with a Resource Agent's offer ad and informs both agents.]
34. Claiming Protocol
[Diagram: after the match, the Customer Agent contacts the Resource Agent directly to claim the resource.]
35. Remote Execution
[Diagram: the executable, checkpoint, input files and output files move over the network between the customer's file system (which may be distributed) and the remote workstation.]
36. [Diagram: the submission side (Owner Agent, Customer Agent, request queue, object files) and the execution side (Execution Agent, application processes, data and object files, checkpoint files), linked by remote I/O and checkpointing.]
37. Workstation Cluster Workshop, December 1992
38. We have users that ...
- have job parallel MW applications with more than 5000 jobs.
- have task parallel MW applications with more than 100 tasks.
- run their job parallel MW application for more than six months.
- run their task parallel MW application for more than four weeks.
39. A Condor Job-Parallel Submit File

  executable   = worker
  requirements = ((OpSys == "LINUX2.2") && (Memory > 64))
  initialdir   = worker_dir.$(Process)
  input        = in
  output       = out
  error        = err
  log          = log
  queue 1000
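Submitted with condor_submit, the queue 1000 statement expands this file into a cluster of 1000 jobs, with $(Process) taking the values 0 through 999 so that each job reads its "in" file from, and writes its "out" file to, its own worker_dir.N directory.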
40. Material Sciences MW Application

  potential = start
  FOR cycle = 1 TO 36
    FOR location = 1 TO 31
      totalEnergy = Energy(location, potential)
    END
    potential = F(totalEnergy)
  END

Implemented as a PVM application with the Condor MW services. Two traces (execution and performance) visualized by DEVise.
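The structure is what makes this an MW application: within a cycle the 31 location evaluations are independent tasks, while the cycles themselves are sequential because each new potential depends on the previous cycle's results. A sketch of that shape, where Energy, F and the starting potential are placeholders for the application's real routines:

  from concurrent.futures import ProcessPoolExecutor

  def Energy(location, potential):
      return potential + location            # placeholder physics

  def F(energies):
      return sum(energies) / len(energies)   # placeholder update rule

  def main():
      potential = 1.0                        # "start"
      with ProcessPoolExecutor() as pool:
          for cycle in range(36):
              # 31 independent tasks per cycle, farmed out to the workers
              energies = list(pool.map(Energy, range(1, 32),
                                       [potential] * 31))
              potential = F(energies)        # sequential step between cycles
      print(potential)

  if __name__ == "__main__":
      main()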
42. ... back to the user with the 600 jobs and only one workstation to run them
43. First step - get organized!
- Turn your workstation into a single-node Personal Condor pool.
- Write a script that creates 600 input files, one for each of the (x,y,z) combinations.
- Submit a cluster of 600 jobs to your personal Condor pool.
- Write a script that monitors the logs and collects the data from the 600 output files (a sketch follows this list).
- Go on a long vacation (2.5 months).
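A sketch of the collection script from the fourth bullet, reusing the worker_dir.N layout of the submit file on slide 39; parsing of the Condor log is omitted here, and the output format depends on F:

  import pathlib

  results = {}
  for i in range(600):
      out = pathlib.Path(f"worker_dir.{i}") / "out"
      if out.exists():                     # job i has finished
          results[i] = out.read_text()
  print(f"collected {len(results)} of 600 results")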
44. Your Personal Condor will ...
- ... keep an eye on your jobs and will keep you posted on their progress
- ... implement your policy on when the jobs can run on your workstation
- ... implement your policy on the execution order of the jobs
- ... add fault tolerance to your jobs
- ... keep a log of your job activities
46. ... and what about the underutilized workstation in the next office, or the one in the classroom downstairs, or the Linux cluster node in the other building, or the O2K node at the other side of town, or ...
48. Second step - become a scavenger
- Install Condor on the machine next door.
- Install Condor on the machines in the classroom.
- Configure these machines to be part of your Condor pool.
- Go on a shorter vacation ...
50. Third step - Take advantage of your friends
- Get permission from friendly Condor pools to access their resources.
- Configure your personal Condor to "flock" to these pools.
- Reconsider your vacation plans ...
53. Fourth step - Think big!
- Get access (account(s), certificate(s)) to a Globus-managed Grid.
- Submit 599 "To Globus" Condor glide-in jobs to your personal Condor.
- When all your jobs are done, remove any pending glide-in jobs.
- Take the rest of the afternoon off ...
55. Simple is not only beautiful, it can be very effective.