Title: Job Delegation and Planning in Condor-G (ISGC 2005, Taipei, Taiwan)
1. Job Delegation and Planning in Condor-G (ISGC 2005, Taipei, Taiwan)
2. The Condor Project (Established 1985)
- Distributed High Throughput Computing research performed by a team of 35 faculty, full-time staff, and students.
3. The Condor Project (Established 1985)
- Distributed High Throughput Computing research performed by a team of 35 faculty, full-time staff, and students who:
- face software engineering challenges in a distributed UNIX/Linux/NT environment,
- are involved in national and international grid collaborations,
- actively interact with academic and commercial users,
- maintain and support large distributed production environments,
- and educate and train students.
- Funding: US Govt. (DoD, DoE, NASA, NSF, NIH), AT&T, IBM, INTEL, Microsoft, UW-Madison
4. A Multifaceted Project
- Harnessing the power of clusters, dedicated and/or opportunistic (Condor)
- Job management services for Grid applications (Condor-G, Stork)
- Fabric management services for Grid resources (Condor, GlideIns, NeST)
- Distributed I/O technology (Parrot, Kangaroo, NeST)
- Job-flow management (DAGMan, Condor, Hawk)
- Distributed monitoring and management (HawkEye)
- Technology for Distributed Systems (ClassAD, MW)
- Packaging and Integration (NMI, VDT)
5. Some software produced by the Condor Project
- Condor System
- ClassAd Library
- DAGMan
- Fault Tolerant Shell (FTSH)
- Hawkeye
- GCB
- MW
- NeST
- Stork
- Parrot
- VDT
- And others, all as open source
6. Who uses Condor?
- Commercial
- Oracle, Micron, Hartford Life Insurance, CORE, Xerox, ExxonMobil, Shell, Alterra, Texas Instruments, ...
- Research Community
- Universities, Govt Labs
- Bundles: NMI, VDT
- Grid Communities: EGEE/LCG/gLite, Particle Physics Data Grid (PPDG), USCMS, LIGO, iVDGL, NSF Middleware Initiative GRIDS Center, ...
7. Condor Pool
[Diagram: a MatchMaker coordinating Schedds (with queued Jobs) and Startds]
8. Condor Pool
[Diagram: the same pool after matchmaking, with Jobs matched to and running on the Startds]
9. Condor-G
[Diagram: a Condor-G Schedd delegating Jobs to many back ends: Condor-C, LSF, PBS, Globus 2, Globus 4, Unicore, NorduGrid, and remote Schedds/Startds]
10. [Diagram: layered architecture: User/Application/Portal over the Grid over the Fabric (processing, storage, communication)]
11. Job Delegation
- Transfer of responsibility to schedule and execute a job (a minimal submit-file sketch follows this slide)
- Stage in executable and data files
- Transfer policy instructions
- Securely transfer (and refresh?) credentials, obtain local identities
- Monitor and present job progress (transparency!)
- Return results
- Multiple delegations can be combined in interesting ways
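A minimal Condor-G submit-file sketch of the responsibilities listed above; the data file and proxy path are hypothetical, and the comments map each line to a bullet:

  universe        = grid
  grid_type       = gt2
  globusscheduler = cluster1.cs.wisc.edu/jobmanager-lsf   # delegate responsibility to this site
  executable      = find_particle                         # executable staged in by Condor-G
  transfer_input_files = calibration.dat                  # data file staged in (hypothetical name)
  x509userproxy   = /tmp/x509up_u1000                     # credential transferred (and refreshed)
  output          = find_particle.out                     # results returned
  log             = find_particle.log                     # job progress monitored here
  queue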
12. Simple Job Delegation in Condor-G
[Diagram: Condor-G delegates a job through Globus GRAM to a batch system front-end and on to an execute machine]
13. Expanding the Model
- What can we do with new forms of job delegation?
- Some ideas
- Mirroring
- Load-balancing
- Glide-in schedd, startd
- Multi-hop grid scheduling
14. Mirroring
- What it does
- Jobs mirrored on two Condor-Gs
- If primary Condor-G crashes, secondary one starts running jobs
- On recovery, primary Condor-G gets job status from secondary one
- Removes Condor-G submit point as single point of failure
15. Mirroring Example
[Diagram: Jobs mirrored on Condor-G 1 and Condor-G 2; one Condor-G has failed (X), and the other runs the jobs on the execute machine]
16. Mirroring Example
[Diagram: Condor-G 1, Condor-G 2, the Jobs, and the execute machine after recovery]
17. Load-Balancing
- What it does
- Front-end Condor-G distributes all jobs among several back-end Condor-Gs
- Front-end Condor-G keeps updated job status
- Improves scalability
- Maintains single submit point for users
18. Load-Balancing Example
[Diagram: a front-end Condor-G distributing jobs across back-end Condor-Gs 1, 2, and 3]
19. Glide-In
- Schedd and Startd are separate services that do not require any special privileges
- Thus we can submit them as jobs! (a hedged submit-file sketch follows this slide)
- Glide-In Schedd
- What it does
- Drop a Condor-G onto the front-end machine of a remote cluster
- Delegate jobs to the cluster through the glide-in schedd
- Can apply cluster-specific policies to jobs
- Not fork-and-forget
- Send a manager to the site, instead of managing across the internet
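Because the daemons run without special privileges, a glide-in can be delegated exactly like a user job. A hedged sketch, where the startup script and the tarball of Condor binaries are hypothetical names:

  universe        = grid
  grid_type       = gt2
  globusscheduler = cluster1.cs.wisc.edu/jobmanager-lsf
  executable      = glidein_startup.sh            # hypothetical: unpacks and starts a schedd or startd
  transfer_input_files = condor_daemons.tar.gz    # hypothetical: condor_schedd/condor_startd binaries and config
  output          = glidein.out
  log             = glidein.log
  queue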
20. Glide-In Schedd Example
[Diagram: jobs flow from a Condor-G through the grid middleware to a glide-in schedd on the cluster front-end, and from there into the local batch system]
21. Glide-In Startd Example
[Diagram: a Condor-G (Schedd) submits a glide-in Startd through the middleware and the front-end's batch system; the user's job then runs under that Startd]
22. Glide-In Startd
- Why?
- Restores all the benefits that may have been washed away by the middleware
- End-to-end management solution
- Preserves job semantic guarantees
- Preserves policy
- Enables lazy planning
23. Sample Job Submit File
- universe = grid
- grid_type = gt2
- globusscheduler = cluster1.cs.wisc.edu/jobmanager-lsf
- executable = find_particle
- arguments = ...
- output = ...
- log = ...
But we want metascheduling
24. Represent grid clusters as ClassAds
- ClassAds
- are a set of uniquely named expressions; each expression is called an attribute and is an attribute name/value pair
- combine query and data
- extensible
- semi-structured: no fixed schema (flexibility in an environment consisting of distributed administrative domains)
- Designed for MatchMaking
25. Example of a ClassAd that could represent a compute cluster in a grid
- Type = "GridSite"
- Name = "FermiComputeCluster"
- Arch = "Intel-Linux"
- Gatekeeper_url = "globus.fnal.gov/lsf"
- Load = [ QueuedJobs = 42; RunningJobs = 200 ]
- Requirements = (other.Type == "Job") && (Load.QueuedJobs < 100)
- GoodPeople = { "howard", "harry" }
- Rank = member(other.Owner, GoodPeople) * 500
26. Another Sample Job Submit File
- universe = grid
- grid_type = gt2
- owner = howard
- executable = find_particle.$$(Arch)
- requirements = (other.Arch == "Intel-Linux") || (other.Arch == "Sparc-Solaris")
- rank = 0 - other.Load.QueuedJobs
- globusscheduler = $$(gatekeeper_url)
Note: We introduced augmentation of the job ClassAd based upon information discovered in its matching resource ClassAd (illustrated below).
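For instance, if this job matched the FermiComputeCluster ad from slide 25, Condor-G would substitute that ad's values into the $$() references, so the delegated job would effectively use something like:

  executable      = find_particle.Intel-Linux    # $$(Arch) replaced by the matched site's Arch
  globusscheduler = globus.fnal.gov/lsf           # $$(gatekeeper_url) replaced by its Gatekeeper_url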
27. Multi-Hop Grid Scheduling
- Match a job to a Virtual Organization (VO), then to a resource within that VO (an illustrative VO-level ClassAd is sketched below)
- Easier to schedule jobs across multiple VOs and grids
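In the same style as the GridSite ad on slide 25, the first hop could match the job against a VO-level ClassAd before the VO's own broker picks a site; this ad is purely illustrative and every attribute name in it is an assumption:

- Type = "VirtualOrganization"
- Name = "CMS"
- Broker_url = "cms-broker.example.org"
- Requirements = (other.Type == "Job") && (other.VO == "CMS")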
28. Multi-Hop Grid Scheduling Example
[Diagram: a HEP/CMS job is matched first by an experiment-level resource broker and Condor-G, then by a VO-level resource broker and Condor-G, and finally delegated through Globus GRAM to a batch scheduler]
29. Endless Possibilities
- These new models can be combined with each other or with other new models
- Resulting system can be arbitrarily sophisticated
30. Job Delegation Challenges
- New complexity introduces new issues and exacerbates existing ones
- A few:
- Transparency
- Representation
- Scheduling Control
- Active Job Control
- Revocation
- Error Handling and Debugging
31. Transparency
- Full information about job should be available to user
- Information from full delegation path
- No manual tracing across multiple machines
- Users need to know what's happening with their jobs
32. Representation
- Job state is a vector
- How best to show this to the user?
- Summary:
- Current delegation endpoint
- Job state at endpoint
- Full information available if desired
- Series of nested ClassAds? (sketched below)
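One way the nested-ClassAd idea could look; all attribute names here are hypothetical illustrations, not existing Condor job attributes:

  [
    Type = "Job"; Owner = "howard";
    JobStatus = "Running";                                  // state at the submit point
    DelegatedTo =
      [
        Endpoint = "vo-condorg.example.org";                // first hop (hypothetical)
        JobStatus = "Running";
        DelegatedTo =
          [
            Endpoint = "cluster1.cs.wisc.edu/jobmanager-lsf";  // final hop
            JobStatus = "Running"
          ]
      ]
  ]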
33. Scheduling Control
- Avoid loops in delegation path
- Give user control of scheduling
- Allow limiting of delegation path length?
- Allow user to specify part or all of delegation path
34. Active Job Control
- User may request certain actions
- hold, suspend, vacate, checkpoint
- Actions cannot be completed synchronously for the user
- Must forward along delegation path
- User checks completion later
35. Active Job Control (cont.)
- Endpoint systems may not support actions
- If possible, execute them at furthest point that does support them
- Allow user to apply action in middle of delegation path
36. Revocation
- Leases
- Lease must be renewed periodically for delegation to remain valid
- Allows revocation during long-term failures
- What are good values for lease lifetime and update interval? (a hedged sketch follows this slide)
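In submit-file terms, a lease can be requested with the job_lease_duration command; the particular value below is only an illustration of the lifetime/renewal trade-off the slide asks about:

  job_lease_duration = 1200   # lease lifetime in seconds; if the upstream manager
                              # fails to renew in time, the delegation can be revoked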
37. Error Handling and Debugging
- Many more places for things to go horribly wrong
- Need clear, simple error semantics
- Logs, logs, logs
- Have them everywhere
38. From earlier
- Transfer of responsibility to schedule and execute a job
- Transfer policy instructions
- Stage in executable and data files
- Securely transfer (and refresh?) credentials, obtain local identities
- Monitor and present job progress (transparency!)
- Return results
39. Job Failure Policy Expressions
- Condor/Condor-G augmented so users can supply job failure policy expressions in the submit file.
- Can be used to describe a successful run, or what to do in the face of failure.
- on_exit_remove = <expression>
- on_exit_hold = <expression>
- periodic_remove = <expression>
- periodic_hold = <expression>
40. Job Failure Policy Examples
- Do not remove from queue (i.e. reschedule) if the job exits with a signal:
- on_exit_remove = (ExitBySignal == False)
- Place on hold if the job exits with nonzero status or ran for less than an hour:
- on_exit_hold = ((ExitBySignal == False) && (ExitSignal != 0)) || ((ServerStartTime - JobStartDate) < 3600)
- Place on hold if the job has spent more than 50% of its time suspended:
- periodic_hold = CumulativeSuspensionTime > (RemoteWallClockTime / 2.0)
41. Data Placement (DaP) must be an integral part of the end-to-end solution
- Space management and data transfer
42. Stork
- A scheduler for data placement activities in the Grid
- What Condor is for computational jobs, Stork is for data placement
- Stork comes with a new concept:
- Make data placement a first-class citizen in the Grid.
43. [Diagram: Data Placement Jobs vs. Computational Jobs]
44. DAG with DaP
[Diagram: a DAG specification mixing computational jobs (e.g. node C) with data placement jobs; a hedged textual sketch follows below]
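A hedged sketch of such a DAG specification, assuming DAGMan's DATA node type for Stork jobs; the node and file names are hypothetical:

  DATA StageIn   stage_in.stork        # data placement job, handed to Stork
  JOB  Analyze   find_particle.submit  # computational job, handed to Condor-G
  DATA StageOut  stage_out.stork       # return the results, again via Stork
  PARENT StageIn CHILD Analyze
  PARENT Analyze CHILD StageOut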
45. Why Stork?
- Stork understands the characteristics and semantics of data placement jobs.
- Can make smart scheduling decisions, for reliable and efficient data placement.
46. Failure Recovery and Efficient Resource Utilization
- Fault tolerance
- Just submit a bunch of data placement jobs, and then go away.
- Control number of concurrent transfers from/to any storage system
- Prevents overloading
- Space allocation and de-allocation
- Make sure space is available
47. Support for Heterogeneity
- Protocol translation using the Stork memory buffer.
48. Support for Heterogeneity
- Protocol translation using the Stork disk cache.
49. Flexible Job Representation and Multilevel Policy Support
- Type = "Transfer"
- Src_Url = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat"
- Dest_Url = "nest://turkey.cs.wisc.edu/kosart/x.dat"
- ...
- Max_Retry = 10
- Restart_in = "2 hours"
50. Run-time Adaptation
- Dynamic protocol selection:
- dap_type = "transfer"
- src_url = "drouter://slic04.sdsc.edu/tmp/test.dat"
- dest_url = "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat"
- alt_protocols = "nest-nest, gsiftp-gsiftp"
or, with the protocol left open:
- dap_type = "transfer"
- src_url = "any://slic04.sdsc.edu/tmp/test.dat"
- dest_url = "any://quest2.ncsa.uiuc.edu/tmp/test.dat"
51. Run-time Adaptation
- Run-time protocol auto-tuning:
- link = "slic04.sdsc.edu - quest2.ncsa.uiuc.edu"
- protocol = "gsiftp"
- bs = 1024KB // block size
- tcp_bs = 1024KB // TCP buffer size
- p = 4 // number of parallel streams
52. Planner
[Diagram: the full planning stack: an Application feeds a Planner and DAGMan; computational jobs flow through Condor-G, GRAM, and a StartD (with Parrot for remote I/O), while data placement flows through Stork, RFT, and GridFTP]
53. Thank You!