Title: Grid Scheduling and Resource Management, based on Chapter 6 of the book The Grid: Core Technologies, M. Li
1Grid Scheduling and Resource Management, based on Chapter 6 of the book The Grid: Core Technologies, M. Li and M. Baker, J. Wiley, 2005
2- Grid scheduling
- Mapping Grid jobs to resources across multiple administrative domains
- A Grid job can be split into many tasks; the scheduler must
- select resources
- schedule tasks
- to meet user/application requirements (global execution time and cost of the resources used)
3Scheduling paradigms
- Centralized scheduling
- Distributed scheduling
- Hierarchical scheduling
4Centralized scheduling
- A central node acts as the resource manager and schedules jobs to all known nodes
- Practical in computing centres where resources have similar characteristics and usage policies
- Pending jobs wait in a central job queue until dispatched by the central scheduler
6- Advantages
- good scheduling decisions: the scheduler has access to all needed, up-to-date information about available resources
- Disadvantages
- does not scale well when the resource pool grows
- scheduler is a bottleneck
- single point of failure
7Distributed scheduling
- No central scheduler.
- Multiple local schedulers cooperate to dispatch jobs to the nodes
- Two approaches
- direct communication among schedulers
- indirect communication
8- Advantages
- scalability
- better fault-tolerance and reliability
- Disadvantages
- no global scheduler, which may lead to sub-optimal scheduling decisions
9Direct communication
- Each scheduler has a list of remote schedulers with which it communicates for job dispatching
- Or there is a central information directory with information on each scheduler
11- If a job cannot be dispatched via its local job
queue, other schedulers are contacted to find
appropriate resources
12Indirect communication via a job pool
- Jobs that cannot execute immediately on local resources are sent to a central job pool.
- Schedulers can select suitable jobs from the pool to run on their resources.
- Policies must ensure all jobs eventually get executed.
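The job-pool idea can be illustrated with a small sketch. Everything here is hypothetical (the `JobPool` class, the `take_for` method, the `max_wait` aging threshold); the slides do not say which policy guarantees that every job eventually runs, so this sketch assumes a simple aging rule: once the oldest job has been passed over too often, no other job may overtake it.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpus_needed: int
    waited_rounds: int = 0  # how many selection rounds this job has sat in the pool


class JobPool:
    """Central pool for jobs that could not start on their local site (illustrative only)."""

    def __init__(self, max_wait=3):
        self.jobs = deque()       # kept in submission order, oldest first
        self.max_wait = max_wait  # assumed aging threshold, not from the slides

    def submit(self, job):
        self.jobs.append(job)

    def take_for(self, free_cpus):
        """A scheduler with `free_cpus` idle CPUs asks the pool for a job it can run."""
        if not self.jobs:
            return None
        head = self.jobs[0]
        # Anti-starvation policy: a job that has waited too long may not be overtaken.
        candidates = [head] if head.waited_rounds >= self.max_wait else list(self.jobs)
        chosen = next((j for j in candidates if j.cpus_needed <= free_cpus), None)
        if chosen is not None:
            self.jobs.remove(chosen)
        for job in self.jobs:     # every job left in the pool ages by one round
            job.waited_rounds += 1
        return chosen
```

With this rule a small site can drain small jobs for a while, but a large job at the head of the pool eventually blocks further overtaking until a scheduler with enough capacity takes it.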
14Hierarchical scheduling
- A central scheduler acts as a kind of meta-scheduler: it receives job submissions and dispatches the submitted jobs to the local schedulers
- Can have scalability, bottleneck, and fault-tolerance problems
- But allows different job scheduling policies at the global and local schedulers
16Scheduling operations
- 4 main stages
- Resource discovery
- Resource selection
- Schedule generation
- Job execution
17 1- Resource discovery
- Goal: identify a list of authenticated resources available for job submission and execution
- Must cope with dynamic changes: decisions depend on dynamic state information about the available resources and are revised online.
- E.g. like a compiler that schedules machine instructions to minimize resource idle time
- Need to know
- which resources are accessible
- how busy they are
- how long it takes to communicate with them
- how long it takes to communicate between them
- in order to decide on a more efficient resource allocation.
- Typical models
- pull
- push
- push-pull
18The pull model
- A daemon associated with the scheduler queries Grid resources to get state information on
- CPU loads,
- available memory,
- etc.
- Has small communication overhead
- But needs frequent querying, otherwise the information gets out of date and can lead to bad decisions
20The push model
- Each resource has a daemon that collects local state information and sends it to a central scheduler, where it is saved in a database with records on each resource's activity
- Frequent updates keep the view more accurate but are intrusive in terms of database load and network traffic
22The push-pull model
- Each resource has a local daemon that collects state information and sends it to intermediate nodes (aggregators) that merge state information from multiple sub-systems.
- The scheduler queries the aggregators for resource information
- Issues
- what information is useful
- how often it must be collected
- how long it should be kept in the system
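A minimal sketch of the push-pull idea, assuming a hypothetical `Aggregator` class: resource daemons call `push` with their latest state, and the scheduler calls `query`. The `max_age_s` parameter and the `free_mem_mb` field are illustrative choices that touch two of the issues listed above (which information is useful, and how long it should be kept).

```python
import time


class Aggregator:
    """Intermediate node that merges state pushed by resource daemons (illustrative only)."""

    def __init__(self, max_age_s=60.0):
        self.max_age_s = max_age_s  # assumed retention time for a report
        self.reports = {}           # host name -> (timestamp, state dict)

    def push(self, host, state, now=None):
        """Push side: a resource daemon reports its latest state."""
        timestamp = time.time() if now is None else now
        self.reports[host] = (timestamp, dict(state))

    def query(self, min_free_mem_mb, now=None):
        """Pull side: the scheduler asks for hosts with a fresh report and enough memory."""
        now = time.time() if now is None else now
        return sorted(
            host
            for host, (timestamp, state) in self.reports.items()
            if now - timestamp <= self.max_age_s
            and state["free_mem_mb"] >= min_free_mem_mb
        )
```

The optional `now` argument only exists so that the behaviour can be checked deterministically; a real daemon would rely on the wall clock.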
24 2- Resource selection
- From the available resources, select those that best fit the user/application constraints (CPU, memory, disk, etc.) to run a submitted job.
- Identifies a list of resources satisfying the minimal requirements to run the job
25 3- Schedule generation
- a) select resources
- identify the best resources to run the job
- a resource selection algorithm analyses the current state of the resources and selects the best based on a quantitative evaluation
- -- random selection?
- -- e.g. an algorithm based on
- EvalResource = (EvalCPU + EvalRAM) / (WCPU + WRAM)
- EvalCPU = WCPU * (1 - CPUload) * (CPUspeed / CPUmin)
- EvalRAM = WRAM * (1 - RAMusage) * (RAMsize / RAMmin)
- b) select jobs
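The resource evaluation can be sketched in a few lines. This assumes the slide's formulas read as EvalResource = (EvalCPU + EvalRAM) / (WCPU + WRAM), with each term being a weight times the free fraction times the ratio to the job's minimum requirement; the operators are an assumption, since the extracted slide text dropped them. The function and parameter names are illustrative.

```python
def eval_resource(cpu_load, cpu_speed, cpu_min,
                  ram_usage, ram_size, ram_min,
                  w_cpu=1.0, w_ram=1.0):
    """Score one resource; higher is better.

    cpu_load and ram_usage are fractions in [0, 1]; cpu_min and ram_min are
    the job's minimum requirements, in the same units as cpu_speed and ram_size.
    """
    eval_cpu = w_cpu * (1.0 - cpu_load) * (cpu_speed / cpu_min)
    eval_ram = w_ram * (1.0 - ram_usage) * (ram_size / ram_min)
    return (eval_cpu + eval_ram) / (w_cpu + w_ram)


def best_resource(resources, cpu_min, ram_min):
    """Return the name of the highest-scoring resource from a dict of state records."""
    return max(
        resources,
        key=lambda name: eval_resource(cpu_min=cpu_min, ram_min=ram_min,
                                       **resources[name]),
    )
```

For example, a half-loaded host with twice the required CPU speed and RAM scores 1.0, while an idle host with three times the required speed scores higher even if most of its RAM is in use.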
26b) select jobs
- Select a job from a job queue for execution
- Possible strategies
- FCFS: follows submission order.
- if no resource (R) is available for the job at the head of the queue, the scheduler waits and all other jobs wait too
- -- R can be badly used
- -- possibly affects high priority jobs
- Random: selects the next job randomly from the job queue.
- -- can be unfair
- Priority-based: a job priority can be set on job submission
- -- difficult to define criteria for job priorities
- Backfilling: requires knowledge of the expected execution time of a job to be scheduled.
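To contrast FCFS with backfilling, here is a small sketch under simplifying assumptions: each job declares how many nodes it needs and its expected run time, and the scheduler already knows how far away the head job's reserved start is (`head_start_in`). The names are hypothetical and real backfilling schedulers are considerably more elaborate.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    runtime: int  # expected run time, in arbitrary time units


def pick_fcfs(queue, free_nodes):
    """FCFS: only the head of the queue may start; if it does not fit, everyone waits."""
    if queue and queue[0].nodes <= free_nodes:
        return queue[0]
    return None


def pick_backfill(queue, free_nodes, head_start_in):
    """Backfilling: if the head cannot start now, run a later job that fits in the
    free nodes and is expected to finish before the head's reserved start time."""
    head = pick_fcfs(queue, free_nodes)
    if head is not None:
        return head
    for job in queue[1:]:
        if job.nodes <= free_nodes and job.runtime <= head_start_in:
            return job
    return None
```

The runtime check is what makes the expected execution time necessary: without it the scheduler could not tell whether a backfilled job would delay the job at the head of the queue.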
27 4- Job execution
- Prepare job for execution.
28Case studies
29Condor
- A resource management and job scheduling system (from the University of Wisconsin-Madison, US)
- Runs on a diversity of hardware/OS platforms
- - HP with HPUX
- - Sun SPARC with Solaris
- - SGI with IRIX
- - Intel x86 with Linux, Windows
- - ALPHA with Unix, Linux
- - PowerPC with Mac OS X and AIX
- - Itanium with Redhat
- Supports heterogeneous pools of Unix and Windows nodes.
- A job launched from Unix can run on Unix or Windows nodes, and vice versa
30Condor pools
- Resources are organised as Condor pools
- Pool: an administrative domain of hosts (can be shared with other execution environments)
- A system can have multiple pools.
- Each pool has a flat node organisation.
31Architecture of a Condor pool
32Condor pool
- one central manager (Master Host) manages resources and jobs
- an arbitrary number of execution (worker) hosts
- - each execution host can be configured as
- -- a job execution host
- -- a job submission host
- -- both
- On failure of the central manager
- -- currently executing jobs are not affected
- -- queued jobs stay in the queue but cannot start until the manager is restarted
33Daemons in the Condor pool
- - daemons run in the background
- a) condor_master
- runs on each host
- spawns the other daemons (condor_startd, condor_schedd)
- periodically checks whether new binaries have been installed for any of these daemons and restarts them if needed
- if a daemon crashes, the master sends an email to the admin of the Condor pool and tries to restart the daemon
- also supports management commands that let the admin start, stop and reconfigure daemons remotely
34 b) condor_startd
- runs on each host
- advertises information about the node's resources to the condor_collector daemon (running on the Master host) so that it can be matched against pending resource requests
- enforces the policies imposed by resource owners, which control the conditions to start, suspend, resume, or kill remote jobs
- when it is ready to execute a Condor job on an Execution host, it launches the condor_starter
35c) condor_starter
- only runs on Execution hosts
- it actually spawns a remote Condor job on a given host in the pool
- it sets up the execution environment and monitors the job during execution
- when a job completes, the condor_starter sends the status back to the job submission node and exits
36d) condor_schedd
- runs on each host
- handles resource requests
- user jobs submitted to a node are stored in a local job queue managed by this daemon
- command-line tools such as condor_submit, condor_q and condor_run interact with this daemon to access information on the job queue
- advertises the job requests in its local job queue, with their resource requirements, to the condor_collector on the Master host
- once a job request from a condor_schedd on a submission host is matched with a resource on an Execution host, it spawns a condor_shadow on the submission host to serve that particular job request
37e) condor_shadow
- only runs on submission hosts
- acts as the resource manager for user job submission requests
- performs remote system calls for submitted jobs, which supports checkpointing them
- any system call made on a remote execution host is sent back to this daemon on the submission host, which performs it and returns the result
- also decides
- where job checkpoint files should be stored
- how certain job files should be accessed
38f) condor_collector
- only runs on the Central Manager host
- interacts with the condor_startd and _schedd daemons on other hosts to collect status information about a Condor pool, such as job requests and available resources
- the command-line tool condor_status can query this daemon for status information
39f) condor_negotiator
- only runs on the Central Manager host
- is responsible for matching a resource with a specific job request
- periodically starts a negotiation cycle, in which it
- queries the _collector for the current state of all available resources
- interacts, in priority order, with each _schedd running on a submission host that has resource requests
- and tries to match available resources with such requests
- can preempt a low priority running user job so that a higher priority user job can run
40g) condor_kbdd
- only runs on an Execution host
- detects user console activity (keyboard or mouse)
- and reports it to the condor_startd, so that it knows the machine owner is using the machine again
- allowing policies to decide whether the job should be stopped.
41h) condor_ckpt_server
- runs on a checkpoint server, i.e. an Execution host
- stores and retrieves checkpointed files
- if a checkpoint server is down, Condor sends the checkpointed files for a given job back to the job submission host
43Job life cycle in Condor
- 1. Job submission: a job is submitted from a Submission host with the condor_submit command
- 2. Job request advertising: on receiving the job request, the _schedd on the Submission host advertises it to the _collector on the Central Manager host
- 3. Resource advertising: each _startd running on an Execution host advertises its available resources to the _collector
- 4. Resource matching: the _negotiator running on the Central Manager periodically queries the _collector to match a resource to a user job request. Then it informs the _schedd on the Submission host about the matched Execution host
- 5. Job execution: the _schedd informs the _startd on the matched Execution host so that it spawns a _starter there, and also launches a _shadow on the Submission host to interact with the _starter for job execution control. The _starter gets the user job to execute
44Job life cycle in Condor (cont.)
- 6. Return output: when a job completes, the results are sent back to the Submission host through the interaction between the _shadow and the _starter
46Security management in Condor
- strong support for authentication, encryption, integrity assurance and authorization
- when installing Condor
- none of these is enabled by the default configuration settings
- an admin uses configuration macros to enable such features
- a) authorization
- protects resource usage by granting/denying access requests made to the resources
- defines who is allowed to do what
- is granted based on specific access levels
- (e.g. READ permission to view the status of the pool, WRITE permission to submit a job)
47- b) authentication
- provides assurance of an identity
- via macros, both a client and a daemon can specify whether authentication is required
- E.g. the config file for a daemon may have
- SEC_WRITE_AUTHENTICATION = REQUIRED
- or SEC_DEFAULT_AUTHENTICATION = REQUIRED
- If no authentication methods are specified in the configuration, Condor uses a default from: Globus GSI authentication with X.509 certificates, Kerberos authentication, or file system authentication.
48- c) encryption
- provides privacy between two communicating parties.
- d) integrity checks
- assure that the messages between communicating parties have not been modified, by detecting any change.
49Job management in Condor
- Job: a work unit submitted to a Condor pool for execution
- Job types: executable sequential or parallel codes
- -- may be a long running job
- -- a periodically runnable job
- -- a parallel job on multiple machines
- Queue
- a job queue on each Submission host is managed by the _schedd
- a job in a queue can be removed or put on hold
- Job status
- idle: no activity
- busy: running
- suspended
- vacating: currently checkpointing
- killing: currently being killed
- benchmarking: via _startd
50Job run-time environments
- Condor Universe: specifies a Condor execution environment
- Examples
- a) Standard Universe (the default): for a job that was relinked with condor_compile against the Condor libraries (supports checkpointing and remote system calls)
- b) Vanilla Universe: for jobs not linked with the Condor libraries, e.g. to submit shell scripts to Condor
- c) PVM Universe: for a parallel job in PVM
- d) MPI Universe: for MPI jobs using MPICH
- e) Java Universe: for Java programs
- f) Globus Universe: interface for starting Globus jobs from Condor; each job queued in the job submission file is translated into Globus RSL and submitted to Globus via the GRAM protocol
- g) Scheduler Universe: executes the job on its submission host
51Job submission with a shared FS
- If jobs are submitted without using the file transfer mechanism
- Condor must use a shared FS to access input and output files
- The job must then be able to access the data files from any machine on which it could potentially run
52Job submission without a shared FS
- if a job is submitted using the file transfer mechanism in Condor
- then any needed files are transferred from the submission host to a temporary working directory on the execution host
- after execution, output files are transferred back to the submission host
- the user specifies in the job submission description file
- which files to transfer
- at what point the output files should be copied back to the submission host
53Job priorities
- Condor allows a priority level to be assigned to each submitted job
- The priority can be changed during execution
54Job flow management
- Condor uses a DAG to represent a set of tasks in a job submission
- Condor finds the hosts for the execution of the tasks but does not schedule the tasks in terms of their dependencies
- For that purpose there is DAGMan, a meta-scheduler for Condor jobs that submits jobs to Condor in the order represented by a DAG and processes the results
- an input file describes the dependencies between the tasks in the DAG, and each task in the DAG also has its own description file
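The ordering that a DAG imposes on submissions can be sketched with Python's standard `graphlib` module. This is only an illustration of dependency-ordered submission, not DAGMan's implementation; the four-task diamond (`A`, then `B` and `C`, then `D`) and the function name `submission_waves` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical diamond-shaped job: each task maps to the set of its parents.
parents = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}


def submission_waves(parents):
    """Group tasks into waves; a task is released only once all its parents are done."""
    sorter = TopologicalSorter(parents)
    sorter.prepare()                    # also raises CycleError if the graph is cyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)             # these tasks could be submitted now
        sorter.done(*ready)             # pretend every submitted task succeeded
    return waves
```

Tasks in the same wave have no dependencies between them, so they could be handed to Condor together and run on whatever hosts it finds.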
55Job monitoring
- The status of a job can be monitored
- with condor_q
- by inspecting the log files managed by DAGMan
- or with condor_q -dag
56Job recovery: the Rescue DAG
- When a node fails, computation of the job DAG proceeds until the dependencies do not allow it to continue.
- The uncompleted DAG is then saved in a file so that, on restart, completed nodes do not have to be repeated.
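The rescue idea can be sketched as follows, assuming a hypothetical `run_dag` helper that takes a callback for running one task. It keeps going past a failure as far as the dependencies allow and returns the set of completed tasks, which plays the role of the saved rescue information on restart. The real Rescue DAG is a file written by DAGMan; this is only an in-memory illustration.

```python
def run_dag(parents, run_task, done=()):
    """Run tasks in dependency order, continuing past failures where possible.

    parents:  dict mapping each task to the set of its parent tasks
    run_task: callable(task) -> bool, True on success (hypothetical callback)
    done:     tasks already completed in an earlier run (the "rescue" information)
    Returns (completed, pending): the completed set and the tasks still to run.
    """
    completed = set(done)
    failed = set()
    progress = True
    while progress:
        progress = False
        for task in sorted(parents):
            if task in completed or task in failed:
                continue
            if parents[task] <= completed:  # all parents have completed
                if run_task(task):
                    completed.add(task)
                else:
                    failed.add(task)
                progress = True
    pending = sorted(set(parents) - completed)
    return completed, pending
```

Passing the returned `completed` set back in as `done` on the next run skips the finished nodes, so only the failed task and its descendants are executed again.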
57Job checkpointing
- Gives fault tolerance for long running jobs
- Takes a snapshot of the current state of a job so that it can be restarted
- Allows Condor to reconsider scheduling decisions via preemptive-resume scheduling
- if the scheduler decides to take a host away from a job (e.g. when the host owner gets back to work)
- it can checkpoint the job and preempt it without losing the work already done
- the job can be resumed later, when the scheduler allocates it to a new host
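A toy illustration of preemptive-resume: the "job" below sums the integers below `total_steps`, and can be preempted at a given step, returning a serialized snapshot from which a later call resumes. This is an application-level sketch with hypothetical names (`run`, `preempt_at`); Condor's Standard Universe instead checkpoints the state of the whole process without the program having to cooperate.

```python
import pickle


def run(total_steps, state=None, preempt_at=None):
    """Sum the integers 0 .. total_steps-1, optionally stopping at a checkpoint.

    Returns ("done", result) or ("checkpoint", blob), where blob is a
    serialized snapshot that can be passed back in as `state` to resume.
    """
    state = pickle.loads(state) if state else {"i": 0, "acc": 0}
    while state["i"] < total_steps:
        if preempt_at is not None and state["i"] == preempt_at:
            # Host is being reclaimed: snapshot the progress instead of losing it.
            return "checkpoint", pickle.dumps(state)
        state["acc"] += state["i"]
        state["i"] += 1
    return "done", state["acc"]
```

The resumed call could run on a different host, since the snapshot carries everything the job needs to continue.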
58Computing on demand
- Extends Condor to run short-term jobs immediately on available resources
- for interactive, computation-intensive jobs
59Flocking
- a Condor job submitted in one pool can be executed in another pool
- via configuration, the _schedd can support job flocking
60Resource management in Condor
- Tracking resource usage
- the _startd on each host reports to the _collector about the resources available on that host
- User priority
- Job scheduling policies
- to prevent large jobs from monopolizing the resources, an up-down strategy changes the job priorities inversely to the number of cycles required
- and uses
- FCFS by default
- preemptive scheduling of low priority jobs
- dedicated scheduling with no preemption
61Resource matching in Condor
- to match an execution host to run a selected job or jobs
- the _collector receives job request advertisements from the _schedd on each submission host and resource advertisements from the _startd on each execution host
- --- a resource match is done by the _negotiator, which selects a resource based on the job requirements
- both kinds of advertisement are described in the Condor Classified Advertisement language (ClassAd), representing the characteristics and constraints of hosts and jobs
62ClassAd
- Is a set of uniquely named expressions, each called an attribute
- MyType = "job"
- TargetType = "machine"
- ((other.Arch == "Intel" && other.OS == "Linux") && other.Disk > my.DiskUsage)
- ...
- Includes a query language
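Matchmaking over such advertisements can be mimicked in Python with dictionaries, using a function in place of the constraint expression. This is an analogy rather than ClassAd syntax: the attribute names `Arch`, `OS`, `Disk` and `DiskUsage` follow the slide, while the machine records, the `Name` field and the helper names are made up for illustration. In Condor the constraint is normally held in an attribute named Requirements, and a match requires both sides' constraints to hold, which the sketch imitates.

```python
def always(my, other):
    """Default constraint for an ad that places no requirements on its match."""
    return True


job_ad = {
    "MyType": "job",
    "TargetType": "machine",
    "DiskUsage": 6000,
    # The job's constraint: `my` is this ad, `other` is the candidate machine ad.
    "Requirements": lambda my, other: (
        other["Arch"] == "Intel"
        and other["OS"] == "Linux"
        and other["Disk"] > my["DiskUsage"]
    ),
}

machine_ads = [
    {"MyType": "machine", "Name": "m1", "Arch": "Intel", "OS": "Linux", "Disk": 10000},
    {"MyType": "machine", "Name": "m2", "Arch": "Intel", "OS": "Linux", "Disk": 1000},
    {"MyType": "machine", "Name": "m3", "Arch": "SPARC", "OS": "Solaris", "Disk": 50000},
]


def matches(job, machine):
    """Two ads match when the types agree and each side's constraint accepts the other."""
    return (
        machine["MyType"] == job["TargetType"]
        and job["Requirements"](job, machine)
        and machine.get("Requirements", always)(machine, job)
    )


matched = [m["Name"] for m in machine_ads if matches(job_ad, m)]
```

Here only `m1` satisfies the job: `m2` has too little disk and `m3` has the wrong architecture and operating system.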
63Condor support in Globus
- Jobs can be submitted to a Condor pool directly from a Condor host, or via Globus
- by configuring the Globus host with the Condor jobmanager included in Globus
- jobs are submitted to Globus via globus_job_run
- but are redirected to Condor via condor_submit
65Condor-G
- version of Condor that maintains the interaction with a Globus gatekeeper, submitting jobs to Globus and monitoring them
- allows job descriptions similar to Condor's to be run on Globus grid resources
- Condor-G is the job management part of Condor
67SGE: Sun Grid Engine
- A distributed resource management and scheduling system for Unix environments
- to find and manage a pool of resources and schedule jobs
- Is an open-source project
- Architecture
- master host: a single host that handles all requests from users, job scheduling decisions and job dispatching to execution hosts
- submit host: machines configured to submit, monitor and manage jobs, and to manage the cluster
- execution host: has permission to run SGE jobs
- admin host: for configuring the cluster
- shadow master host: monitors the master and assumes control if the master fails. Current jobs are not affected by the failure.
69Daemons
- sge_qmaster: central manager; keeps tables about hosts, queues, jobs, system load and user permissions
- gets scheduling decisions from _schedd
- requests actions from _execd on the execution hosts
- runs on the master host
- sge_schedd: keeps an up-to-date view of the cluster status
- makes scheduling decisions on which jobs to dispatch to which queues
- forwards the decisions to the _qmaster
- runs on the master host
- sge_execd: responsible for the queues on its host and for job execution
- periodically forwards information on job status and host load to the _qmaster
- runs on each execution host
70- sge_commd: handles communications among SGE components using a well-known TCP port
- runs on each execution host and on the master host
- sge_shepherd: started by the _execd, this daemon runs for each job under execution on a host, controls the process execution and collects accounting data on job completion
72Job management in SGE
- Job types
- batch, interactive, parallel and array (a job that is replicated n times with distinct input data sets, e.g. for parameter sweep studies)
- Submitted jobs are put into job queues
- An SGE queue is a container for a class of jobs allowed to execute concurrently on a specific host
- a queue determines certain job attributes, e.g. whether the job can migrate
- A job is associated with a queue: actions on the queue affect all its jobs, e.g. suspending all jobs in a queue
- Job submission: a user only gives the requirements profile (memory, OS, software) --> SGE dispatches the job to a suitable queue on a lightly loaded host
- If a job is submitted to a specific queue, it is bound to that queue and its host
73Job run-time environments in SGE
- Three execution modes are supported
- batch: for sequential programs
- interactive: gives the user command-line shell access to some suitable host
- parallel: uses PVM or MPI environments
74Job selection and RM in SGE
- Jobs submitted to the master are kept in a spooling area until _schedd decides that the job is ready to run
- available resources are matched with the job requirements
- e.g. available memory, CPU speed, available software licences
- (information periodically collected by the execution hosts)
- On successful matching, higher priority jobs are dispatched first
- Scheduling criteria (besides other urgent resource reservation)
- a) job priorities
- a FIFO rule is used by default, with all pending jobs in a list ordered by submission
- if a suitable queue is available for the head of the list, it is dispatched
- independently of that, the scheduler tries to dispatch the 2nd job, etc.
- a priority defined by the admin can modify the FIFO order: the pending job list is then ordered by priority
- b) equal share
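The priority rule in a) can be sketched as a single pass over the pending list. The tuple layout, the queue names and the slot counts are hypothetical; the sketch only shows the two properties stated above: the list is ordered by priority and then by submission order, and a job that cannot be dispatched does not block the jobs behind it.

```python
def dispatch(pending, free_slots):
    """Dispatch pending jobs to queues with free slots.

    pending:    list of (submit_order, priority, name, queue) tuples
    free_slots: dict mapping queue name -> number of free slots
    Returns the names of the jobs started, in dispatch order.
    """
    slots = dict(free_slots)  # work on a copy; the caller's dict is left untouched
    # Higher priority first; FIFO (submission order) among equal priorities.
    ordered = sorted(pending, key=lambda job: (-job[1], job[0]))
    started = []
    for _, _, name, queue in ordered:
        if slots.get(queue, 0) > 0:  # a suitable queue is available
            slots[queue] -= 1
            started.append(name)
        # otherwise the job stays pending and the scan continues with the next one
    return started
```

With equal priorities this reduces to the default FIFO behaviour, except that a waiting head job lets later jobs through when their queue has room.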
75Equal-share scheduling
- If a series of jobs is submitted at almost the same time, they would be put in the same group of queues and would wait a long time to execute
- equal-sharing tries to avoid this
- by sorting the jobs of a user who already has a job executing:
- the new jobs of the same priority are put at the end of the list
- -------
- Jobs can be submitted
- directly to an SGE cluster
- or via Globus
77Conclusions
- Condor and SGE are resource management and scheduling systems for a single administrative domain
- but can be interconnected across administrative boundaries using Globus
- Common aspects
- master-worker based
- one master host (central manager) per system
- arbitrary number of worker machines used for job submission, job execution or both
- centralized scheduling
- priority-based job scheduling
- support batch jobs
- a diversity of platforms
- support authentication and authorization
78- Availability: both can be downloaded free
- Windows support: Condor has partial support, SGE is Unix only
- GUI support: Condor is command-line oriented but has some graphical tools (CondorView: graphical history of the resources in the pool; CondorUserLogViewer: graphical history of a set of submitted jobs); SGE has a GUI
- Jobs supported: both support batch jobs and parallel jobs using MPI and PVM. Condor does not support interactive jobs; SGE does.
- Resource reservation: both; job checkpointing and fault recovery
- Job flocking: Condor allows a job to migrate to another cluster
- Job scheduling: both support preemptive scheduling; SGE also supports deadline-constrained scheduling
- Resource matching: both
- Job flow management: both support inter-job dependency descriptions for complex applications
79Grid scheduling with QoS
- Condor and SGE lack support for QoS in scheduling.
- Aspects to be considered
- job characteristics
- market-based scheduling models
- planning in scheduling
- rescheduling
- scheduling optimization
- performance prediction
- E.g. AppLeS, an adaptive application-level scheduling system
- measures the performance of the application on a specific site resource and uses this to make resource selection and scheduling decisions. For master-slave applications.
80AppLeS
81AppLeS
- Components
- Network Weather Service: dynamically gathers information on the system state and forecasts resource loads
- User specifications: information about user criteria for performance, execution constraints, and others
- Model: a repository of default models, derived from similar classes of applications, that can be used for performance estimation, planning and resource selection.
- Resource selector: chooses and filters different resource combinations
- Planner: generates a description of a resource-dependent schedule from a given resource combination
- Performance estimator: generates an estimate for candidate schedules according to the user's performance metric
- Coordinator: chooses the best schedule
- Actuator: implements the chosen schedule on the target system
82Steps in using AppLeS
- 1. The user specifies a Heterogeneous Application Template with information on the structure, characteristics and constraints
- 2. The Coordinator uses this to filter out impossible schedules
- 3. The Resource selector identifies possible sets of resources and prioritizes them based on a logical distance between resources
- 4. The Planner defines a potential schedule for each viable resource configuration
- 5. The Performance estimator evaluates each such schedule in terms of the user's performance metric
- 6. The Coordinator chooses the best schedule and passes it to the Actuator.
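The select, plan, estimate and choose loop in these steps can be sketched for a simple master-slave workload. The cost model is an assumption made for illustration, not AppLeS's own: each host has a speed (tasks per hour) and a fixed data-shipping cost (hours), work is split in proportion to speed, and the estimated makespan is the slowest host's shipping cost plus compute time. All names (`choose_schedule`, `max_hosts`) are hypothetical.

```python
from itertools import combinations


def choose_schedule(hosts, n_tasks, max_hosts):
    """Pick the host combination with the smallest estimated makespan.

    hosts:     dict name -> (tasks_per_hour, shipping_cost_hours)   [assumed model]
    max_hosts: a user constraint that rules out larger combinations
    Returns (estimated_hours, plan), where plan maps host -> number of tasks.
    """
    best = None
    for k in range(1, max_hosts + 1):
        for combo in combinations(sorted(hosts), k):  # resource selector
            total_speed = sum(hosts[h][0] for h in combo)
            # Planner: split the tasks in proportion to each host's speed.
            plan = {h: n_tasks * hosts[h][0] / total_speed for h in combo}
            # Performance estimator: the slowest host determines the makespan.
            estimate = max(hosts[h][1] + plan[h] / hosts[h][0] for h in combo)
            if best is None or estimate < best[0]:    # coordinator keeps the best
                best = (estimate, plan)
    return best
```

Adding a slow, distant host can make a schedule worse under this model, which is why the estimate is computed per combination rather than simply using every available resource.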
83GrADS