Grid Scheduling and Resource Management, based on Chapter 6 of the book The Grid: Core Technologies, M. Li

Slides: 85

Provided by: ascDiF

1
Grid Scheduling and Resource Management
based on Chapter 6 of the book The Grid: Core Technologies,
M. Li and M. Baker, J. Wiley, 2005
2

Grid scheduling
Mapping Grid jobs to resources across multiple
administrative domains
A Grid job can be split into many tasks; the
scheduler must
- select resources
- schedule tasks
to meet user/application requirements
(overall execution time and cost of the resources used)

3
Scheduling paradigms

Centralized scheduling
Distributed scheduling
Hierarchical scheduling

4
Centralized scheduling

A central node acts as the resource manager and
schedules jobs on all known nodes
Used in practice in computing centres where
resources have similar characteristics and usage
policies
Pending jobs wait in a central job queue until
dispatched by the central scheduler

6

Advantages
- good scheduling decisions: the scheduler has access
to all the needed, up-to-date information
about available resources
Disadvantages
- does not scale well as the resource pool grows
- the scheduler is a bottleneck
- single point of failure

7
Distributed scheduling

No central scheduler.
Multiple local schedulers cooperate to dispatch
jobs to the nodes
Two approaches
- direct communication among schedulers
- indirect communication

Advantages
- scalability
- better fault tolerance and reliability
Disadvantages
- no global scheduler, which may lead to sub-optimal
scheduling decisions

9
Direct communication

Each scheduler has
a list of remote schedulers with which it
communicates for job dispatching
Or
there is a central information directory with
information on each scheduler

11

If a job cannot be dispatched via its local job
queue, other schedulers are contacted to find
appropriate resources

12
Indirect communication via a job pool

Jobs that cannot execute locally right away are
sent to a central job pool.
Schedulers select suitable jobs from the pool to run on
their resources.
Policies must ensure that all jobs eventually get
executed.
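
The job pool idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Job`, `JobPool`, `offer` and `take` are hypothetical, a job's requirement is reduced to a CPU count, and the "longest waiting job first" rule is just one simple policy that keeps a fitting job from being overtaken indefinitely.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    cpus: int          # CPUs the job needs (simplified requirement model)
    waited: int = 0    # selection rounds in which the job was passed over

@dataclass
class JobPool:
    """Central pool for jobs that could not start on their local scheduler."""
    jobs: list = field(default_factory=list)

    def offer(self, job: Job) -> None:
        """A local scheduler hands over a job it cannot run right away."""
        self.jobs.append(job)

    def take(self, free_cpus: int):
        """A local scheduler asks for a job that fits its free CPUs.

        Among the fitting jobs, the one that has waited longest wins.
        Returns None when no job in the pool fits.
        """
        fitting = [j for j in self.jobs if j.cpus <= free_cpus]
        if not fitting:
            return None
        chosen = max(fitting, key=lambda j: j.waited)
        self.jobs.remove(chosen)
        for j in self.jobs:          # every job left behind ages by one round
            j.waited += 1
        return chosen
```

A scheduler with two free CPUs would call `pool.take(2)` and receive the longest-waiting job that needs at most two CPUs.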

14
Hierarchical scheduling

A central scheduler acts as a meta-scheduler: it
receives submitted jobs and dispatches them to the
local schedulers
Can have scalability, bottleneck, and
fault-tolerance problems
But allows the global and local schedulers to apply
different job scheduling policies

16
Scheduling operations

4 main stages
Resource discovery
Resource selection
Schedule generation
Job execution

17
1- Resource discovery

Goal: identify a list of authenticated resources
available for job submission and execution
Must account for dynamic changes: decisions depend
on dynamic state information about the available
resources and are revised online.
E.g. like a compiler that schedules machine
instructions to minimize resource idle time
To decide on a more efficient resource allocation,
the scheduler needs to know
- which resources are accessible
- how busy they are
- how long it takes to communicate with them
- how long it takes to communicate between them
Typical models
- pull
- push
- push-pull

18
The pull model

A daemon associated with the scheduler is
responsible for querying Grid resources to get
state information on
CPU loads,
available memory,
etc.
Has small communication overhead
But needs frequent querying, otherwise
information gets out-of-date and can lead to bad
decisions

20
The push model

Each resource has a daemon that collects local
state information and sends it to a central
scheduler, which saves it in a database with
records of each resource's activity
Frequent updates keep the view more accurate but
load the database and the network

22
The push-pull model

Each resource has a local daemon that collects state
information and sends it to intermediate nodes
(aggregators) that merge state information from
multiple sub-systems.
The scheduler queries the aggregators
for resource information
Issues
- which information is useful
- how often it must be collected
- how long it should be kept in the system
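
A minimal Python sketch of an aggregator in the push-pull model follows. It is an assumption-laden illustration: the class name `Aggregator`, its `push` and `query` methods and the `max_age` retention rule are all hypothetical, chosen to show the "how long should it be kept" issue rather than any real system's behaviour.

```python
import time

class Aggregator:
    """Intermediate node: resource daemons push state in, the scheduler pulls it."""

    def __init__(self, max_age):
        self.max_age = max_age    # seconds a report stays valid
        self.reports = {}         # resource name -> (timestamp, state)

    def push(self, resource, state, now=None):
        """Called by the daemon running on a resource."""
        stamp = time.time() if now is None else now
        self.reports[resource] = (stamp, state)

    def query(self, now=None):
        """Called by the scheduler: return only the reports that are still fresh."""
        now = time.time() if now is None else now
        return {name: state
                for name, (stamp, state) in self.reports.items()
                if now - stamp <= self.max_age}
```

The `now` parameter exists only to make the behaviour easy to try out with fixed timestamps; in normal use both calls rely on the system clock.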

24
2- Resource selection

From the available resources, select the
resources that best fit the user/application
constraints (CPU, Mem, disk, etc) to run a
submitted job.
Identifies a list of resources satisfying the
minimal requirements to run the job

25
3- Schedule generation

a) select resources
identify the best resources to run the job
a resource selection algorithm analyses the
current state of resources and selects the best
based on a quantitative evaluation
-- random selection?
-- e.g. an algorithm based on
$\mathrm{EvalResource} = (\mathrm{EvalCPU} + \mathrm{EvalRAM}) / (W_{CPU} + W_{RAM})$
$\mathrm{EvalCPU} = W_{CPU} \times (1 - \mathrm{CPUload}) \times (\mathrm{CPUspeed} / \mathrm{CPUmin})$
$\mathrm{EvalRAM} = W_{RAM} \times (1 - \mathrm{RAMusage}) \times (\mathrm{RAMsize} / \mathrm{RAMmin})$
b) select jobs
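
The evaluation in a) can be tried out numerically. The sketch below reads the formula as the weighted CPU and RAM scores added together and divided by the sum of the weights, which is an interpretation of the slide; the function name `eval_resource`, the host names and all the numbers are made up for illustration.

```python
def eval_resource(cpu_load, cpu_speed, cpu_min,
                  ram_usage, ram_size, ram_min,
                  w_cpu=1.0, w_ram=1.0):
    """Score a resource; a higher score means a better candidate.

    cpu_load and ram_usage are fractions in [0, 1]; cpu_min and ram_min
    are the minimum CPU speed and RAM size the job requires.
    """
    eval_cpu = w_cpu * (1 - cpu_load) * (cpu_speed / cpu_min)
    eval_ram = w_ram * (1 - ram_usage) * (ram_size / ram_min)
    return (eval_cpu + eval_ram) / (w_cpu + w_ram)

# Two hypothetical hosts competing for a job that needs 1000 MHz and 1024 MB.
scores = {
    "hostA": eval_resource(cpu_load=0.5, cpu_speed=2000, cpu_min=1000,
                           ram_usage=0.5, ram_size=4096, ram_min=1024),
    "hostB": eval_resource(cpu_load=0.75, cpu_speed=4000, cpu_min=1000,
                           ram_usage=0.5, ram_size=2048, ram_min=1024),
}
best = max(scores, key=scores.get)
```

Here hostB has the faster CPU, but its higher load and smaller memory give it the lower score, so `best` is hostA.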

26
b) select jobs

Select a job from a job queue for execution
Possible strategies
FCFS: follows submission order.
if no resource is available for the job at the head
of the queue, the scheduler waits, and so do all
other jobs
-- resources can be poorly used
-- possibly affects high priority jobs
Random: selects the next job randomly from the job
queue.
-- can be unfair
Priority-based: a job priority can be set on job
submission.
-- difficult to define criteria for job
priorities
Backfilling: requires knowledge of the expected
execution time of a job to be scheduled.
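
The difference between FCFS and backfilling can be sketched as below. This is a simplified illustration, not an algorithm taken from the slides: resources are reduced to a CPU count, `fcfs_pick` and `backfill_pick` are hypothetical names, and the backfilling rule only lets a later job start if it is expected to finish before the head job could start anyway.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cpus: int
    runtime: int   # expected execution time, supplied on submission

def fcfs_pick(queue, free_cpus):
    """FCFS: only the head may start; if it does not fit, every job waits."""
    if queue and queue[0].cpus <= free_cpus:
        return [queue[0]]
    return []

def backfill_pick(queue, free_cpus, now, running):
    """Pick jobs to start now without delaying the job at the head.

    running: list of (end_time, cpus) pairs for jobs currently executing.
    """
    if not queue:
        return []
    head = queue[0]
    if head.cpus <= free_cpus:
        return [head]
    # Earliest time at which enough CPUs will be free for the head job.
    avail, head_start = free_cpus, None
    for end_time, cpus in sorted(running):
        avail += cpus
        if avail >= head.cpus:
            head_start = end_time
            break
    if head_start is None:
        return []   # head never fits with the known releases; start nothing
    picked = []
    for job in queue[1:]:
        # A later job may jump ahead only if it fits now and should be
        # finished before the head's reserved start time.
        if job.cpus <= free_cpus and now + job.runtime <= head_start:
            picked.append(job)
            free_cpus -= job.cpus
    return picked
```

With a blocked 8-CPU job at the head, FCFS starts nothing, while backfilling can start a short 2-CPU job in the gap; this is why backfilling needs the expected execution times.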

27
4- Job execution

Prepare job for execution.

28
Case studies
29
Condor

A resource management and job scheduling system
(from Univ. Wisconsin-Madison, US)
Runs on a diversity of hardware/OS platforms
- HP with HPUX
- Sun SPARC with Solaris
- SGI with IRIX
- Intel x86 with Linux, Windows
- ALPHA with Unix, Linux
- PowerPC with Mac OS X and AIX
- Itanium with Red Hat
Supports a heterogeneous pool of Unix and Windows
nodes.
A job launched from Unix can run on Unix or Windows
nodes, and vice versa

30
Condor pools

Resources are organised as Condor pools
Pool: an administered domain of hosts (can be
shared with other execution environments)
A system can have multiple pools.
Each pool has a flat node organisation.

31
Architecture of a Condor pool
32
Condor pool

- one central manager (Master host) manages
resources and jobs
- an arbitrary number of execution (worker) hosts
- each execution host can be configured as
-- a job execution host
-- a job submission host
-- both
On failure of the central manager
-- currently executing jobs are not affected
-- queued jobs stay in the queue
but cannot start until the manager is restarted

33
Daemons in the Condor pool

- daemons run in the background
a) condor_master
runs on each host
spawns the other daemons (condor_startd,
condor_schedd)
periodically checks whether new binaries have been
installed for any of these daemons and restarts
them if needed
if a daemon crashes, the master sends an
email to the admin of the Condor pool and tries
to restart the daemon
also supports management commands
that let the admin start, stop, and
reconfigure daemons remotely

34
b) condor_startd

runs on each host
advertises information about the node's resources to
the condor_collector daemon (running on the
Master host), used to match pending resource
requests
enforces the policies imposed by resource owners that
control
the conditions to start, suspend, resume, or
kill remote jobs
when it is ready to execute a Condor job on an
Execution host, it launches the condor_starter

35
c) condor_starter

only runs on Execution hosts
it actually spawns a remote Condor job on a
given host in the pool
it sets up the execution environment and
monitors the job during execution
when a job completes
the condor_starter sends back status to the
job submission node and exits

36
d) condor_schedd

runs on each host
handles resource requests
user jobs submitted to a node are stored in a
local job queue managed by this daemon
command-line tools such as
condor_submit, condor_q, condor_run
interact with this daemon to access information
on the job queue
advertises the job requests in its local job queue,
with their resource requirements, to the
condor_collector on the Master host
once a job request from a condor_schedd on a
submission host is matched with a resource on an
Execution host,
it spawns a condor_shadow on the
submission host to serve that particular job
request

37
e) condor_shadow

only runs on submission hosts
acts as the resource manager for user job
submission requests
performs remote system calls for submitted jobs
that support checkpointing
any system call made on a remote execution host
is sent back to this daemon on the submission
host, and the result is returned through it.
also decides
- where job checkpoint files should be stored
- how certain job files should be accessed

38
f) condor_collector

only runs on the Central Manager host
interacts with condor_startd and _schedd on
other hosts to collect status information about a
Condor pool, such as
job requests and resources available
the condor_status command-line tool can query this daemon
for status information

39
f) condor_negotiator

only runs on the Central Manager host
is responsible for matching a resource with a
specific job request
periodically starts a negotiation cycle
- queries the _collector for the current state of all
available resources
- interacts, in priority order, with each _schedd
running on a submission host that has resource
requests
- tries to match available resources
with such requests
- can preempt a low priority running user job
to let a higher priority user job run
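
One negotiation cycle can be pictured with the following Python sketch. It is not Condor's implementation: the function `negotiation_cycle`, the dictionary fields and the convention that a larger `priority` number is more important are all assumptions made for the example, and requirements are reduced to a memory size.

```python
def negotiation_cycle(requests, machines):
    """One matching pass over pending requests, highest priority first.

    requests: list of dicts {"id", "priority", "min_memory"}.
    machines: list of dicts {"name", "memory", "running"}, where "running"
              is None or the request currently executing on that machine.
    Returns (request id, machine name, preempted request id or None) tuples.
    """
    matches = []
    for req in sorted(requests, key=lambda r: r["priority"], reverse=True):
        fits = [m for m in machines if m["memory"] >= req["min_memory"]]
        idle = [m for m in fits if m["running"] is None]
        # Only a strictly lower-priority job may be preempted.
        busy = [m for m in fits if m["running"] is not None
                and m["running"]["priority"] < req["priority"]]
        if idle:
            target, preempted = idle[0], None
        elif busy:
            target = min(busy, key=lambda m: m["running"]["priority"])
            preempted = target["running"]["id"]
        else:
            continue    # no match in this cycle; the request stays queued
        target["running"] = req
        matches.append((req["id"], target["name"], preempted))
    return matches
```

Idle machines are preferred over preemption in this sketch, so a running job is displaced only when no free machine satisfies the request.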

40
g) condor_kbdd

only runs on an Execution host
detects user console activity (keyboard or
mouse)
and informs the condor_startd, so that it
knows the machine owner is using the machine
again
allowing policies to decide whether the job should be
stopped.

41
h) condor_ckpt_server

runs on a checkpoint server, i.e. an Execution host
that stores and retrieves checkpointed files
if a checkpoint server is down, Condor will send
the checkpointed files for a given job back to
the job submission host

43
Job life cycle in Condor

1. Job submission: a job is submitted from a
Submission host with the condor_submit command
2. Job request advertising: on receiving a job request,
the _schedd on the Submission host advertises it
to the _collector on the Central Manager host
3. Resource advertising: each _startd running on an
Execution host advertises its available resources to
the _collector
4. Resource matching: the _negotiator running on the
Central Manager periodically queries the
_collector to match a resource to a user job
request. Then it informs the _schedd on the
Submission host about the matched Execution host
5. Job execution: the _schedd informs the _startd on
the matched Execution host to spawn a _starter
there, and also launches a _shadow on the
Submission host to interact with the _starter for
job execution control. The _starter gets the user
job to execute

44
Job life cycle in Condor (cont.)

6. Return output: when a job is completed, the
results are sent back to the Submission host
through the interaction between the _shadow and the
_starter.

46
Security management in Condor

strong support for authentication, encryption,
integrity assurance and authorization
when installing Condor
- none of these is guaranteed by the default
configuration settings
- an admin uses configuration macros to enable
such features
a) authorization
protects resource usage by granting/denying
access requests made to the resources
defines who is allowed to do what
is granted based on specific access levels
(e.g. READ permission to view the status of the pool, WRITE
permission to submit a job)

b) authentication
provides assurance of an identity
via macros, both a client and a daemon can
specify whether authentication is required
E.g. the config file for a daemon can have
SEC_WRITE_AUTHENTICATION = REQUIRED
or SEC_DEFAULT_AUTHENTICATION = REQUIRED
If no authentication methods are specified in the
configuration, Condor uses a default chosen from Globus
GSI authentication with X.509 certificates,
Kerberos authentication or file system
authentication.

c) encryption
provides privacy support between two
communicating parties.
d) integrity checks
ensure that messages between communicating
parties have not been modified, by detecting any
change.

49
Job management in Condor

Job: a work unit submitted to a Condor pool for
execution
Job types: executable sequential or parallel
codes
-- may be a long running job
-- a periodically runnable job
-- a parallel job on multiple machines
Queue
a job queue on each Submission host is managed
by the _schedd
a job in a queue can be removed or put on hold
Job status
- idle: no activity
- busy: running
- suspended
- vacating: currently checkpointing
- killing: currently being killed
- benchmarking: via _startd

50
Job run-time environments

Condor Universe: specifies a Condor execution
environment
Examples
a) Standard Universe (default): for a job that was
relinked with condor_compile against the Condor libs
(supports checkpointing and remote system calls)
b) Vanilla Universe: for jobs not linked with the
Condor libs, and to submit shell scripts to Condor
c) PVM Universe: for a parallel job in PVM
d) MPI Universe: for MPI jobs using MPICH
e) Java Universe: for Java programs
f) Globus Universe: interface for starting
Globus jobs from Condor; each job queued in the
job submission file is translated into Globus RSL
and submitted to Globus via the GRAM protocol
g) Scheduler Universe: executes a job on its
submission host

51
Job submission with a shared FS

If jobs are submitted without using the file
transfer mechanism
Condor must use a shared FS to access input
and output files
Then the job must be able to access the data
files from any machine on which it could
potentially run

52
Job submission without a shared FS

if a job is submitted using the file transfer
mechanism in Condor
then any needed files will be transferred from
the submission host to a temp working directory
on the execution host
after execution, output files will be
transferred back to the submission host
user specifies in the job submission description
file
which files to transfer
at what point the output files should be
copied back into the submission host

53
Job priorities

A priority level can be assigned to each submitted
job
and it can be changed during execution

54
Job flow management

Condor uses a DAG to represent a set of tasks in
a job submission
Condor finds the hosts for the execution of the
tasks but does not schedule the tasks according to
their dependencies
For that purpose there is DAGMan, a
meta-scheduler for Condor jobs that submits jobs
to Condor in the order represented by a DAG and
processes the results
an input file describes the
dependencies of the tasks involved in the DAG
and each task in the DAG also has its
own description file
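
What "submitting jobs in the order represented by a DAG" means can be shown with a short Python sketch. It does not reproduce DAGMan or its input file format: the function name `dag_submission_order` and the dictionary representation of dependencies are assumptions, and the ordering used is a plain topological sort (Kahn's algorithm).

```python
from collections import deque

def dag_submission_order(parents):
    """Return the tasks in an order that respects their dependencies.

    parents: dict mapping each task to the list of tasks it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    pending = {task: set(deps) for task, deps in parents.items()}
    ready = deque(sorted(t for t, deps in pending.items() if not deps))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # Completing a task may unblock the tasks that were waiting for it.
        for other in sorted(pending):
            deps = pending[other]
            if task in deps:
                deps.remove(task)
                if not deps:
                    ready.append(other)
    if len(order) != len(pending):
        raise ValueError("dependency cycle detected")
    return order
```

For a diamond-shaped DAG where B and C depend on A, and D depends on both B and C, the function yields A, B, C, D; a real meta-scheduler would also submit B and C concurrently once A completes.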

55
Job monitoring

The status of a job can be monitored
- with condor_q
- by inspecting the log files managed by
DAGMan
- or with condor_q -dag

56
Job recovery: the Rescue DAG

When a node fails, computation of the job DAG
proceeds until dependencies do not allow it.
An uncompleted DAG is then saved in a file so
that when restarting, completed nodes do not have
to be repeated.
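
The rescue behaviour described here can be sketched as follows. This is a hypothetical illustration and not DAGMan's rescue file mechanism: `run_dag`, its `run` callback and the in-memory `done` set are assumptions standing in for the saved rescue state.

```python
def run_dag(parents, run, done=None):
    """Run every task whose parents have completed.

    parents: dict mapping each task to the list of its parent tasks.
    run:     callable(task) returning True on success, False on failure.
    done:    tasks completed in an earlier run (the rescue state), if any.
    Returns (completed tasks, sorted list of tasks still to be run).
    """
    done = set(done or ())
    failed = set()
    progress = True
    while progress:
        progress = False
        for task in sorted(parents):
            if task in done or task in failed:
                continue
            if all(p in done for p in parents[task]):
                if run(task):
                    done.add(task)
                else:
                    failed.add(task)
                progress = True
    remaining = sorted(set(parents) - done)
    return done, remaining
```

After a failed run, passing the returned `done` set to a second call makes it skip the completed tasks and execute only the remaining ones.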

57
Job checkpointing

For long running jobs, gives fault tolerance
It takes a snapshot of the current state of a job to
allow restarting it
Allows Condor to reconsider scheduling decisions
via preemptive-resume scheduling
- if the scheduler decides to deallocate a
host from a job
(e.g. when the host owner gets back to work)
it can checkpoint the job and preempt it
without losing the work
already done
- the job can be resumed later, when the
scheduler allocates
it to a new host
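
The preempt-and-resume idea can be illustrated at application level with the Python sketch below. Condor checkpoints a whole process transparently; this example only mimics the effect for a toy computation, and the function `run_with_checkpoint` and its `should_preempt` callback are hypothetical.

```python
import pickle

def run_with_checkpoint(total_steps, should_preempt, checkpoint=None):
    """Sum the integers 0..total_steps-1, stopping when asked to vacate.

    should_preempt: callable(step) returning True when the host must be released.
    checkpoint:     bytes returned by an earlier, preempted call (or None).
    Returns ("done", result) or ("preempted", checkpoint_bytes).
    """
    state = pickle.loads(checkpoint) if checkpoint else {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if should_preempt(state["step"]):
            # Snapshot the current state, then give the host back.
            return "preempted", pickle.dumps(state)
        state["acc"] += state["step"]
        state["step"] += 1
    return "done", state["acc"]
```

A run that is preempted at step 4 returns a snapshot; feeding that snapshot to a second call, possibly on another host, continues from step 4 instead of starting over.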

58
Computing on demand

Extends Condor for running short-term jobs on
available resources immediately
for interactive computation-intensive jobs

59
Flocking

A Condor job submitted in one pool can be
executed in another pool; the _schedd can be
configured to support job flocking

60
Resource management in Condor

Tracking resource usage
_startd on each host reports to the
_collector about the resources available on that host
User priority
Job scheduling policies
to prevent large jobs from monopolizing the resources,
an up-down strategy changes the job priorities
inversely to the number of cycles required
and uses
- FCFS by default
- preemptive scheduling of low priority jobs
- dedicated scheduling with no preemption

61
Resource matching in Condor

Goal: match an execution host to run a selected job
or jobs
the _collector receives job request advertisements
from the _schedd on each submission host and
resource advertisements
from the _startd on each
execution host
a resource match is done by the _negotiator,
which selects a resource based on the job
requirements
both kinds of advertisement are written in the Condor
Classified Advertisement language (ClassAd),
which represents the
characteristics and constraints of hosts and
jobs

62
ClassAd

Is a set of uniquely named expressions, each
called an attribute
MyType = "job"
TargetType = "machine"
((other.Arch == "Intel" &&
other.OS == "Linux") &&
other.Disk > my.DiskUsage)
...
Includes a query language
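
The two-sided nature of a ClassAd match can be mimicked in Python. The sketch below is not the ClassAd language: dictionaries and lambdas stand in for ads and their constraint expressions, the `Requirements` key and the `matches` function are names chosen for this example, and the attribute values and the machine owner's policy are made up.

```python
job_ad = {
    "MyType": "Job",
    "DiskUsage": 500,
    # The job's constraint on the machine, mirroring the expression above.
    "Requirements": lambda my, other: (other["Arch"] == "Intel"
                                       and other["OS"] == "Linux"
                                       and other["Disk"] > my["DiskUsage"]),
}

machine_ad = {
    "MyType": "Machine",
    "Arch": "Intel",
    "OS": "Linux",
    "Disk": 2000,
    # Hypothetical owner policy: accept only jobs that use little disk.
    "Requirements": lambda my, other: other["DiskUsage"] < 1000,
}

def matches(ad_a, ad_b):
    """Two ads match when each one's constraint accepts the other ad."""
    return (ad_a["Requirements"](ad_a, ad_b)
            and ad_b["Requirements"](ad_b, ad_a))
```

`matches(job_ad, machine_ad)` holds for these two ads; changing the machine's `Arch` to another value makes the job's constraint fail and the match is rejected.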

63
Condor support in Globus

Jobs can be submitted directly to a Condor pool
from a Condor host or via Globus
by configuring the Globus host with Condor
jobmanager included in Globus
jobs are submitted to Globus via
globus_job_run
but are redirected to Condor via
condor_submit

65
Condor-G

A version of Condor that interacts with a
Globus gatekeeper to submit jobs to Globus and
monitor them
allows job descriptions similar to Condor's to be
run on Globus grid resources
Condor-G is the job management part of Condor

67
SGE: Sun Grid Engine

A distributed resource management and scheduling
system for Unix environments
that finds and manages a pool of resources and
schedules jobs
Is an open-source project
Architecture
- master host: a single host that handles all
requests from users, job scheduling decisions and
job dispatching to execution hosts
- submit host: a machine configured to submit,
monitor and manage jobs, and the cluster
- execution host: has permission to run SGE jobs
- admin host: for configuring the cluster
- shadow master host: monitors the master and
assumes control if the master fails. Current jobs
are not affected by the failure.

69
Daemons

sge_qmaster: central manager; keeps tables about
hosts, queues, jobs, system load and user
permissions
- gets scheduling decisions from _schedd
- requests actions from _execd on the execution
hosts
- runs on the master host
sge_schedd: keeps an up-to-date view of the
cluster status
- makes scheduling decisions on which jobs to
dispatch to which
queues
- forwards the decisions to the _qmaster
- runs on the master host
sge_execd: responsible for the queues on its host and
for job execution
- periodically forwards info on job status and
host load to the _qmaster
- runs on each execution host

sge_commd: handles communications among SGE
components using a well-known TCP port
- runs on each execution host and on the master
host
sge_shepherd: started by the _execd, this daemon
runs for each job under execution on a host,
controls the process execution and collects
accounting data on job completion

72
Job management in SGE

Job types
batch, interactive, parallel and array (a job
replicated n times with distinct input
data sets, for parameter sweep studies)
Submitted jobs are put into job queues
An SGE queue is a container for a class of jobs
allowed to execute concurrently on a specific
host
a queue determines certain job attributes, e.g.
whether a job can migrate
A job is associated with a queue: actions on the
queue affect all its jobs, e.g. suspending all jobs in
a queue
Job submission: a user only gives the
requirements profile (memory, OS, software) -->
SGE dispatches the job to a suitable queue on a lightly
loaded host
If a job is submitted to a specific queue, it is bound
to that queue and its host

73
Job run-time environments in SGE

Three execution modes are supported
- batch: for sequential programs
- interactive: gives the user command-line shell
access to some suitable host
- parallel: uses PVM or MPI environments

74
Job selection and RM in SGE

Jobs submitted to the master are kept in a
spooling area until _schedd decides that the job
is ready to run
available resources are matched with the job
requirements
e.g. available memory, CPU speed, available
software licences
(info periodically collected by the execution
hosts)
On successful matching, higher priority jobs are
dispatched first
Scheduling criteria (besides other urgent
resource reservation)
a) job priorities
a FIFO rule is used by default, with all
pending jobs in a list ordered by submission
time
- if a suitable queue is available for
the job at the head, it is dispatched
- independently of that, the scheduler tries to
dispatch the 2nd job, etc.
a priority defined by the admin can modify the
FIFO order: the pending job list is then ordered by
priority
b) equal share

75
Equal-share scheduling

If a series of jobs is submitted at almost the
same time, they would be put in the same group of
queues and would wait long to execute
Equal-share scheduling tries to avoid this:
the jobs of a user who already has a job
executing are sorted
to the end of the list of jobs with the same
priority
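
The equal-share ordering can be expressed as a sort key, as in the Python sketch below. It is an illustration only: the function name `equal_share_order`, the tuple layout and the assumption that a larger priority number is served first are not taken from SGE.

```python
def equal_share_order(pending, running_users):
    """Order pending jobs by priority, then by whether the owner is already running.

    pending:       list of (job_id, user, priority) in submission order.
    running_users: set of users who currently have a job executing.
    Within the same priority, jobs of users with a running job go last;
    Python's stable sort keeps submission order among the remaining ties.
    """
    return sorted(
        pending,
        key=lambda job: (-job[2], job[1] in running_users),
    )
```

If ann already has a job running and submits two more at priority 0, a later priority-0 job from bob is placed ahead of them, while a priority-5 job from ann still comes first.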
-------
Jobs can be submitted directly to
an SGE cluster
or via Globus

77
Conclusions

Condor and SGE are single administrative domain
resource management and scheduling systems
but can be interconnected across administrative
boundaries using Globus
Common aspects
- master-worker based
- one master host (central manager) per system
- arbitrary number of worker machines used for
job submission, job execution or both
- centralized scheduling
- priority-based job scheduling
- support for batch jobs
- a diversity of platforms
- support for authentication and authorization

Availability: both can be downloaded for free
Windows support: Condor has partial support; SGE
is Unix only
GUI support: Condor is command-line oriented but
has some graphic tools (CondorView: graphical
history of resources in the pool;
CondorUserLogViewer: graphical history of a set of
submitted jobs); SGE has a GUI
Jobs supported: both support batch and parallel
jobs using MPI and PVM. Condor does not support
interactive jobs; SGE does.
Resource reservation: both support job checkpointing and
fault recovery
Job flocking: Condor allows a job to migrate to
another cluster
Job scheduling: both support preemptive scheduling; SGE
also supports deadline-constrained scheduling
Resource matching: both
Job flow management: both support inter-job
dependency descriptions for complex applications

79
Grid scheduling with QoS

Condor and SGE lack support for QoS in
scheduling.
Aspects to be considered
- job characteristics
- market-based scheduling models
- planning in scheduling
- rescheduling
- scheduling optimization
- performance prediction
E.g. AppLeS, an adaptive application-level scheduling
system
measures the performance of the application on
a specific site resource and uses this to make
resource selection and scheduling decisions. For
master-slave applications.

81
AppLeS

Components
Network Weather Service: dynamically gathers information
on the system state and forecasts resource loads
User specifications: information about user criteria
for performance, execution constraints, and others
Model: a repository of default models,
derived from similar classes of applications,
that can be used for performance estimation,
planning and resource selection.
Resource selector: chooses and filters different
resource combinations
Planner: generates a description of a
resource-dependent schedule from a given resource
combination
Performance estimator: generates an estimate for
candidate schedules according to the user's
performance metric
Coordinator: chooses the best schedule
Actuator: implements the chosen schedule on the
target system

82
Steps in using AppLeS

1. User specifies a Heterogeneous Application
Template with information on the structure,
characteristics and constraints of the application
2. Coordinator uses this to filter out infeasible
schedules
3. Resource selector identifies possible sets of
resources and prioritizes them based on a logical
distance between resources
4. Planner defines a potential schedule for each
viable resource configuration
5. Performance estimator evaluates each such schedule
in terms of the user's performance metric
6. Coordinator chooses the best schedule and passes
it to the actuator.
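
The steps above can be pictured as a small select, plan, estimate, choose pipeline. The Python sketch below is a loose illustration and not AppLeS code: `choose_schedule` is a hypothetical name, the resource selector simply enumerates host subsets (it ignores the logical distance mentioned in step 3), and the planner splits work in proportion to an assumed relative host speed.

```python
from itertools import combinations

def choose_schedule(resources, min_hosts, estimate):
    """Return the candidate schedule with the lowest estimated time.

    resources: dict mapping host name to relative speed (assumed measurements).
    min_hosts: user constraint on the minimum number of hosts.
    estimate:  performance estimator, callable(schedule) -> predicted time,
               where a schedule maps each host to its share of the work.
    """
    # Resource selector: enumerate the feasible resource sets.
    candidates = [set(c)
                  for n in range(min_hosts, len(resources) + 1)
                  for c in combinations(sorted(resources), n)]
    # Planner: one schedule per resource set, work split by host speed.
    schedules = []
    for hosts in candidates:
        total = sum(resources[h] for h in hosts)
        schedules.append({h: resources[h] / total for h in sorted(hosts)})
    # Coordinator: keep the schedule with the best estimate.
    return min(schedules, key=estimate)
```

The estimator is passed in as a function because, in the scheme described above, it depends on the user's own performance metric; an actuator would then deploy the returned schedule.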

83
GrADS