Title: Technologies for Grids and eBusiness: Grid Resource Management, 22.01.08

1 Technologies for Grids and eBusiness: Grid Resource Management (22.01.08)
- Dr. Ramin Yahyapour, Computer Engineering Institute, University Dortmund
2 Resource Management on HPC Resources
- HPC resources are usually parallel computers or large-scale clusters
- The local resource management system (RMS) for such resources includes
  - configuration management
  - monitoring of machine state
  - job management
- There is no standard for this resource management; several different proprietary solutions are in use.
- Examples of job management systems:
  - PBS, LSF, NQS, LoadLeveler, Condor
3 HPC Management Architecture in General
[Diagram: a control service / job master server provides resource and job monitoring and management services on top of the compute resources (processing nodes)]
4 Computational Job
- A job is a computational task
  - that requires processing capabilities (e.g. 64 nodes) and
  - is subject to constraints (e.g. a specific other job must finish before the start of this job)
- The job information is provided by the user
  - resource requirements
    - CPU architecture, number of nodes, speed
    - memory size per CPU
    - software libraries, licenses
    - I/O capabilities
  - job description
  - additional constraints and preferences
- The format of the job description is not standardized, but usually very similar across systems
5 Example PBS Job Description
- The whole job file is a shell script
- The lines with information for the RMS are comments (#PBS directives)
- The actual job is started in the script

  #!/bin/csh
  # resource limits: allocate needed nodes
  #PBS -l nodes=1
  # resource limits: amount of memory and CPU time (h:m:s)
  #PBS -l mem=256mb
  #PBS -l cput=2:00:00
  # path/filename for standard output
  #PBS -o master:/mypath/myjob.out
  ./my-task
6 Job Submission
- The user submits the job to the RMS, e.g. by issuing: qsub jobscript.pbs
- The user can control the job
  - qsub: submit
  - qstat: poll status information
  - qdel: cancel job
- It is the task of the resource management system to start a job on the required resources
  - the current system state is taken into account
7 PBS Structure
[Diagram: qsub jobscript enters job submission; the management server passes jobs to the scheduler]
8 Execution Alternatives
- Time sharing
  - The local scheduler starts multiple processes per physical CPU with the goal of increasing resource utilization (multi-tasking)
  - The scheduler may also suspend jobs to keep the system load under control (preemption)
- Space sharing
  - The job uses the requested resources exclusively; no other job is allocated to the same set of CPUs.
  - The job has to be queued until sufficient resources are free.
9 Job Classifications
- Batch jobs vs. interactive jobs
  - batch jobs are queued until execution
  - interactive jobs need immediate resource allocation
- Parallel vs. sequential jobs
  - a parallel job requires several processing nodes at once
- The majority of HPC installations are used to run batch jobs in space-sharing mode!
  - a job is not influenced by other co-allocated jobs
  - the assigned processors, node memory, caches etc. are exclusively available for a single job
  - overhead for context switches is minimized
  - these are important aspects for parallel applications
10 Preemption
- A job is preempted by interrupting its current execution
  - the job might be put on hold on a CPU set and later resumed; the job stays resident on those nodes (consuming memory)
  - alternatively, a checkpoint is written and the job is migrated to another resource where it is restarted later
- Preemption can be useful to reallocate resources due to new job submissions (e.g. with higher priority)
  - or if a job is running longer than expected
11 Job Scheduling
- A job is assigned to resources through a scheduling process
  - responsible for identifying available resources
  - matching job requirements to resources
  - making decisions about job ordering and priorities
- HPC resources are typically subject to high utilization
  - therefore, resources are not immediately available and jobs are queued for future execution
  - time until execution is often quite long (many production systems have an average delay until execution of >1h)
  - jobs may run for a long time (several hours, days or weeks)
12 Typical Scheduling Objectives
- Minimizing the Average Weighted Response Time
  - AWRT = (sum over jobs of w * (t - r)) / (sum over jobs of w)
- Maximize machine utilization / minimize idle time
  - a conflicting objective
- The criterion is usually static for an installation and implicitly given by the scheduling algorithm
- Notation:
  - r: submission time of a job
  - t: completion time of a job
  - w: weight/priority of a job
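The AWRT objective can be computed directly from (r, t, w) triples; a minimal Python illustration with made-up job data:

```python
def awrt(jobs):
    """Average Weighted Response Time: sum of w*(t - r) over sum of w,
    where r = submission time, t = completion time, w = weight."""
    total_weight = sum(w for (r, t, w) in jobs)
    return sum(w * (t - r) for (r, t, w) in jobs) / total_weight

# (r, t, w) triples for three hypothetical jobs
jobs = [(0, 10, 1.0), (5, 30, 2.0), (10, 20, 1.0)]
print(awrt(jobs))  # (1*10 + 2*25 + 1*10) / 4 = 17.5
```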
13 Job Steps
- A user job enters a job queue;
- the scheduler (its strategy) decides on start time and resource allocation of the job.
[Diagram: the Grid user submits a job description to the local job queue; the scheduler builds a schedule over time and hands jobs to the job execution management, which starts them via the per-node job management of the HPC machine]
14 Scheduling Algorithms: FCFS
- Well known and very simple: First-Come First-Serve
- Jobs are started in order of submission
- Ad-hoc scheduling whenever resources become free again (no advance scheduling)
- Advantages
  - simple to implement
  - easy to understand and fair for the users (the job queue represents the execution order)
  - does not require a priori knowledge about job lengths
- Problems
  - performance can degrade extremely; the overall utilization of a machine can suffer if highly parallel jobs occur, that is, if a significant share of nodes is requested for a single job.
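FCFS as described above can be sketched as follows; jobs are (nodes, runtime) tuples on a machine with a fixed node count, and all data is made up:

```python
def fcfs_schedule(jobs, total_nodes):
    """First-Come First-Serve: start each job, strictly in submission
    order, at the earliest time enough nodes are free.
    jobs: list of (nodes, runtime). Returns the list of start times."""
    running = []          # (end_time, nodes) of started jobs
    now, free, starts = 0, total_nodes, []
    for nodes, runtime in jobs:
        # advance time until the job fits -- no later job may overtake it
        while free < nodes:
            running.sort()
            end, n = running.pop(0)       # earliest-finishing job frees nodes
            now, free = max(now, end), free + n
        starts.append(now)
        free -= nodes
        running.append((now + runtime, nodes))
    return starts

# 4-node machine: job 2 needs the whole machine, so the tiny job 3
# must wait behind it even though it would fit right away (the FCFS problem)
print(fcfs_schedule([(2, 10), (4, 5), (1, 1)], 4))  # -> [0, 10, 15]
```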
15 FCFS Schedule
[Diagram: the scheduler takes jobs from the job queue in submission order and places them into the schedule over time on the processing nodes of the compute resource]
16 Scheduling Algorithms: Backfilling
- Improvement over FCFS
- A job can be started before an earlier submitted job if it does not delay the first job in the queue
  - it may still cause delays for other jobs further down the queue
  - some fairness is still maintained
- Advantage
  - utilization is improved
- Information about the job execution length is needed
  - sometimes difficult to provide
  - user estimations are not necessarily accurate
- Jobs are usually terminated after exceeding their allocated execution time
  - otherwise users might deliberately underestimate the job length to get an earlier job start time
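The rule above can be sketched in the EASY backfilling variant: the head of the queue gets a reservation, and a later job may jump ahead only if it fits now and finishes before that reservation. A toy simulation (names and data invented):

```python
import heapq

def easy_backfill(queue, total_nodes):
    """EASY backfilling sketch: jobs run in FCFS order, but a later job may
    start early if it does not delay the reserved start of the queue head.
    queue: list of (name, nodes, runtime). Returns {name: start_time}."""
    queue = list(queue)
    now, free = 0, total_nodes
    running, starts = [], {}          # running: heap of (end_time, nodes)

    def advance():                    # free the earliest-finishing job
        nonlocal now, free
        end, n = heapq.heappop(running)
        now, free = max(now, end), free + n

    while queue:
        name, nodes, runtime = queue[0]
        if nodes <= free:             # head fits: start it (plain FCFS)
            queue.pop(0)
            starts[name] = now
            free -= nodes
            heapq.heappush(running, (now + runtime, nodes))
            continue
        # head does not fit: estimate its reserved start ("shadow time")
        shadow, f = now, free
        for end, n in sorted(running):
            f, shadow = f + n, end
            if f >= nodes:
                break
        # backfill the first later job that fits now AND ends by the reservation
        for i, (bn, bnodes, brun) in enumerate(queue[1:], 1):
            if bnodes <= free and now + brun <= shadow:
                queue.pop(i)
                starts[bn] = now
                free -= bnodes
                heapq.heappush(running, (now + brun, bnodes))
                break
        else:
            advance()                 # nothing backfillable: wait
    return starts

# 4 nodes: "B" needs the whole machine; short "C" is backfilled before it
print(easy_backfill([("A", 2, 10), ("B", 4, 5), ("C", 1, 3)], 4))
```

Note that the result depends on the runtime estimates: if "A" actually finished early, the backfilled "C" could delay "B", which is exactly the problem shown on the next slides.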
17 Backfill Scheduling
- Job 3 is started before job 2, as it does not delay it
[Diagram: jobs 1-4 in the job queue; job 3 is backfilled into a gap in the schedule ahead of job 2 on the processing nodes]
18 Backfill Scheduling
- However, if a job finishes earlier than expected, backfilling causes delays that otherwise would not occur
  - hence the need for accurate job length information (difficult to obtain)
[Diagram: job 1 finishes earlier than estimated; the backfilled job 3 now delays job 2 in the schedule]
19 Job Execution Manager
- After the scheduling process, the RMS is responsible for the job execution; it
  - sets up the execution environment for a job,
  - starts the job,
  - monitors the job state,
  - cleans up after execution (copying output files etc.), and
  - notifies the user (e.g. by sending email)
20 Scheduling Options
- Parallel job scheduling algorithms are well studied; performance is usually acceptable
- Real implementations may have additional requirements rather than a need for more complex theoretical algorithms
  - prioritization of jobs, users, or groups while maintaining fairness
  - partitioning of machines
    - e.g. interactive and development partitions vs. production batch partitions
  - combination of different queue characteristics
- For instance, the Maui Scheduler is often deployed, as it is quite flexible in terms of prioritization, backfilling, fairness etc.
21 Transition to Grid Resource Management and Scheduling
22 Transition to the Grid
- More resource types come into play
- Resources are any kind of entity, service or capability to perform a specific task
  - processing nodes, memory, storage, networks, experimental devices, instruments
  - data, software, licenses
  - people
- The task/job/activity can also have a broader meaning
  - a job may involve different resources and consist of several activities in a workflow with corresponding dependencies
- The resources are distributed and may belong to different administrative domains
- HPC is still the key application for Grids. Consequently, the main resources in a Grid are the previously considered HPC machines with their local RMS
23 Implications for Grid Resource Management
- Several security-related issues have to be considered: authentication, authorization, accounting
  - who has access to a certain resource?
  - what information can be exposed to whom?
- There is a lack of global information
  - which resources are available when for an activity?
- The resources are quite heterogeneous
  - different RMS in use
  - individual access and usage paradigms
  - administrative policies have to be considered
24 Scope of Grids
[Diagram: cluster Grid, enterprise Grid, global Grid]
Source: Ian Foster
25 Resource Management Layer
- A Grid resource management system consists of
  - local resource management systems (Resource Layer)
    - the basic resource management unit
    - provide a standard interface for using remote resources
    - e.g. GRAM, etc.
  - a global resource management system (Collective Layer)
    - coordinates all local resource management systems within multiple or distributed Virtual Organizations (VOs)
    - provides high-level functionalities to use all resources efficiently
      - job submission
      - resource discovery and selection
      - scheduling
      - co-allocation
      - job monitoring, etc.
    - e.g. meta-scheduler, resource broker, etc.
26 Grid Middleware
Source: Ian Foster
27 Grid Middleware (2)
[Diagram: user/application on top of higher-level services (e.g. a resource broker), which sit on the Grid middleware]
28 Globus Grid Middleware
- Globus Toolkit
  - a common source for Grid middleware
  - GT2
  - GT3: Web/Grid-Service-based
  - GT4: WSRF-based
- GRAM is responsible for providing a service for a given job specification that can
  - create an environment for a job
  - stage files to/from the environment
  - submit a job to a local scheduler
  - monitor a job
  - send job state change notifications
  - stream a job's stdout/stderr during execution
29 Globus Job Execution
- The job is described in the resource specification language (RSL)
- Discover a job service for execution
  - Job Manager in Globus 2.x (GT2)
  - Master Managed Job Factory Service (MMJFS) in Globus 3.x (GT3)
- Alternatively, choose a Grid scheduler for job distribution
  - the Grid scheduler selects a job service and forwards the job to it
  - a Grid scheduler is not part of Globus
- The job service prepares the job for submission to the local scheduling system
  - if necessary, file stage-in is performed (e.g. using the GASS service)
- The job is submitted to the local scheduling system
- If necessary, file stage-out is performed after the job finishes.
30 Globus GT2 Execution
[Diagram: the user/application passes RSL to a resource broker, which refines it into specialized RSL for resource allocation, using the MDS information service]
31 RSL
- Grid jobs are described in the resource specification language (RSL)
- RSL version 1 is used in GT2
- It has an LDAP filter-like syntax that supports boolean expressions
- Example:

  &(executable=a.out)(directory=/home/nobody)(arguments=arg1 "arg 2")(count=1)
32 Globus Job States
[State diagram with the states: pending, stage-in, active, suspended, stage-out, done, failed]
33 Globus GT3
- With the transition to Web/Grid services, the job management becomes
  - the Master Managed Job Factory Service (MMJFS)
  - the Managed Job Factory Service (MJFS)
  - the Managed Job Service (MJS)
- The client contacts the MMJFS
- The MMJFS instructs the MJFS to create an MJS for the job
- The MJS takes care of managing the job actions
  - interacting with the local scheduler
  - file staging
  - storing the job status
34 Globus GT3 Job Execution
[Diagram: the user/application contacts the Master Managed Job Factory Service, which has the Managed Job Factory Service create a Managed Job Service; the MJS interacts with the local scheduler, the File Streaming Factory Service / File Streaming Service and the Resource Information Provider Service]
- Globus as a toolkit does not perform scheduling and automatic resource selection
35 Example: Extending the Globus Architecture at KAIST
[Diagram: a client submits via the Job Submission Service; Resource Selection, Resource Information, Scheduling, Job Monitoring and Resource Reservation Services cooperate with the Job Manager Service (MJS), the Local Resource Monitoring Service (RIPS), a Resource Preference Provider and the local resource manager (PBS); arrows mark the workflow and the monitoring information flow]
Source: Jin-Soo Kim
36 Job Description with RSL2
- Version 2 of RSL is XML-based
- Two namespaces are used
  - rsl: for basic types such as int, string, path, url
  - gram: for the elements of a job
- Example (GNS stands for http://www.globus.org/namespaces):

  <?xml version="1.0" encoding="UTF-8"?>
  <rsl:rsl
      xmlns:rsl="GNS/2003/04/rsl"
      xmlns:gram="GNS/2003/04/rsl/gram"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="GNS/2003/04/rsl ./schema/base/gram/rsl.xsd
                          GNS/2003/04/rsl/gram ./schema/base/gram/gram_rsl.xsd">
    <gram:job>
      <gram:executable>
        <rsl:path>
          <rsl:stringElement value="/bin/a.out"/>
        </rsl:path>
      </gram:executable>
    </gram:job>
  </rsl:rsl>
37 RSL2 Attributes
- <count> (type rsl:integerType)
  - number of processes to run (default is 1)
- <hostCount> (type rsl:integerType)
  - on SMP multi-computers, the number of nodes to distribute the count processes across
  - count/hostCount = number of processes per host
- <queue> (type rsl:stringType)
  - queue into which to submit the job
- <maxWallTime> (type rsl:longType)
  - maximum wall clock runtime in minutes
- <maxCpuTime> (type rsl:longType)
  - maximum CPU runtime in minutes
- <maxTime> (type rsl:longType)
  - only applies if the above are not used
  - maximum wall clock or CPU runtime (scheduler's choice) in minutes
38 Job Submission Tools
- GT3 provides the Java class GramClient
- GT 2.x command line programs for job submission
  - globus-job-run: interactive jobs
  - globus-job-submit: batch jobs
  - globusrun: takes RSL as input
39 Globus 2 Job Client Interface
- A simple job submission requiring 2 nodes:

  globus-job-run -np 2 -s myprog arg1 arg2

- A multirequest specifies multiple resources for a job:

  globus-job-run -dumprsl - host1 /bin/uname -a \
                          - host2 /bin/uname -a

  +( &(resourceManagerContact="host1")
      (subjobStartType=strict-barrier)(label="subjob 0")
      (executable="/bin/uname")(arguments="-a") )
   ( &(resourceManagerContact="host2")
      (subjobStartType=strict-barrier)(label="subjob 1")
      (executable="/bin/uname")(arguments="-a") )
40 Globus 2 Job Client Interface
- The full flexibility of RSL is available through the command line tool globusrun
- Support for file staging of the executable and stdin/stdout
- Example:

  globusrun -o -r hpc1.acme.com/jobmanager-pbs \
    '&(executable=$(HOME)/a.out)(jobtype=single)(queue=time-shared)'
41 Problem: Job Submission Descriptions Differ
- The deliverables of the GGF working group JSDL:
  - A specification for an abstract standard Job Submission Description Language (JSDL) that is independent of language bindings, including
    - the JSDL feature set and attribute semantics,
    - the definition of the relationships between attributes,
    - and the range of attribute values.
  - A normative XML Schema corresponding to the JSDL specification.
  - A document of translation tables to and from the scheduling languages of a set of popular batch systems, covering both the job requirement and the resource description attributes of those languages that are relevant to JSDL.
42 JSDL Attribute Categories
- The job attribute categories include
  - job identity attributes
    - ID, owner, group, project, type, etc.
  - job resource attributes
    - hardware, software including applications, Web and Grid services, etc.
  - job environment attributes
    - environment variables, argument lists, etc.
  - job data attributes
    - databases, files, data formats, and staging, replication, caching, and disk requirements, etc.
  - job scheduling attributes
    - start and end times, duration, immediate dependencies, etc.
  - job security attributes
    - authentication, authorisation, data encryption, etc.
43 Problem: Resource Management Systems Differ Across Each Component
Source: Hrabri Rajic
44 GGF-WG DRMAA
- GGF working group: Distributed Resource Management Application API
- From the charter:
  - Develop an API specification for the submission and control of jobs to one or more Distributed Resource Management (DRM) systems.
  - The scope of this specification is all the high-level functionality which is necessary for an application to consign a job to a DRM system, including common operations on jobs like termination or suspension.
  - The objective is to facilitate the direct interfacing of applications to today's DRM systems by application builders, portal builders, and Independent Software Vendors (ISVs).
45 DRMAA State Diagram
- The remote job can be in the following states:
  - system hold
  - user hold
  - system and user hold simultaneously
  - queued active
  - system suspended
  - user suspended
  - system and user suspended simultaneously
  - running
  - finished (un)successfully
Source: Hrabri Rajic
46 Example: Condor-G
- Condor-G is a Condor system enhanced to manage Globus jobs.
- It provides two main features:
  - Globus Universe: an interface for submitting, queuing and monitoring jobs that use Globus resources
  - GlideIn: a system for efficient execution of jobs on remote Globus resources
- Condor-G runs as a personal Condor system
  - daemons run as non-privileged user processes
  - each user runs her/his own Condor-G
47 Condor-G GlideIn
- Globus is used to run the Condor daemons on Grid resources
  - the Condor daemons run as a Globus-managed job
  - the GRAM service starts the daemons rather than the Condor jobs
- When the resources run these GlideIn jobs, they join the personal Condor pool
- These daemons can then be used to launch a job from Condor-G on a Globus resource
  - jobs are submitted as Condor jobs; they are matched and run on the Grid resources
  - the daemons receive jobs from the user's Condor queue
  - this combines the benefits of Globus and Condor
48 Using GlideIn
[Diagram: the Condor-G Schedd and GridManager start, via the Grid resource's JobManager and LSF, Condor Startd daemons that register with the Collector]
Source: ANL/USC ISI
49 Example: DAGMan
- Directed Acyclic Graph Manager
- DAGMan allows you to specify the dependencies between your Condor-G jobs, so it can manage them automatically for you
  - e.g., "Don't run job B until job A has completed successfully."
- A DAG is defined by a .dag file, listing each of its nodes and their dependencies
- diamond.dag:

  Job A a.sub
  Job B b.sub
  Job C c.sub
  Job D d.sub
  Parent A Child B C
  Parent B C Child D

- Each node runs the Condor-G job specified by its accompanying Condor submit file
Source: Miron Livny
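The dependency handling DAGMan performs can be pictured as a topological ordering of the diamond DAG above; a toy Python illustration (Kahn's algorithm, not Condor code):

```python
from collections import deque

def dag_order(deps):
    """Return a run order in which every node starts only after all of its
    parents have completed. deps: {child: set of parent names}."""
    nodes = set(deps) | {p for parents in deps.values() for p in parents}
    pending = {n: set(deps.get(n, ())) for n in nodes}  # unmet parents
    ready = deque(sorted(n for n, ps in pending.items() if not ps))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)                 # n "completes": release its children
        for m, ps in pending.items():
            if n in ps:
                ps.discard(n)
                if not ps:
                    ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("cycle detected: not a DAG")
    return order

# diamond.dag: Parent A Child B C / Parent B C Child D
print(dag_order({"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}))  # A first, D last
```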
50 Grid Scheduling
- How to select resources in the Grid?
51 Different Levels of Scheduling
- Resource-level scheduler
  - low-level scheduler, local scheduler, local resource manager
  - a scheduler close to the resource, controlling a supercomputer, cluster, or network of workstations on the same local area network
  - examples: OpenPBS, PBS Pro, LSF, SGE
- Enterprise-level scheduler
  - scheduling across multiple local schedulers belonging to the same organization
  - examples: PBS Pro peer scheduling, LSF MultiCluster
- Grid-level scheduler
  - also known as super-scheduler, broker, community scheduler
  - discovers resources that can meet a job's requirements
  - schedules across lower-level schedulers
52 Grid-Level Scheduler
- Discovers and selects the appropriate resource(s) for a job
- If the selected resources are under the control of several local schedulers, a meta-scheduling action is performed
- Architecture
  - centralized: all lower-level schedulers are under the control of a single Grid scheduler
    - not realistic in global Grids
  - distributed: lower-level schedulers are under the control of several Grid scheduler components; a local scheduler may receive jobs from several components of the Grid scheduler
53 Grid Scheduling
[Diagram: a Grid user submits to the Grid scheduler, which dispatches to the local schedulers of machines 1-3, each with its own job queue and schedule over time]
54 Activities of a Grid Scheduler
- GGF Document 10: Actions of Super Scheduling (GFD-I.4)
Source: Jennifer Schopf
55 Grid Scheduling
- A Grid scheduler allows the user to specify the required resources and environment of the job without having to indicate the exact location of the resources
- A Grid scheduler answers the question: to which local resource manager(s) should this job be submitted?
- Answering this question is hard
  - resources may dynamically join and leave a computational Grid
  - not all currently unused resources are available to Grid jobs
    - resource owner policies, such as a maximum number of Grid jobs allowed
  - it is hard to predict how long jobs will wait in a queue
56 Selecting a Resource for Execution
- Most systems do not provide advance information about future job execution
  - user information is not accurate, as mentioned before
  - newly arriving jobs may surpass current queue entries due to higher priority
- The Grid scheduler might consider the current queue situation, but this does not give reliable information about future executions
  - a job may wait long in a short queue while it would have been executed earlier on another system.
- Available information:
  - the Grid information service gives the state of the resources and possibly authorization information
  - prediction heuristics estimate a job's wait time for a given resource, based on the current state and the job's requirements.
57 Selection Criteria
- Distribute jobs in order to balance load across resources
  - not suitable for large-scale Grids with different providers
- Data affinity: run the job on the resource where the data is located
- Use heuristics to estimate the job execution time
- Best-fit: select the set of resources with the smallest capabilities and capacities that can meet the job's requirements
- Quality of service of
  - a resource or
  - its local resource management system
    - which features does the local RMS have?
    - can they be controlled from the Grid scheduler?
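The best-fit criterion can be sketched as follows; resources are described by hypothetical (free nodes, free memory) pairs, and the "smallest" resource is the fitting candidate that leaves the least spare capacity:

```python
def best_fit(resources, req_nodes, req_mem_gb):
    """Best-fit selection sketch: among all resources meeting the job's
    requirements, choose the one with the smallest spare capacity.
    resources: {name: (free_nodes, free_mem_gb)} -- made-up data model."""
    fitting = {name: (n - req_nodes) + (m - req_mem_gb)   # spare-capacity score
               for name, (n, m) in resources.items()
               if n >= req_nodes and m >= req_mem_gb}
    if not fitting:
        return None    # no resource can run the job at all
    return min(fitting, key=fitting.get)

# "small" cannot fit the job; "mid" fits with less waste than "big"
clusters = {"big": (128, 512), "mid": (64, 128), "small": (16, 32)}
print(best_fit(clusters, req_nodes=48, req_mem_gb=64))  # -> mid
```

A real broker would weight the dimensions (nodes, memory, bandwidth, cost) instead of simply summing them, but the shape of the decision is the same.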
58 Scheduling Attributes
- A working group in the Global Grid Forum to
  - define the attributes of a lower-level scheduling instance that can be exploited by a higher-level scheduling instance
- The following attributes have been defined:
- Attributes of allocation properties
  - Revocation of an allocation
    - the local scheduler reserves the right to withdraw a given allocation
  - Guaranteed completion time of an allocation
    - a deadline for job completion is provided by the local scheduler
  - Guaranteed number of attempts to complete a job
    - the local scheduler will retry a given job task, e.g. useful for data transfer actions
  - Run-to-completion allocations
    - a job is not preempted once it has been started
  - Exclusive allocations
    - a job has exclusive access to the given resources, e.g. no time-sharing is performed
  - Malleable allocations
    - the given resource set may change during runtime, e.g. a computational job will gain or lose processors (moldable job)
59 Scheduling Attributes (2)
- Attributes of available information
  - Access to the tentative schedule
    - the local scheduler exposes its schedule of future allocations
    - option: only the projected start time of a specified allocation is available
    - option: only partial information on the current schedule is available
  - Exclusive control
    - the local scheduler is exclusively in charge of the resources; no other jobs can appear on the resources
  - Event notification
    - the local scheduler provides an event subscription service
- Attributes for manipulating allocation execution
  - preemption
  - checkpointing
  - migration
  - restart
60 Scheduling Attributes (3)
- Attributes for requesting resources
  - Allocation offers
    - the local system can provide an interface to request offers for an allocation
  - Allocation cost or objective information
    - the local scheduler can provide cost or objective information
  - Advance reservation
    - allocations can be reserved in advance
  - Requirement to provide the maximum allocation length in advance
    - the higher-level scheduler must provide a maximum job execution length
  - Deallocation policy
    - a policy applies to the allocation and must be met for it to stay valid
  - Remote co-scheduling
    - a schedule can be generated by a higher-level instance and imposed on the local scheduler
  - Consideration of job dependencies
    - the local scheduler can deal with dependency information of jobs, e.g. for workflows
61 CSF: Community Scheduler Framework
- An open-source implementation of an OGSA-based metascheduler for VOs
- Supports the emerging WS-Agreement spec
- Supports GT GRAM
- Fills gaps in the existing resource management picture
- Contributed by Platform to the Globus Toolkit
- An extensible, open-source framework for implementing meta-schedulers
- Provides basic protocols and interfaces to help resources work together in heterogeneous environments
62 CSF Architecture
[Diagram: Platform LSF and Globus Toolkit users access the meta-scheduler (with meta-scheduler plugin) in a Grid service hosting environment; the Job Service, Reservation Service, Queuing Service and Global Information Service reach LSF, SGE and PBS via GRAM and an RM adapter; each resource manager is monitored by a Resource Information Provider Service (RIPS)]
Source: Chris Smith
63 Global Information Service
[Diagram: the Global Information Service aggregates resource, job and reservation information from registered RIPS instances through an index service, registry, service data aggregator and provider manager, backed by a data storage database that serves data store and load requests]
Source: Chris Smith
64 Support for Virtual Organizations
Source: Chris Smith
65 CSF Grid Services
- Job Service
  - creates, monitors and controls compute jobs
- Reservation Service
  - guarantees that resources are available for running a job
- Queuing Service
  - provides a service where administrators can customize and define scheduling policies at the VO level and/or at the level of the different resource managers
  - defines an API for plug-in schedulers
- RM Adapter Service
  - provides a Grid service interface that bridges the Grid service protocol and the resource managers (LSF, PBS, SGE, Condor and other RMs)
66 GT3 Job Submission / Architecture
- MMJFS: Master Managed Job Factory Service; MJS: Managed Job Service; blue indicates a Grid service hosted in a GT3 container
[Diagram: managed-job-globusrun clients contact the MMJFS on node1 (site A), node2 (site B) and node3 (site C), which create MJS instances for LSF, PBS and SGE respectively; RIPS instances at each site feed an index service]
Source: Chris Smith
67 GT3 CSF Architecture
[Diagram: Queuing Service, Job Service, Reservation Service and Index Service within a Virtual Organization]
Source: Chris Smith
68 Queue Service
- In CSF, Job Service instances are submitted to the Queue Service for dispatch to a resource manager.
- The Queue Service provides a plug-in API for extending the scheduling algorithms provided by default with CSF.
- The Queue Service
  - loads and validates configuration information
  - loads all configured scheduler plugins
  - calls the plugin API functions:
    - schedInit(): after loading the plugin successfully
    - schedOrder(): when a new job is submitted
    - schedMatch(): during the scheduling cycle
    - schedPost(): before the scheduling cycle ends, after scheduling decisions are sent to the job service instances
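The cycle of hook calls can be pictured with a toy Python skeleton. Only the four hook names come from the slide; the signatures, the FCFS ordering, the round-robin matching and all data are invented for illustration, and the real CSF plugin API is not Python:

```python
class SchedulerPlugin:
    """Toy skeleton mirroring the four CSF plugin hooks named above
    (illustration only -- the actual CSF API differs)."""

    def schedInit(self):                     # once, after the plugin loads
        self.log = ["init"]

    def schedOrder(self, queue, job):        # when a new job is submitted
        queue.append(job)                    # append = simple FCFS ordering
        self.log.append(f"order {job}")

    def schedMatch(self, queue, resources):  # during the scheduling cycle
        decisions = {job: resources[i % len(resources)]  # naive round-robin
                     for i, job in enumerate(queue)}
        self.log.append("match")
        return decisions

    def schedPost(self, decisions):          # before the cycle ends
        self.log.append(f"post {len(decisions)} decisions")

plugin, queue = SchedulerPlugin(), []
plugin.schedInit()
for j in ("job1", "job2", "job3"):
    plugin.schedOrder(queue, j)
decisions = plugin.schedMatch(queue, ["LSF", "PBS"])
plugin.schedPost(decisions)
print(decisions)  # job1 -> LSF, job2 -> PBS, job3 -> LSF
```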
69 Example Project: GridLab GRMS
[Diagram: the GRMS broker (job receiver, jobs queue, scheduler, resource discovery, file transfer unit, execution unit, monitoring, SLA negotiation, workflow manager, application manager, resource reservation, prediction unit) sits between Grid services (information services, data management, authorization system, adaptive components) and the local resources/managers reached via Globus and others]
Source: Jarek Nabrzyski
70 Anticipated Features
- Reliable and predictable delivery of a service
  - quality of service for a job service
  - reliable job submission: two-phase commit
  - predictable start and end time of the job
    - advance reservation assures start time and throughput
- Fault tolerance/recovery
  - migrate the job to another resource before the fault occurs; the job continues
  - after a fault, the job is restarted
  - rerun the job on the same resource after repair
- Allocate multiple resources for a job
71 Co-allocation
- It is often requested that several resources be used for a single job
  - that is, a scheduler has to assure that all resources are available when needed
  - in parallel (e.g. visualization and processing)
  - with time dependencies (e.g. a workflow)
- The task is especially difficult if the resources belong to different administrative domains.
- The actual allocation time must be known for co-allocation
  - or the different local resource management systems must synchronize with each other (wait for the availability of all resources)
72 Example: Multi-Site Job Execution
- A job uses several resources at different sites in parallel.
- Network communication is an issue.
[Diagram: a Grid scheduler places a multi-site job across several machines]
73 Advance Reservation
- Co-allocation and other applications require a priori information about the precise resource availability
- With the concept of advance reservation, the resource provider guarantees a specified resource allocation
  - includes a two- or three-phase commit for agreeing on the reservation
- Implementations
  - GARA/DUROC/SNAP provide interfaces for Globus to create advance reservations
  - implementations for network QoS are available
    - setup of dedicated bandwidth between endpoints
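The guarantee can be illustrated with a minimal sketch of the commit step, assuming a simple per-resource list of confirmed time intervals; the two- or three-phase handshake is reduced here to a check-then-commit:

```python
def request_reservation(book, resource, start, end):
    """Advance-reservation sketch (illustrative only): phase 1 checks that
    the requested slot is free, phase 2 commits it, after which the
    provider guarantees the allocation for [start, end).
    book: {resource: list of (start, end) confirmed reservations}."""
    slots = book.setdefault(resource, [])
    # phase 1 (offer): reject if the interval overlaps a confirmed one
    if any(s < end and start < e for (s, e) in slots):
        return False
    # phase 2 (commit): record the guaranteed allocation
    slots.append((start, end))
    return True

book = {}
print(request_reservation(book, "cluster", 8, 9))     # True: slot granted
print(request_reservation(book, "cluster", 8.5, 10))  # False: overlaps 8-9
```

A co-allocating Grid scheduler would issue such requests against every involved provider and commit only if all of them offer the same time window.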
74 Limitations of Current Grid RMS
- The interaction between local scheduling and higher-level Grid scheduling is currently one-way communication
  - current local schedulers are not optimized for Grid use
  - limited information is available about future job execution
  - a site is usually selected by the Grid scheduler, and the job enters the remote queue
- The decision about job placement is inefficient
  - the actual job execution time is usually not known
- Co-allocation is a problem, as many systems do not provide advance reservation
75 Example of Grid Scheduling Decision Making
- Where to put the Grid job?
[Diagram: the Grid scheduler sees machine 1 with 40 jobs running and 80 queued, machine 2 with 5 running and 2 queued, and machine 3 with 15 running and 20 queued; each machine has its own local scheduler, job queue and schedule]
76 Available Information from the Local Schedulers
- Decision making is difficult for the Grid scheduler
  - limited information about the local schedulers is available
  - available information may not be reliable
- Possible information:
  - queue length, running jobs
  - detailed information about the queued jobs
    - execution length, process requirements, ...
  - tentative schedule of future job executions
- This information is often technically not provided by the local scheduler
- In addition, this information may be subject to privacy concerns!
77 Consequence
- Consider a workflow with 3 short steps (e.g. 1 minute each) that depend on each other
- Assume available machines with an average queue length of 1 hour
- The Grid scheduler can only submit the subsequent step once the previous job step has finished
- Result:
  - the completion time of the workflow may be larger than 3 hours (compared to 3 minutes of execution time)
- Current Grids are suitable for simple jobs, but still quite inefficient in handling more complex applications
- There is a need for better coordination of higher- and lower-level scheduling!
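The arithmetic behind this result can be made explicit: because each step is only submitted after its predecessor finishes, every step pays the full queue wait (times in minutes, values from the slide):

```python
def workflow_makespan(step_runtimes, avg_queue_wait):
    """Sequentially submitted workflow: each step waits in the queue
    before its own runtime, so waits accumulate per step (minutes)."""
    return sum(avg_queue_wait + runtime for runtime in step_runtimes)

# 3 one-minute steps, 60-minute average wait per submission
print(workflow_makespan([1, 1, 1], 60))  # -> 183 minutes, i.e. just over
                                         # 3 hours for 3 minutes of work
```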
78 GRMS in Next Generation Grids
- Outlook on future Grid resource management and scheduling
79 Example Grid Scenario
- Assume a data-intensive simulation that should be visualized and steered during runtime!
[Diagram: a remote center reads and generates TBs of data, transferred via WAN to the compute resources and via LAN/WAN to a visualization device]
80 Resource Request of a Simple Grid Job
- A specified architecture with
  - 48 processing nodes,
  - 1 GB of available memory, and
  - a specified licensed software package
  - for 1 hour between 8am and 6pm of the following day
    - the time must be known in advance
- A specific visualization device during program execution
- Minimum bandwidth between the VR device and the main computer during program execution
- Input: a specified data set from a data repository
- at most 4
  - preference for a cheaper job execution over an earlier execution
- Actually a pretty simple example (no complex workflows)
81 Use-Case Example: Coordinated Simulation and Visualization
- Expected output of a Grid scheduler
- Reservations are necessary!
82 Need for a Grid Scheduling Architecture
[Diagram: the Grid scheduler, acting for the Grid user/application, uses information, monitoring, security, accounting/billing and other Grid services, and reaches the local RMSs and other resources/services through the Grid middleware]
83 Required Services/Components
- Relevant to Grid scheduling:
  - information service
  - job/workflow description
  - requirement description
  - resource discovery
  - reservation
  - monitoring/notification
  - job execution
  - security
  - accounting/billing
  - data management
  - local RMS
84Service Oriented Architectures
- Services are used to abstract all resources and functionalities.
- Concept of OGSI and WSRF
- using WebServices, SOAP, XML to implement the services
- OGSI idea of GridServices is implemented in GT3
- transition to WSRF with GT4
- Core services for building a Grid are discussed in the Open Grid Services Architecture (OGSA)
85Open Grid Services Architecture
[Figure: layered view, top to bottom: users in problem domain X; applications in problem domain X; application integration technology for problem domain X; generic virtual service access and integration layer; OGSA; OGSI interface to the Grid infrastructure; distributed compute, data and storage resources - a virtual integration architecture]
86OGSA Outlook
[Figure: overview of OGSA service groups]
- Context Services: Virtual Organization, Policy, Agreement
- Data Services: Data Catalog, Data Provision, Data Integration, Data Access
- Status Monitoring Services: Event, Problem Determination, Information Service, Logging Service
- Job/Workflow Management: Job Manager, Job Service, Broker, Execution Planning Service, Workflow Manager, Workload Manager, Application Content Manager
- Infrastructure Services: WS-RF (OGSI), Notification, WS Distributed Management
- Resource Management Services: Provisioning, Deploy Configuration Service, Service Container, Reservation Service
- Security Services: Authentication, Authorization, Delegation, Firewall Transition
87OGSA Execution Planning
[Figure: demand (workload management) meets supply (resource management framework); a user/job enters via a factory, guided by policies and information providers, and is mapped to resources through allocation, reservation, provisioning (or binding), meta-interaction via resource proxies, dependency management, and CMM]
- Optimizing Framework: Scheduling, Queuing Services, Capacity Management
- Resource Optimizing Framework: Resource Placement, Admission Control (Resources), Quality of Service (Resources), Resource Selection
- Workload Optimizing Framework: Workload Optimization, Workload Post Balancing, Optimal Mapping, Workload Models (History/Prediction), Workload Orchestration, Selection Context (e.g. VO), Admission Control (Workload), SLA Management (Workload)
- Each box represents one or more OGSA services
88Functional Requirements for Grid Scheduling
- Functional Requirements
- Cooperation between different resource providers
- Interaction with local resource management systems
- Support for reservations and service level agreements
- Orchestration of coordinated resource allocations
- Automatic handling of accounting and billing
- Distributed Monitoring
- Failure Transparency
89What are Basic Blocks for a Grid Scheduling Architecture?
- Scheduling-relevant interfaces of the basic blocks are still to be defined!
90Information Service / Resource Discovery
- Relevant for Grid Scheduling
- Access to static and dynamic information
- Dynamic information includes data about planned or forecasted future events
- e.g. existing reservations, scheduled tasks, future availabilities
- need for anonymous and limited information (privacy concerns)
- Information about all resource types
- including e.g. data and network
- future reservations, data transfers etc.
91Job/Workflow Description / Requirement Description
- Information about the job specifics (what is the job)
- and job requirements (what is required for the job)
- including data access and creation
- Need for common workflow description
- e.g. a DAG formulation
- include static and dynamic dependencies
- need for the ability to extract workflow information to schedule a whole workflow in advance
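The DAG formulation mentioned above can be sketched in a few lines: each step lists the steps it depends on, and a topological order gives a schedule-ahead plan for the whole workflow. The step names are made up for illustration:

```python
# Minimal DAG workflow sketch: mapping of step -> set of prerequisite steps.
from graphlib import TopologicalSorter

workflow = {
    "simulate": set(),               # no prerequisites
    "postprocess": {"simulate"},     # static dependency on the simulation
    "visualize": {"postprocess"},
    "archive": {"simulate"},
}

# A valid execution order a scheduler could plan in advance.
order = list(TopologicalSorter(workflow).static_order())
print(order)  # starts with 'simulate'; dependents follow their prerequisites
```

Extracting this structure before submission is what lets a Grid scheduler reserve resources for later steps instead of waiting in a queue for each one (the problem shown on the "Consequence" slide).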
92Reservation Management / Agreement and Negotiation
- Interaction between scheduling instances, between resource/agreement providers and agreement initiators (higher-level scheduler)
- access to tentative information necessary
- negotiations might take very long
- individual scheduling objectives to be considered
- probably market-oriented and economic scheduling needed
- Need for combining agreements from different providers
- coordinate complex resource requests or workflows
- Maintain different negotiations at the same time
- probably several levels of negotiation, agreement commitment and reservation
93Accounting and Billing
- Interaction with budget information
- Charging for allocations, reservations; preliminary allocation of budgets
- Concepts for reliable authorization of Grid schedulers to spend money on behalf of the user
- Refunding in case of resource/SLA failure, re-scheduling etc.
- Reliable monitoring and accounting
- required for tracing whether a party fulfilled an agreement
94Monitoring Services
- Monitoring of
- resource conditions
- agreements
- schedules
- program execution
- SLA conformance
- workflow
- ...
- Monitoring must be reliable as it is part of accountability
- failure or fulfillment of a service/resource provider must be clearly identifiable
95Conclusions for Grid Scheduling
- Grids ultimately require coordinated scheduling services.
- Support for different scheduling instances
- different local management systems
- different scheduling algorithms/strategies
- For arbitrary resources
- not only computing resources, but also
- data, storage, network, software etc.
- Support for co-allocation and reservation
- necessary for coordinated grid usage (see data, network, software, storage)
- Different scheduling objectives
- cost, quality, other
96Scheduling Model
- Using a Brokerage/Trading strategy
[Figure: higher-level scheduling (considering individual user policies): submit Grid job description, discover resources, query for allocation offers, collect offers, select offers, coordinate allocations; lower-level scheduling (considering individual owner policies): analyze query, generate allocation offer]
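The brokerage loop above can be sketched as follows. All functions are hypothetical placeholders; a real broker would call information, negotiation and reservation services instead of returning canned data:

```python
# Sketch of the brokerage/trading model: higher-level scheduling on top,
# the lower-level step (analyze query, generate offer) stubbed out.

def discover_resources(job):
    # placeholder: ask an information service for candidate providers
    return ["siteA", "siteB"]

def query_offer(site, job):
    # placeholder lower-level step: the provider analyzes the query and
    # generates an allocation offer under its own owner policies
    return {"site": site, "price": 10 if site == "siteA" else 7}

def broker(job, user_utility):
    candidates = discover_resources(job)                 # discover resources
    offers = [query_offer(s, job) for s in candidates]   # collect offers
    best = max(offers, key=user_utility)                 # select offer per user policy
    return best                                          # next: coordinate allocations

best = broker({"nodes": 48}, user_utility=lambda o: -o["price"])
print(best["site"])  # 'siteB': the cheaper offer wins under this utility
```

Note how the user policy enters only through the utility function, and the owner policy only through offer generation, which keeps the two scheduling levels decoupled.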
97Properties of Multi-Level Scheduling Model
- Multi-level scheduling must support different RM systems and strategies.
- Providers can enforce individual policies in generating resource offers.
- Users receive resource allocations optimized to their individual objectives.
- Different higher-level scheduling strategies can be applied.
- Multiple levels of scheduling instances are possible.
- Support for fault-tolerant and load-balanced services.
98Negotiation in Grids
- Multilevel Grid scheduling architecture
- Lower level: local scheduling instance
- Implementation of owner policies
- Higher level: Grid scheduling instance
- Resource selection and coordination
- (Static) Interface definition between both instances
- Different types of resources
- Different local scheduling systems with different properties
- Different owner policies
- (Dynamic) Communication between both instances
- Resource discovery
- Job monitoring
99Using Service Level Agreements
- The mapping of jobs to resources can be abstracted using the concept of Service Level Agreements (SLAs) (Czajkowski, Foster, Kesselman, Tuecke)
- SLA: Contract negotiated between
- resource provider, e.g. local scheduler
- resource consumer, e.g. grid scheduler, application
- SLAs provide a uniform approach for the client to
- specify resource and QoS requirements, while
- hiding from the client details about the resources,
- such as queue names and current workload
100Service Level Agreement Types
- Resource SLA (RSLA)
- A promise of resource availability
- Client must utilize promise in subsequent SLAs
- Advance Reservation is an RSLA
- Task SLA (TSLA)
- A promise to perform a task
- Complex task requirements
- May reference an RSLA
- Binding SLA (BSLA)
- Binds a resource capability to a TSLA
- May reference an RSLA (i.e. a reservation)
- May be created lazily to provision the task
- Allows complex resource arrangements
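The relationship between the three SLA types can be sketched as a small data model. The class and field names are illustrative only and not taken from the cited work:

```python
# Hypothetical data model for the RSLA/TSLA/BSLA types on the slide.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RSLA:
    """Promise of resource availability (e.g. an advance reservation)."""
    resource: str
    start: str
    duration_hours: int

@dataclass
class TSLA:
    """Promise to perform a task; may reference an RSLA."""
    task: str
    rsla: Optional[RSLA] = None

@dataclass
class BSLA:
    """Binds a resource capability (via an RSLA) to a TSLA, possibly lazily."""
    tsla: TSLA
    rsla: RSLA

reservation = RSLA("siteA", "08:00", 1)
task = TSLA("simulation", rsla=reservation)
binding = BSLA(task, reservation)
print(binding.tsla.rsla is reservation)  # True: the task is backed by the reservation
```

A TSLA created with `rsla=None` models the slide's second case: the task is promised, but nothing is said about when resources become available until a BSLA is created later.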
101Agreement-Based Negotiation
- A client (application) submits a task to a Grid scheduler
- The client negotiates a TSLA for the task with the Grid Scheduler
- In order to provision the TSLA, the Grid Scheduler may obtain an RSLA with the Grid resource or may use a pre-existing RSLA that the Grid scheduler has negotiated speculatively
- A TSLA that refers to an RSLA assures the job gets the reserved resources at a specified time
- A TSLA without an RSLA tells little about when the resources will be available to the job
102Agreement-Based Negotiation (2)
- The job starts execution on the resource according to the TSLA and the RSLA
- For an existing TSLA, the Grid Scheduler may obtain additional RSLAs
- An RSLA is negotiated by the Grid Scheduler with the resource
- A BSLA binds this RSLA to the corresponding TSLA
- BSLAs allow dynamic provisioning of resources that are either
- not needed for the whole duration of the task or
- not known completely (e.g., the time at which a resource will be needed) before submitting the task
103Example of Agreement Mapping
- The Grid Scheduler receives requests for two agreements.
- It negotiates with the resources the RSLA1 and RSLA2, and
- in parallel with the agreement initiators about the corresponding TSLA1 and TSLA2.
[Figure: two users/applications each hold a TSLA (TSLA 1, TSLA 2) with the Grid scheduler, which in turn holds RSLA 1 and RSLA 2 with the resources]
104GGF GRAAP-WG
- Goal: Defining WebService-based protocols for negotiation and agreement management
- WS-Agreement Protocol
105Towards Grid Scheduling
- Grid Scheduling Methods
- Support for individual scheduling objectives and policies
- Multi-criteria scheduling models
- Economic scheduling methods for Grids
- Architectural requirements
- Generic job description
- Negotiation interface between higher- and lower-level scheduler
- Economic management services
- Workflow management
- Integration of data and network management
106Grid Scheduling Strategies
- Current approach
- Extension of job scheduling for parallel computers
- Resource discovery and load-distribution to a remote resource
- Usually batch job scheduling model on the remote machine
- But actually required for Grid scheduling is
- Co-allocation and coordination of different resource allocations for a Grid job
- Instantaneous ad-hoc allocation is not always suitable
- This complex task involves
- Cooperation between different resource providers
- Interaction with local resource management systems
- Support for reservations and service level agreements
- Orchestration of coordinated resource allocations
107User Objective
- Local computing typically has
- A given scheduling objective, such as minimization of response time
- Use of batch queuing strategies
- Simple scheduling algorithms: FCFS, Backfilling
- Grid Computing requires
- Individual scheduling objectives
- better resources
- faster execution
- cheaper execution
- More complex objective functions apply for individual Grid jobs!
108Provider/Owner Objective
- Local computing typically has
- A single scheduling objective for the whole system
- e.g. minimization of average weighted response time, or high utilization/job throughput
- In Grid Computing
- Individual policies must be considered
- access policy,
- priority policy,
- accounting policy, and others
- More complex objective functions apply for individual resource allocations!
- User and owner policies/objectives may be subject to privacy considerations!
109Grid Economics: Different Business Models
- Cost model
- Use of a resource
- Reservation of a resource
- Individual scheduling objective functions
- User and owner objective functions
- Formulation of an objective function
- Integration of the function in a scheduling algorithm
- Resource selection
- The scheduling instances act as brokers
- Collection and evaluation of resource offers
110Scheduling Objectives in the Grid
- In contrast to local computing, there is no general scheduling objective anymore
- minimizing response time
- minimizing cost
- tradeoff between quality, cost, response time etc.
- Cost and different service qualities come into play
- the user will introduce individual objectives
- the Grid can be seen as a market where resources are competing alternatives
- Similarly, the resource provider has individual scheduling policies
- Problem
- the different policies and objectives must be integrated in the scheduling process
- different objectives require different scheduling strategies
- part of the policies may not be suitable for public exposure (e.g. different pricing or quality for certain user groups)
111Grid Scheduling Algorithms
- Due to the mentioned requirements in Grids, it is not to be expected that a single scheduling algorithm or strategy is suitable for all problems.
- Therefore, there is a need for an infrastructure that
- allows the integration of different scheduling algorithms
- lets the individual objectives and policies be included
- leaves resource control at the participating service providers
- Transition into a market-oriented Grid scheduling model
112Economic Scheduling
- Market-oriented approaches are a suitable way to implement the interaction of different scheduling layers
- agents in the Grid market can implement different policies and strategies
- negotiations and agreements link the different strategies together
- participating sites stay autonomous
- Need for suitable scheduling algorithms and strategies for creating and selecting offers
- need for creating Pareto-optimal scheduling solutions
- Performance relies highly on the available information
- negotiation can be a hard task if many potential providers are available.
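The Pareto-optimal solutions mentioned above can be computed with a simple dominance filter: keep only offers that no other offer beats on both criteria. The offer list and the two criteria (price, completion time) are illustrative:

```python
# Pareto-front filter over offers with two criteria to minimize.

def pareto_optimal(offers):
    def dominated(a, b):
        # b dominates a if b is no worse in both criteria and strictly
        # better in at least one
        return (b["price"] <= a["price"] and b["time"] <= a["time"]
                and (b["price"] < a["price"] or b["time"] < a["time"]))
    return [a for a in offers if not any(dominated(a, b) for b in offers)]

offers = [
    {"site": "A", "price": 10, "time": 5},
    {"site": "B", "price": 7,  "time": 8},
    {"site": "C", "price": 12, "time": 6},   # dominated by A (dearer and slower)
    {"site": "D", "price": 6,  "time": 12},
]

front = pareto_optimal(offers)
print([o["site"] for o in front])  # ['A', 'B', 'D']
```

Offers on the front represent genuine tradeoffs (cheaper vs. faster); only among these does the user's individual utility function need to decide.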
113Economic Scheduling (2)
- Several possibilities for market models
- auctions of resources/services
- auctions of jobs
- Offer-request mechanisms support
- inclusion of different cost models, price determination
- individual objective/utility functions for optimization goals
- Market-oriented algorithms are considered
- robust
- flexible in case of errors
- simple to adapt
- but markets can have unforeseeable dynamics
114Problem: Offer Creation
[Figure: a job request at time t0 on a schedule of resources R1-R8 over time t]
115Offer Creation (2)
[Figure: Offer 1, a possible placement of the job on resources R1-R8]
116Offer Creation (3)
[Figure: Offer 2, an alternative placement on resources R1-R8]
117Offer Creation (4)
[Figure: Offer 3, a further alternative placement on resources R1-R8]
118Evaluate Offers
- Evaluation with utility functions
- A utility function is a mathematical representation of a user's preference
- The utility function may be complex and
- contain several different criteria
- Example: using response time (or delay time) and price
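A minimal sketch of such a utility function over the two criteria named on the slide. The weighted-sum form and the weights are assumptions for illustration, not taken from the lecture; real utility functions may be far more complex:

```python
# Illustrative utility over response time and price; weights model a
# time-sensitive user (0.7 on time, 0.3 on price).

def utility(offer, w_time=0.7, w_price=0.3):
    # lower time and lower price -> higher utility (negated weighted sum)
    return -(w_time * offer["time"] + w_price * offer["price"])

offers = [
    {"site": "A", "price": 10, "time": 5},   # utility -6.5
    {"site": "B", "price": 7,  "time": 8},   # utility -7.7
]

best = max(offers, key=utility)
print(best["site"])  # 'A': the faster offer wins for this time-sensitive user
```

Shifting the weights toward price would flip the choice to the cheaper offer B, which is exactly how individual user objectives enter the selection step of the brokerage model.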
119Optimization Space
[Figure: tradeoff space with latency on one axis; the arrow indicates the direction of improved utility]