Technologies for Grids and eBusiness: Grid Resource Management (22.01.08)


1
Technologies for Grids and eBusiness: Grid Resource Management (22.01.08)
  • Dr. Ramin Yahyapour, Computer Engineering
    Institute, University of Dortmund

2
Resource Management on HPC Resources
  • HPC resources are usually parallel computers or
    large-scale clusters
  • The local resource management system (RMS) for
    such resources includes
  • configuration management
  • monitoring of the machine state
  • job management
  • There is no standard for this resource
    management; several different proprietary
    solutions are in use.
  • Examples of job management systems:
  • PBS, LSF, NQS, LoadLeveler, Condor

3
HPC Management Architecture in General
[Diagram: a control service / job master server sits on top of the
resource and job monitoring and management services, which in turn manage
the compute resources / processing nodes]
4
Computational Job
  • A job is a computational task
  • that requires processing capabilities (e.g. 64
    nodes) and
  • is subject to constraints (e.g. a specific other
    job must finish before this job starts)
  • The job information is provided by the user
  • resource requirements
  • CPU architecture, number of nodes, speed
  • memory size per CPU
  • software libraries, licenses
  • I/O capabilities
  • job description
  • additional constraints and preferences
  • The format of the job description is not
    standardized, but the formats in use are very
    similar
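The job information listed above can be sketched as a plain data structure; the field names below are invented for illustration, since (as the slide notes) no standard format exists:

```python
# Illustrative job description as a plain Python dict; all field names are
# made up for this sketch -- the slides note there is no standard format.
job = {
    "resource_requirements": {
        "cpu_architecture": "x86_64",
        "nodes": 64,
        "memory_per_cpu_mb": 1024,
        "software": ["libfoo", "licensed-app"],
    },
    "constraints": {
        # e.g. another job that must finish before this one may start
        "depends_on": ["job-0041"],
    },
    "preferences": {"queue": "batch"},
}

def requested_nodes(job):
    """Read the requested node count out of the description."""
    return job["resource_requirements"]["nodes"]

print(requested_nodes(job))  # 64
```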

5
Example PBS Job Description
  • Simple job script (the whole job file is a shell
    script; the #PBS lines are comments that carry
    information for the RMS)

    #!/bin/csh
    #PBS -l nodes=1
    #PBS -l mem=256mb
    #PBS -l cput=2:00:00
    #PBS -o master:/mypath/myjob.out
    ./my-task

  • the -l directives set resource limits: the needed
    nodes, the amount of memory, and the CPU time
    (h:m:s)
  • -o sets the path/filename for standard output
  • the actual job (./my-task) is started in the
    script
6
Job Submission
  • The user submits the job to the RMS, e.g. by
    issuing qsub jobscript.pbs
  • The user can control the job:
  • qsub: submit
  • qstat: poll status information
  • qdel: cancel job
  • It is the task of the resource management system
    to start a job on the required resources
  • The current system state is taken into account

7
PBS Structure
[Diagram: qsub jobscript goes as a job submission to the management
server, which consults the scheduler]
8
Execution Alternatives
  • Time sharing
  • The local scheduler starts multiple processes per
    physical CPU with the goal of increasing resource
    utilization.
  • multi-tasking
  • The scheduler may also suspend jobs to keep the
    system load under control
  • preemption
  • Space sharing
  • The job uses the requested resources exclusively;
    no other job is allocated to the same set of
    CPUs.
  • The job has to be queued until sufficient
    resources are free.
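The space-sharing admission rule can be sketched in a few lines (a simplified model; a real RMS tracks individual nodes rather than a free count):

```python
def can_start(job_nodes, total_nodes, running):
    """Space sharing: a job may start only if enough nodes are completely
    free.  `running` lists the node counts of currently running jobs."""
    used = sum(running)
    return used + job_nodes <= total_nodes

# A 64-node job fits on a 128-node machine with 32 nodes busy ...
print(can_start(64, 128, [32]))       # True
# ... but must be queued when 96 nodes are busy.
print(can_start(64, 128, [64, 32]))   # False
```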

9
Job Classifications
  • Batch jobs vs. interactive jobs
  • batch jobs are queued until execution
  • interactive jobs need immediate resource
    allocation
  • Parallel vs. sequential jobs
  • a parallel job requires several processing nodes
    at the same time
  • The majority of HPC installations are used to run
    batch jobs in space-sharing mode!
  • a job is not influenced by other co-allocated
    jobs
  • the assigned processors, node memory, caches etc.
    are exclusively available to a single job
  • the overhead for context switches is minimized
  • these are important aspects for parallel
    applications

10
Preemption
  • A job is preempted by interrupting its current
    execution
  • the job might be put on hold on a CPU set and
    resumed later; the job stays resident on those
    nodes (consuming memory)
  • alternatively, a checkpoint is written and the
    job is migrated to another resource where it is
    restarted later
  • Preemption can be useful to reallocate resources
    due to new job submissions (e.g. with higher
    priority)
  • or if a job is running longer than expected.

11
Job Scheduling
  • A job is assigned to resources through a
    scheduling process
  • responsible for identifying available resources
  • matching job requirements to resources
  • making decision about job ordering and priorities
  • HPC resources are typically subject to high
    utilization
  • therefore, resources are not immediately
    available and jobs are queued for future
    execution
  • the time until execution is often quite long
    (many production systems have an average delay
    until execution of >1h)
  • jobs may run for a long time (several hours, days
    or weeks)

12
Typical Scheduling Objectives
  • Minimizing the average weighted response time
  • Maximizing machine utilization / minimizing idle
    time
  • these are conflicting objectives
  • the criterion is usually static for an
    installation and implicitly given by the
    scheduling algorithm
  • r: submission time of a job
  • t: completion time of a job
  • w: weight/priority of a job
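With the quantities defined on this slide, the first objective is usually written as the average weighted response time; the exact formula does not appear on the slide, so this reconstruction is an assumption:

```latex
\mathrm{AWRT} \;=\; \frac{\sum_{j} w_j \,(t_j - r_j)}{\sum_{j} w_j}
```

where $r_j$ is the submission time, $t_j$ the completion time, and $w_j$ the weight of job $j$. Minimizing AWRT favors finishing heavily weighted jobs soon after their submission.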

13
Job Steps
  • A user job enters a job queue,
  • the scheduler (its strategy) decides on start
    time and resource allocation of the job.

[Diagram: a Grid user hands a job description to the HPC machine; the job
passes through the local job queue into the schedule and is started on the
processing nodes by the job execution management and the per-node job
managers]
14
Scheduling Algorithms: FCFS
  • Well known and very simple First-Come
    First-Serve
  • Jobs are started in order of submission
  • Ad-hoc scheduling when resources become free
    again
  • no advance scheduling
  • Advantage
  • simple to implement
  • easy to understand and fair for the users (job
    queue represents execution order)
  • does not require a priori knowledge about job
    lengths
  • Problems
  • performance can degrade severely: the overall
    utilization of a machine can suffer if highly
    parallel jobs occur, that is, if a significant
    share of nodes is requested for a single job.
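The FCFS behavior and its weakness can be illustrated with a small simulation (a simplified model of a space-shared machine with a single free-node count; runtimes are assumed known only to drive the simulation, FCFS itself does not need them):

```python
def fcfs(jobs, total_nodes):
    """First-Come First-Serve on a space-shared machine.

    jobs: list of (submit_time, nodes, runtime) in submission order.
    Returns the start time of each job; strict queue order is enforced,
    so a job may only start after all earlier-submitted jobs started.
    """
    running = []                 # (end_time, nodes) of started jobs
    free = total_nodes
    now = 0
    starts = []
    for submit, nodes, runtime in jobs:
        now = max(now, submit)
        while free < nodes:      # wait for the next job completion
            end, n = min(running)
            now = max(now, end)
            running.remove((end, n))
            free += n
        starts.append(now)
        running.append((now + runtime, nodes))
        free -= nodes
    return starts

# 4-node machine: the highly parallel 4-node job blocks the queue.
print(fcfs([(0, 2, 10), (0, 4, 5), (1, 1, 1)], 4))  # [0, 10, 15]
```

In this run the 1-node job submitted at t=1 cannot start before t=15 even though two nodes are idle from t=0 to t=10, which is exactly the utilization problem that backfilling addresses.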

15
FCFS Schedule
[Diagram: jobs move from the job queue into the schedule over time and are
placed on the compute resource's processing nodes in submission order]
16
Scheduling Algorithms: Backfilling
  • Improvement over FCFS
  • A job can be started before an earlier submitted
    job if it does not delay the first job in the
    queue
  • may still cause delay of other jobs further down
    the queue
  • Some fairness is still maintained
  • Advantage
  • utilization is improved
  • Information about the job execution length is
    needed
  • sometimes difficult to provide
  • user estimation not necessarily accurate
  • Jobs are usually terminated after exceeding their
    allocated execution time
  • otherwise users may deliberately underestimate
    the job length to get an earlier job start time
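The backfilling rule described above can be sketched as follows (a simplified EASY-style check; the data layout and the way estimates are handled are assumptions of this sketch):

```python
def backfill_candidates(queue, free_nodes, head_start_time, now):
    """EASY-backfilling sketch: a queued job may jump ahead if it fits
    into the currently free nodes AND its estimated runtime says it will
    finish before the head job's reserved start time, so the first job
    in the queue is never delayed.

    queue: list of (name, nodes, est_runtime); queue[0] is the head job
    that holds a reservation starting at head_start_time.
    """
    picked = []
    for name, nodes, est in queue[1:]:
        fits = nodes <= free_nodes
        done_in_time = now + est <= head_start_time
        if fits and done_in_time:
            picked.append(name)
            free_nodes -= nodes
    return picked

# Head job needs the whole machine and is reserved to start at t=10;
# short, small jobs are backfilled into the gap.
q = [("head", 4, 5), ("a", 1, 8), ("b", 2, 12), ("c", 1, 10)]
print(backfill_candidates(q, free_nodes=2, head_start_time=10, now=0))
# ['a', 'c']
```

Job "b" is skipped twice over: it needs more nodes than are free after "a" is picked, and its estimate would overrun the reservation anyway; this is where inaccurate user estimates distort the schedule.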

17
Backfill Scheduling
  • Job 3 is started before Job 2 as it does not
    delay it

[Diagram: Job 3 from the queue is backfilled into a gap in the schedule
ahead of Job 2, without delaying the reserved start of Job 1 on the
processing nodes]
18
Backfill Scheduling
  • However, if a job finishes earlier than expected,
    the backfilling causes delays that otherwise
    would not occur
  • need for accurate job length information
    (difficult to obtain)

[Diagram: Job 1 finishes earlier than estimated; the backfilled Job 3 now
blocks Job 2, which could otherwise have started immediately]
19
Job Execution Manager
  • After the scheduling process, the RMS is
    responsible for the job execution; it
  • sets up the execution environment for a job,
  • starts a job,
  • monitors the job state,
  • cleans up after execution (copying output files
    etc.), and
  • notifies the user (e.g. by sending email)

20
Scheduling Options
  • Parallel job scheduling algorithms are well
    studied; performance is usually acceptable
  • Real implementations often have additional
    requirements rather than a need for more complex
    theoretical algorithms
  • prioritization of jobs, users, or groups while
    maintaining fairness
  • partitioning of machines
  • e.g. interactive and development partitions vs.
    production batch partitions
  • combination of different queue characteristics
  • For instance, the Maui Scheduler is often
    deployed as it is quite flexible in terms of
    prioritization, backfilling, fairness etc.

21
Transition to Grid Resource Management and
Scheduling
  • Current state of the art

22
Transition to the Grid
  • More resource types come into play
  • Resources are any kind of entity, service or
    capability to perform a specific task
  • processing nodes, memory, storage, networks,
    experimental devices, instruments
  • data, software, licenses
  • people
  • The task/job/activity can also be of a broader
    meaning
  • a job may involve different resources and consist
    of several activities in a workflow with
    corresponding dependencies
  • The resources are distributed and may belong to
    different administrative domains
  • HPC is still the key application for Grids.
    Consequently, the main resources in a Grid are
    the previously considered HPC machines with their
    local RMS

23
Implications to Grid Resource Management
  • Several security-related issues have to be
    considered: authentication, authorization,
    accounting
  • who has access to a certain resource?
  • what information can be exposed to whom?
  • There is a lack of global information
  • which resources are available for an activity,
    and when?
  • The resources are quite heterogeneous
  • different RMSs are in use
  • individual access and usage paradigms
  • administrative policies have to be considered

24
Scope of Grids
[Diagram: increasing scope from Cluster Grid to Enterprise Grid to Global
Grid]
Source: Ian Foster
25
Resource Management Layer
  • A Grid resource management system consists of
  • the local resource management systems (Resource
    Layer)
  • the basic resource management unit
  • provide a standard interface for using remote
    resources
  • e.g. GRAM, etc.
  • a global resource management system (Collective
    Layer)
  • coordinates all local resource management systems
    within multiple or distributed Virtual
    Organizations (VOs)
  • provides high-level functionality to use all
    resources efficiently
  • job submission
  • resource discovery and selection
  • scheduling
  • co-allocation
  • job monitoring, etc.
  • e.g. meta-scheduler, resource broker, etc.

26
Grid Middleware
Source: Ian Foster
27
Grid Middleware (2)
[Diagram: the user/application accesses higher-level services and a
resource broker, which are built on the Grid middleware]
28
Globus Grid Middleware
  • Globus Toolkit
  • the common source for Grid middleware
  • GT2
  • GT3: Web/Grid-Service-based
  • GT4: WSRF-based
  • GRAM is responsible for providing a service for a
    given job specification that can
  • create an environment for a job
  • stage files to/from the environment
  • submit a job to a local scheduler
  • monitor a job
  • send job state change notifications
  • stream a job's stdout/stderr during execution

29
Globus Job Execution
  • The job is described in the resource
    specification language (RSL)
  • Discover a Job Service for execution
  • Job Manager in Globus 2.x (GT2)
  • Master Management Job Factory Service (MMJFS) in
    Globus 3.x (GT3)
  • Alternatively, choose a Grid Scheduler for job
    distribution
  • Grid scheduler selects a job service and forwards
    job to it
  • A Grid scheduler is not part of Globus
  • The Job Service prepares job for submission to
    local scheduling system
  • If necessary, file stage-in is performed
  • e.g. using the GASS service
  • The job is submitted to the local scheduling
    system
  • If necessary, file stage-out is performed after
    job finishes.

30
Globus GT2 Execution
[Diagram: the user/application sends RSL to a resource broker, which
refines it to specialized RSL for the resource allocation, using the MDS
information service]
31
RSL
  • Grid jobs are described in the resource
    specification language (RSL)
  • RSL version 1 is used in GT2
  • It has an LDAP-filter-like syntax that supports
    boolean expressions
  • Example

    & (executable = a.out)
      (directory = /home/nobody)
      (arguments = arg1 "arg 2")
      (count = 1)
32
Globus Job States
[State diagram: a job passes through pending, stage-in, active (possibly
suspended), and stage-out, ending in done or failed]
33
Globus GT3
  • With the transition to Web/Grid services, the job
    management becomes
  • the Master Managed Job Factory Service (MMJFS)
  • the Managed Job Factory Service (MJFS)
  • the Managed Job Service (MJS)
  • The client contacts the MMJFS
  • the MMJFS tells the MJFS to create an MJS for the
    job
  • The MJS takes care of managing the job actions:
  • interacting with the local scheduler
  • file staging
  • storing the job status

34
Globus GT3 Job Execution
[Diagram: the user/application contacts the Master Managed Job Factory
Service; the Managed Job Factory Service creates a Managed Job Service,
which uses the File Streaming Factory Service / File Streaming Service and
the Resource Information Provider Service, and submits to the local
scheduler]
  • Globus as a toolkit does not perform scheduling
    and automatic resource selection

35
Example: Extending the Globus Architecture at
KAIST
[Diagram: a client submits to the Job Submission Service, which uses the
Resource Selection Service, Resource Information Service, and Scheduling
Service; jobs reach the local resource manager (PBS) via the Job Manager
Service (MJS), supported by a Resource Reservation Service, a Job
Monitoring Service, a local Resource Monitoring Service (RIPS), and
resource preference providers; arrows show the workflow and the monitoring
information flow]
Source: Jin-Soo Kim
36
Job Description with RSL2
  • Version 2 of RSL is XML-based
  • Two namespaces are used:
  • rsl for basic types such as int, string, path, url
  • gram for the elements of a job

    (GNS = http://www.globus.org/namespaces)

    <?xml version="1.0" encoding="UTF-8"?>
    <rsl:rsl
        xmlns:rsl="GNS/2003/04/rsl"
        xmlns:gram="GNS/2003/04/rsl/gram"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="GNS/2003/04/rsl ./schema/base/gram/rsl.xsd
                            GNS/2003/04/rsl/gram ./schema/base/gram/gram_rsl.xsd">
      <gram:job>
        <gram:executable>
          <rsl:path>
            <rsl:stringElement value="/bin/a.out"/>
          </rsl:path>
        </gram:executable>
      </gram:job>
    </rsl:rsl>
37
RSL2 Attributes
  • <count> (type rsl:integerType)
  • number of processes to run (default is 1)
  • <hostCount> (type rsl:integerType)
  • on SMP multi-computers, the number of nodes to
    distribute the count processes across
  • count/hostCount = number of processes per host
  • <queue> (type rsl:stringType)
  • queue into which to submit the job
  • <maxWallTime> (type rsl:longType)
  • maximum wall clock runtime in minutes
  • <maxCpuTime> (type rsl:longType)
  • maximum CPU runtime in minutes
  • <maxTime> (type rsl:longType)
  • only applies if the above are not used
  • maximum wall clock or CPU runtime (scheduler's
    choice) in minutes

38
Job Submission Tools
  • GT 3 provides the Java class GramClient
  • GT 2.x command line programs for job submission:
  • globus-job-run: interactive jobs
  • globus-job-submit: batch jobs
  • globusrun: takes RSL as input

39
Globus 2 Job Client Interface
A simple job submission requiring 2 nodes:

    globus-job-run -np 2 -s myprog arg1 arg2

  • A multirequest specifies multiple resources for a
    job

    globus-job-run -dumprsl -: host1 /bin/uname -a \
                            -: host2 /bin/uname -a

    + ( &(resourceManagerContact="host1")
        (subjobStartType=strict-barrier)
        (label="subjob 0")
        (executable="/bin/uname")
        (arguments="-a") )
      ( &(resourceManagerContact="host2")
        (subjobStartType=strict-barrier)
        (label="subjob 1")
        (executable="/bin/uname")
        (arguments="-a") )

40
Globus 2 Job Client Interface
  • The full flexibility of RSL is available through
    the command line tool globusrun
  • support for file staging of the executable and
    stdin/stdout
  • Example

    globusrun -o -r hpc1.acme.com/jobmanager-pbs \
      '&(executable=$(HOME)/a.out)
        (jobtype=single)
        (queue=time-shared)'
41
Problem: Job Submission Descriptions Differ
  • The deliverables of the GGF Working Group JSDL
  • A specification for an abstract standard Job
    Submission Description Language (JSDL) that is
    independent of language bindings, including
  • the JSDL feature set and attribute semantics,
  • the definition of the relationship between
    attributes,
  • and the range of attribute values.
  • A normative XML Schema corresponding to the JSDL
    specification.
  • A document of translation tables to and from the
    scheduling languages of a set of popular batch
    systems for both the job requirements and
    resource description attributes of those
    languages, which are relevant to the JSDL.

42
JSDL Attribute Categories
  • The job attribute categories will include
  • Job Identity Attributes
  • ID, owner, group, project, type, etc.
  • Job Resource Attributes
  • hardware, software, including applications, Web
    and Grid Services, etc.
  • Job Environment Attributes
  • environment variables, argument lists, etc.
  • Job Data Attributes
  • databases, files, data formats, and staging,
    replication, caching, and disk requirements, etc.
  • Job Scheduling Attributes
  • start and end times, duration, immediate
    dependencies etc.
  • Job Security Attributes
  • authentication, authorisation, data encryption,
    etc.

43
Problem: Resource Management Systems Differ
Across Each Component
Source: Hrabri Rajic
44
GGF-WG DRMAA
  • GGF Working Group: Distributed Resource
    Management Application API
  • From the charter
  • Develop an API specification for the submission
    and control of jobs to one or more Distributed
    Resource Management (DRM) systems.
  • The scope of this specification is all the high
    level functionality which is necessary for an
    application to consign a job to a DRM system
    including common operations on jobs like
    termination or suspension.
  • The objective is to facilitate the direct
    interfacing of applications to today's DRM
    systems by application builders, portal
    builders, and Independent Software Vendors (ISVs).

45
DRMAA State Diagram
  • The remote job can be in the following states
  • system hold
  • user hold
  • system and user hold simultaneously
  • queued active
  • system suspended
  • user suspended
  • system and user suspended simultaneously
  • running
  • finished (un)successfully

Source: Hrabri Rajic
46
Example Condor-G
  • Condor-G is a Condor system enhanced to manage
    Globus jobs.
  • It provides two main features:
  • Globus Universe: an interface for submitting,
    queuing and monitoring jobs that use Globus
    resources
  • GlideIn: a system for efficient execution of jobs
    on remote Globus resources
  • Condor-G runs as a personal Condor system
  • daemons run as non-privileged user processes
  • each user runs her/his own Condor-G

47
Condor-G GlideIn
  • Globus is used to run the Condor daemons on Grid
    resources
  • the Condor daemons run as a Globus-managed job
  • the GRAM service starts the daemons rather than
    the Condor jobs
  • When the resources run these GlideIn jobs, they
    join the personal Condor pool
  • These daemons can be used to launch a job from
    Condor-G on a Globus resource
  • jobs are submitted as Condor jobs and will be
    matched and run on the Grid resources
  • the daemons receive jobs from the user's Condor
    queue
  • this combines the benefits of Globus and Condor

48
Using GlideIn
[Diagram: the Condor-G side (Schedd, GridManager, Collector) talks to the
Grid resource's JobManager, which starts a Startd under the local LSF
system]
Source: ANL / USC ISI
49
Example DAGMan
  • Directed Acyclic Graph Manager
  • DAGMan allows you to specify the dependencies
    between your Condor-G jobs, so it can manage them
    automatically for you
  • (e.g., "Don't run job B until job A has
    completed successfully.")
  • A DAG is defined by a .dag file, listing each of
    its nodes and their dependencies
  • diamond.dag:
  • Job A a.sub
  • Job B b.sub
  • Job C c.sub
  • Job D d.sub
  • Parent A Child B C
  • Parent B C Child D
  • each node runs the Condor-G job specified by its
    accompanying Condor submit file

Source: Miron Livny
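DAGMan's dependency handling amounts to repeatedly submitting the jobs whose parents have all finished; a minimal sketch for the diamond example above:

```python
def ready_jobs(deps, done):
    """Jobs whose parents have all completed (what a DAGMan-like manager
    would submit next).  deps maps job -> set of parent jobs."""
    return sorted(j for j, parents in deps.items()
                  if j not in done and parents <= done)

# The diamond.dag example from the slide:
#   Parent A Child B C ; Parent B C Child D
deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

order = []
done = set()
while len(done) < len(deps):
    batch = ready_jobs(deps, done)  # assume the whole batch completes
    order.append(batch)
    done |= set(batch)
print(order)  # [['A'], ['B', 'C'], ['D']]
```

B and C become runnable together once A finishes, and D only after both of them, matching the Parent/Child lines of the .dag file.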
50
Grid Scheduling
  • How to select resources in the Grid?

51
Different Levels of Scheduling
  • Resource-level scheduler
  • low-level scheduler, local scheduler, local
    resource manager
  • a scheduler close to the resource, controlling a
    supercomputer, cluster, or network of
    workstations on the same local area network
  • examples: OpenPBS, PBS Pro, LSF, SGE
  • Enterprise-level scheduler
  • scheduling across multiple local schedulers
    belonging to the same organization
  • examples: PBS Pro peer scheduling, LSF
    MultiCluster
  • Grid-level scheduler
  • also known as super-scheduler, broker, community
    scheduler
  • discovers resources that can meet a job's
    requirements
  • schedules across lower-level schedulers

52
Grid-Level Scheduler
  • Discovers and selects the appropriate resource(s)
    for a job
  • If the selected resources are under the control
    of several local schedulers, a meta-scheduling
    action is performed
  • Architecture
  • centralized: all lower-level schedulers are under
    the control of a single Grid scheduler
  • not realistic in global Grids
  • distributed: lower-level schedulers are under the
    control of several Grid scheduler components; a
    local scheduler may receive jobs from several
    components of the Grid scheduler

53
Grid Scheduling
[Diagram: a Grid user submits to the Grid scheduler, which dispatches jobs
to the local schedulers, job queues, and schedules of Machines 1-3]
54
Activities of a Grid Scheduler
  • GGF document "10 Actions of Super Scheduling"
    (GFD-I.4)

Source: Jennifer Schopf
55
Grid Scheduling
  • A Grid scheduler allows the user to specify the
    required resources and environment of the job
    without having to indicate the exact location of
    the resources
  • A Grid scheduler answers the question: to which
    local resource manager(s) should this job be
    submitted?
  • Answering this question is hard:
  • resources may dynamically join and leave a
    computational Grid
  • not all currently unused resources are available
    to Grid jobs
  • resource owner policies, such as a maximum number
    of Grid jobs allowed
  • it is hard to predict how long jobs will wait in
    a queue

56
Select a Resource for Execution
  • Most systems do not provide advance information
    about future job execution
  • user-supplied information is not accurate, as
    mentioned before
  • newly arriving jobs may surpass current queue
    entries due to higher priority
  • The Grid scheduler might consider the current
    queue situation, but this does not give reliable
    information about future executions
  • a job may wait a long time in a short queue while
    it would have been executed earlier on another
    system
  • Available information:
  • the Grid information service gives the state of
    the resources and possibly authorization
    information
  • prediction heuristics estimate a job's wait time
    for a given resource, based on the current state
    and the job's requirements
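A deliberately crude wait-time heuristic illustrates the kind of prediction meant here, and also why it is unreliable: it ignores job shapes, priorities, and backfilling entirely.

```python
def estimated_wait(queued, total_nodes):
    """Very crude wait-time heuristic: total queued work (node-hours)
    divided by machine size.  This is an illustrative sketch, not a real
    predictor -- it ignores job shapes, priorities, and backfilling.

    queued: list of (nodes, est_runtime_hours) for jobs ahead in queue.
    """
    work = sum(nodes * est for nodes, est in queued)
    return work / total_nodes

# A short queue on a small machine can imply a longer wait than a long
# queue on a big machine -- queue length alone is misleading.
print(estimated_wait([(4, 10.0)], 4))        # 10.0 hours, 1 queued job
print(estimated_wait([(8, 1.0)] * 20, 64))   # 2.5 hours, 20 queued jobs
```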

57
Selection Criteria
  • Distribute jobs in order to balance load across
    resources
  • not suitable for large-scale Grids with different
    providers
  • Data affinity: run the job on the resource where
    the data is located
  • Use heuristics to estimate the job execution time
  • Best-fit: select the set of resources with the
    smallest capabilities and capacities that can
    meet the job's requirements
  • Quality of service of
  • a resource or
  • its local resource management system
  • what features does the local RMS have?
  • can they be controlled from the Grid scheduler?
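The best-fit criterion can be sketched as follows; the resource tuples and the surplus metric are invented for illustration:

```python
def best_fit(resources, need_nodes, need_mem):
    """Best-fit selection: among resources that can satisfy the request,
    pick the one with the smallest capacity surplus, leaving the larger
    machines free for larger jobs.  The resource format (name, nodes,
    memory) and the surplus metric are illustrative assumptions."""
    feasible = [(name, nodes, mem) for name, nodes, mem in resources
                if nodes >= need_nodes and mem >= need_mem]
    if not feasible:
        return None
    # surplus = how much bigger than necessary the resource is
    return min(feasible,
               key=lambda r: (r[1] - need_nodes) + (r[2] - need_mem))[0]

sites = [("big-cluster", 512, 1024), ("mid-cluster", 64, 128),
         ("small-cluster", 16, 32)]
print(best_fit(sites, need_nodes=32, need_mem=64))  # mid-cluster
```

The big cluster could also run the job, but best-fit prefers the mid-size one because its surplus (32 nodes, 64 memory units) is far smaller.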

58
Scheduling Attributes
  • A working group in the Global Grid Forum to
  • define the attributes of a lower-level scheduling
    instance that can be exploited by a higher-level
    scheduling instance
  • The following attributes have been defined:
  • Attributes of allocation properties
  • Revocation of an allocation
  • the local scheduler reserves the right to
    withdraw a given allocation
  • Guaranteed completion time of an allocation
  • a deadline for job completion is provided by the
    local scheduler
  • Guaranteed number of attempts to complete a job
  • the local scheduler will retry a given job task,
    e.g. useful for data transfer actions
  • Allocations run-to-completion
  • a job is not preempted once it has been started
  • Exclusive allocations
  • a job has exclusive access to the given
    resources, e.g. no time-sharing is performed
  • Malleable allocations
  • the given resource set may change during runtime,
    e.g. a computational job will gain or lose
    processors (moldable job)

59
Scheduling Attributes (2)
  • Attributes of available information
  • Access to the tentative schedule
  • the local scheduler exposes its schedule of
    future allocations
  • option: only the projected start time of a
    specified allocation is available
  • option: only partial information on the current
    schedule is available
  • Exclusive control
  • the local scheduler is exclusively in charge of
    the resources; no other jobs can appear on the
    resources
  • Event notification
  • the local scheduler provides an event
    subscription service
  • Attributes for manipulating allocation execution
  • preemption
  • checkpointing
  • migration
  • restart
60
Scheduling Attributes (3)
  • Attributes for requesting resources
  • Allocation offers
  • the local system can provide an interface to
    request offers for an allocation
  • Allocation cost or objective information
  • the local scheduler can provide cost or objective
    information
  • Advance reservation
  • allocations can be reserved in advance
  • Requirement for providing the maximum allocation
    length in advance
  • the higher-level scheduler must provide a maximum
    job execution length
  • Deallocation policy
  • a policy applies to the allocation that must be
    met for it to stay valid
  • Remote co-scheduling
  • a schedule can be generated by a higher-level
    instance and imposed on the local scheduler
  • Consideration of job dependencies
  • the local scheduler can deal with dependency
    information of jobs, e.g. for workflows

61
CSF: Community Scheduler Framework
  • An open-source implementation of an OGSA-based
    metascheduler for VOs
  • supports the emerging WS-Agreement spec
  • supports GT GRAM
  • fills in gaps in the existing resource management
    picture
  • contributed by Platform to the Globus Toolkit
  • An extensible, open-source framework for
    implementing meta-schedulers
  • provides basic protocols and interfaces to help
    resources work together in heterogeneous
    environments

62
CSF Architecture
[Diagram: Platform LSF and Globus Toolkit users submit to the
meta-scheduler in a Grid service hosting environment; the meta-scheduler
comprises a Job Service, Reservation Service, Queuing Service (with a
meta-scheduler plug-in), and a Global Information Service; jobs reach
Platform LSF via an RM adapter and SGE/PBS via GRAM, with a Resource
Information Provider Service (RIPS) on each resource]
Source: Chris Smith
63
Global Information Service
[Diagram: the Global Information Service handles resource, cluster, and
data store/load requests; an index service with a registry and service
data aggregator / provider manager collects job, reservation, and cluster
registration information from the RIPS instances into a data storage DB]
Source: Chris Smith
64
Support for Virtual Organizations
Source: Chris Smith
65
CSF Grid Services
  • Job Service
  • creates, monitors and controls compute jobs
  • Reservation Service
  • guarantees resources are available for running a
    job
  • Queuing Service
  • provides a service where administrators can
    customize and define scheduling policies at the
    VO level and/or at the level of the different
    resource managers
  • defines an API for plug-in schedulers
  • RM Adapter Service
  • provides a Grid service interface that bridges
    the Grid service protocol and resource managers
    (LSF, PBS, SGE, Condor and other RMs)

66
GT3 Job Submission / Architecture
(MMJFS: Master Managed Job Factory Service; MJS: Managed Job Service;
blue indicates a Grid service hosted in a GT3 container)
[Diagram: managed-job-globusrun clients submit to the MMJFS on node1,
node2, and node3 at sites A, B, and C; each site runs an MJS for its local
scheduler (LSF, PBS, SGE) plus RIPS, and registers with an index service]
Source: Chris Smith
67
GT3 CSF Architecture
[Diagram: within a virtual organization, the Queuing Service, Job Service,
Reservation Service, and Index Service work together]
Source: Chris Smith
68
Queue Service
  • In CSF, Job Service instances are submitted to
    the Queue Service for dispatch to a resource
    manager.
  • The Queue Service provides a plug-in API for
    extending the scheduling algorithms provided by
    default with CSF.
  • The Queue Service
  • loads and validates configuration information
  • loads all configured scheduler plug-ins
  • calls the plug-in API functions:
  • schedInit() after loading the plug-in
    successfully
  • schedOrder() when a new job is submitted
  • schedMatch() during the scheduling cycle
  • schedPost() before the scheduling cycle ends, and
    after scheduling decisions are sent to the job
    service instances
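The plug-in cycle can be sketched with a trivial FIFO policy; the four callback names come from the slide, but their signatures and the dispatch model here are assumptions of this sketch:

```python
class FifoPlugin:
    """Sketch of a CSF-style scheduler plug-in.  The callback names
    (schedInit/schedOrder/schedMatch/schedPost) appear on the slide;
    the signatures and the slot model are invented for illustration."""

    def schedInit(self):
        # called once after the plug-in is loaded successfully
        self.queue = []

    def schedOrder(self, job):
        # called when a new job is submitted: keep submission order
        self.queue.append(job)
        self.queue.sort(key=lambda j: j["submitted"])

    def schedMatch(self, free_slots):
        # called during the scheduling cycle: dispatch in queue order
        started = []
        while self.queue and free_slots > 0:
            started.append(self.queue.pop(0)["name"])
            free_slots -= 1
        return started

    def schedPost(self, decisions):
        # called after decisions are sent to the job service instances
        pass

p = FifoPlugin()
p.schedInit()
p.schedOrder({"name": "j2", "submitted": 2})
p.schedOrder({"name": "j1", "submitted": 1})
print(p.schedMatch(free_slots=1))  # ['j1']
```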

69
Example Project: GridLab - GRMS
[Diagram: the GRMS broker contains a job receiver, jobs queue, scheduler,
resource discovery, resource reservation, SLA negotiation, prediction
unit, workflow manager, application manager, execution unit, file
transfer unit, and monitoring; it interacts with information services,
data management, an authorization system, and adaptive components, on top
of Globus and other local resource managers]
Source: Jarek Nabrzyski
70
Anticipated Features
  • Reliable and predictable delivery of a service
  • quality of service for a job service
  • reliable job submission: two-phase commit
  • predictable start and end time of the job
  • advance reservation assures start time and
    throughput
  • Fault tolerance/recovery
  • migrate the job to another resource before the
    fault occurs; the job continues
  • after a fault, the job is restarted
  • rerun the job on the same resource after repair
  • Allocate multiple resources for a job

71
Co-allocation
  • It is often requested that several resources be
    used for a single job,
  • that is, a scheduler has to assure that all
    resources are available when needed
  • in parallel (e.g. visualization and processing)
  • with time dependencies (e.g. a workflow)
  • The task is especially difficult if the resources
    belong to different administrative domains.
  • The actual allocation time must be known for
    co-allocation,
  • or the different local resource management
    systems must synchronize with each other (wait
    for the availability of all resources)
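Finding a feasible co-allocation time amounts to intersecting the free time windows of all involved resources; a sketch assuming each local RMS can expose its free windows (information that many local systems do not actually provide):

```python
def common_slot(avail_per_resource, duration):
    """Co-allocation sketch: earliest time at which every resource has a
    free window of at least `duration`.

    avail_per_resource: one list of free windows [(start, end), ...] per
    resource.  Assumes each RMS exposes such windows, which in practice
    is often not the case (see the limitations discussed later)."""
    # candidate start times: every window start of every resource
    candidates = sorted(s for windows in avail_per_resource
                        for s, _ in windows)
    for t in candidates:
        if all(any(s <= t and t + duration <= e for s, e in windows)
               for windows in avail_per_resource):
            return t
    return None

# Compute resource and visualization device, each with its own gaps:
compute = [(0, 4), (6, 20)]
visualizer = [(2, 9), (12, 30)]
print(common_slot([compute, visualizer], duration=3))  # 6
```

Neither resource alone dictates the start: t=0 fails on the visualizer and t=2 fails on the compute side, so the earliest common 3-unit window starts at t=6.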

72
Example: Multi-Site Job Execution
[Diagram: the Grid scheduler places a multi-site job across several
machines]
  • A job uses several resources at different sites
    in parallel.
  • Network communication is an issue.

73
Advanced Reservation
  • Co-allocation and other applications require a
    priori information about the precise resource
    availability
  • With the concept of advance reservation, the
    resource provider guarantees a specified resource
    allocation
  • includes a two- or three-phase commit for
    agreeing on the reservation
  • Implementations
  • GARA/DUROC/SNAP provide interfaces for Globus to
    create advance reservations
  • implementations for network QoS are available
  • setup of dedicated bandwidth between endpoints

74
Limitations of current Grid RMS
  • The interaction between local scheduling and
    higher-level Grid scheduling is currently a
    one-way communication
  • current local schedulers are not optimized for
    Grid use
  • limited information is available about future job
    execution
  • a site is usually selected by a Grid scheduler
    and the job enters the remote queue
  • the decision about job placement is therefore
    inefficient
  • the actual job execution time is usually not
    known
  • Co-allocation is a problem, as many systems do
    not provide advance reservation

75
Example of Grid Scheduling Decision Making
Where to put the Grid job?
[Diagram: the Grid user submits to the Grid scheduler, which must choose
between Machine 1 (40 jobs running, 80 jobs queued), Machine 2 (5 jobs
running, 2 jobs queued), and Machine 3 (15 jobs running, 20 jobs queued),
each with its own local scheduler, job queue, and schedule]
76
Available Information from the Local Schedulers
  • Decision making is difficult for the Grid
    scheduler
  • limited information about local schedulers is
    available
  • available information may not be reliable
  • Possible information
  • queue length, running jobs
  • detailed information about the queued jobs
  • execution length, process requirements,
  • tentative schedule about future job executions
  • This information is often not technically provided
    by the local scheduler
  • In addition, this information may be subject to
    privacy concerns!

77
Consequence
  • Consider a workflow with 3 short steps (e.g. 1
    minute each) that depend on each other
  • Assume available machines with an average queue
    length of 1 hour.
  • The Grid scheduler can only submit the subsequent
    step if the previous job step is finished.
  • Result
  • The completion time of the workflow may be larger
    than 3 hours (compared to 3 minutes of execution
    time)
  • Current Grids are suitable for simple jobs, but
    still quite inefficient in handling more complex
    applications
  • Need for better coordination of higher- and
    lower-level scheduling!
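The arithmetic behind this consequence can be checked in a few lines, assuming an average queue wait of 60 minutes per submission and 1 minute of execution per step:

```python
# Each dependent step is submitted only after its predecessor finishes,
# so every step pays the full queue wait again (assumed values).
queue_wait = 60   # minutes spent waiting in the remote queue
run_time = 1      # minutes of actual execution per step
steps = 3

completion = 0
for _ in range(steps):
    completion += queue_wait + run_time   # submit, wait, run

print(completion)            # 183 minutes of total completion time
print(steps * run_time)      # vs. 3 minutes of pure execution time
```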

78
GRMS in Next Generation Grids
  • Outlook on future Grid Resource Management and
    Scheduling

79
Example Grid Scenario
[Diagram: compute resources exchange TB of data with a remote center via
WAN transfer, and stream results to a visualization device via LAN/WAN
transfer. Assume a data-intensive simulation that should be visualized and
steered during runtime!]
80
Resource Request of a Simple Grid Job
  • A specified architecture with
  • 48 processing nodes,
  • 1 GB of available memory, and
  • a specified licensed software package
  • for 1 hour between 8am and 6pm of the following
    day
  • Time must be known in advance.
  • A specific visualization device during program
    execution
  • Minimum bandwidth between the VR device and the
    main computer during program execution
  • Input a specified data set from a data
    repository
  • at most 4
  • a preference for cheaper job execution over an
    earlier execution.
  • actually a pretty simple example (no complex
    workflows)
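Since no standard format exists for such requests, the requirements above might be captured informally as follows. All field names and concrete values here are illustrative assumptions, not a defined schema:

```python
# Illustrative structure for the resource request listed above.
request = {
    "compute": {
        "architecture": "x86_64",              # assumed example value
        "nodes": 48,
        "memory_per_node_gb": 1,
        "software": ["licensed-package"],      # placeholder license name
        "duration_hours": 1,
        "window": ("tomorrow 08:00", "tomorrow 18:00"),
    },
    "visualization": {
        "device": "vr-device",                 # required during execution
    },
    "network": {
        "min_bandwidth": "between vr-device and compute, during execution",
    },
    "data": {
        "input": "repository://dataset",       # hypothetical data URI
    },
    "cost_limit": 4,                           # unit unspecified in source
    "objective": "prefer cheaper execution over earlier execution",
}
```

Even this "pretty simple" request already spans compute, software, network, data, time, and cost constraints, which motivates a common job description format.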

81
Use-Case Example: Coordinated Simulation and Visualization
Expected output of a Grid scheduler
Reservations are necessary!
82
Need for a Grid Scheduling Architecture
[Diagram: the Grid user/application submits to the Grid scheduler, which
interacts with information services, monitoring services, security
services, accounting/billing, and other Grid services; Grid middleware at
each site connects the scheduler to the local RMS and to other
resources/services.]
83
Required Services/Components
  • Relevant to Grid scheduling
  • Information Service
  • Job/Workflow Description
  • Requirement Description
  • Resource Discovery
  • Reservation
  • Monitoring/Notification
  • Job Execution
  • Security
  • Accounting/Billing
  • Data Management
  • Local RMS

84
Service Oriented Architectures
  • Services are used to abstract all resources and
    functionalities.
  • Concept of OGSI and WSRF
  • using WebServices, SOAP, XML to implement the
    services
  • OGSI idea of GridServices is implemented in GT3
  • transition to WSRF with GT4
  • Core services for building a Grid are discussed in
    the Open Grid Services Architecture (OGSA)

85
Open Grid Services Architecture

[Diagram: layered virtual integration architecture. Users and applications
in problem domain X build on application integration technology for that
domain, which uses a generic virtual service access and integration layer
(OGSA); the OGSI interface to the Grid infrastructure connects it to
distributed compute and data/storage resources.]
86
OGSA Outlook
[Diagram: overview of OGSA service families]
  • Data services: data catalog, data provision, data
    integration, data access
  • Context services: virtual organization, policy,
    agreement
  • Status monitoring services: event, problem
    determination, information service, logging service
  • Job workflow management: job manager, job service,
    broker, execution planning service, workflow
    manager, workload manager, application content
    manager
  • Infrastructure services: WS-RF (OGSI),
    notification, WS Distributed Management
  • Resource management services: provisioning,
    deploy/configuration service, service container,
    reservation service
  • Security services: authentication, authorization,
    delegation, firewall transition
87
OGSA Execution Planning
[Diagram: OGSA execution planning matches demand (workload management)
against supply (resource management framework). A user/job interacts via
factories, policies, information providers, and resource proxies;
reservation, allocation, and provisioning (or binding) connect the two
sides, with dependency management in between. The optimizing frameworks
cover scheduling, queuing services, capacity management, workload
optimization and post-balancing, admission control for resources and
workload, optimal mapping, quality of service, workload models
(history/prediction), resource selection and placement, workload
orchestration, selection context (e.g. VO), and SLA management. Each box
represents one or more OGSA services.]
88
Functional Requirements for Grid Scheduling
  • Functional Requirements
  • Cooperation between different resource providers
  • Interaction with local resource management
    systems
  • Support for reservations and service level
    agreements
  • Orchestration of coordinated resources
    allocations
  • Automatic handling of accounting and billing
  • Distributed Monitoring
  • Failure Transparency

89
What are Basic Blocks for a Grid Scheduling
Architecture?
Scheduling-relevant Interfaces of Basic Blocks
are still to be defined!
90
Information Service / Resource Discovery
  • Relevant for Grid Scheduling
  • Access to static and dynamic information
  • Dynamic information includes data about planned
    or forecasted future events
  • e.g. existing reservations, scheduled tasks,
    future availabilities
  • need for anonymous and limited information
    (privacy concerns)
  • Information about all resource types
  • including e.g. data and network
  • future reservation, data transfers etc.

91
Job/Workflow Description / Requirement Description
  • Information about the job specifics (what is the
    job)
  • and job requirements (what is required for the
    job)
  • including data access and creation
  • Need for common workflow description
  • e.g. a DAG formulation
  • include static and dynamic dependencies
  • need for the ability to extract workflow
    information to schedule a whole workflow in
    advance
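A workflow of this kind can be sketched as a DAG of tasks with static dependencies, from which the execution order of the whole workflow can be extracted in advance. The task names below are made up for illustration; the sketch uses Python's standard `graphlib`:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# DAG formulation: each task maps to the set of tasks it depends on.
workflow = {
    "preprocess": set(),
    "simulate":  {"preprocess"},
    "visualize": {"simulate"},
    "archive":   {"simulate"},
}

# A scheduler can derive a valid execution order for the whole workflow
# up front (predecessors always come before their dependents).
order = list(TopologicalSorter(workflow).static_order())
print(order)  # e.g. ['preprocess', 'simulate', 'visualize', 'archive']
```

Dynamic dependencies (discovered at runtime) would additionally require re-planning the remaining DAG during execution.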

92
Reservation Management / Agreement and Negotiation
  • Interaction between scheduling instances, between
    resource/agreement providers and agreement
    initiators (higher-level scheduler)
  • access to tentative information necessary
  • negotiations might take a long time
  • individual scheduling objectives to be
    considered
  • probably market-oriented and economic scheduling
    needed
  • Need for combining agreements from different
    providers
  • coordinate complex resource requests or
    workflows
  • Maintain different negotiations at the same time
  • probably several levels of negotiation,
    agreement commitment and reservation

93
Accounting and Billing
  • Interaction to budget information
  • Charging for allocations and reservations;
    preliminary allocation of budgets
  • Concepts for reliable authorization of Grid
    schedulers to spend money on behalf of the user
  • Refunding in case of resource/SLA failure,
    re-scheduling etc.
  • Reliable monitoring and accounting
  • required for tracing whether a party fulfilled an
    agreement

94
Monitoring Services
  • Monitoring of
  • resource conditions
  • agreements
  • schedules
  • program execution
  • SLA conformance
  • workflow
  • Monitoring must be reliable as it is part of
    accountability
  • failure or fulfillment by a service/resource
    provider must be clearly identifiable

95
Conclusions for Grid Scheduling
  • Grids ultimately require coordinated scheduling
    services.
  • Support for different scheduling instances
  • different local management systems
  • different scheduling algorithms/strategies
  • For arbitrary resources
  • not only computing resources, also
  • data, storage, network, software etc.
  • Support for co-allocation and reservation
  • necessary for coordinated grid usage (see data,
    network, software, storage)
  • Different scheduling objectives
  • cost, quality, other

96
Scheduling Model
  • Using a Brokerage/Trading strategy

Higher-level scheduling (considers individual user policies):
  • Submit Grid job description
  • Discover resources
  • Query for allocation offers
  • Collect offers
  • Select offers
  • Coordinate allocations
Lower-level scheduling (considers individual owner policies):
  • Analyze query
  • Generate allocation offer
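The brokerage interaction (query providers for offers, collect them, select the best under the user's objective) can be sketched as follows. The `Site` class, the offer fields, and both policies are illustrative assumptions, not a real Grid API:

```python
class Site:
    """Hypothetical provider with a simple owner policy."""
    def __init__(self, name, load, price):
        self.name, self.load, self.price = name, load, price

    def make_offer(self, job):
        # Owner policy: refuse jobs larger than this site's capacity.
        if job["nodes"] > 64:
            return None
        # Earliest start grows with current load (simplified model).
        return {"site": self.name, "start": self.load, "price": self.price}

def broker(job, providers):
    """Higher-level scheduler: collect offers, select per user objective."""
    offers = []
    for p in providers:                  # discover + query each provider
        offer = p.make_offer(job)        # provider analyzes the query
        if offer is not None:
            offers.append(offer)         # collect the generated offers
    if not offers:
        return None
    return min(offers, key=job["objective"])   # select the best offer

# User policy: prefer the cheaper offer over the earlier one.
job = {"nodes": 48, "objective": lambda o: o["price"]}
sites = [Site("A", load=40, price=9), Site("B", load=5, price=12)]
best = broker(job, sites)
print(best["site"])   # site A wins: cheaper, although it starts later
```

Note that the provider keeps control: it decides what to offer (or whether to offer at all), while the user's objective only ranks the offers received.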
97
Properties of Multi-Level Scheduling Model
  • Multi-level scheduling must support different RM
    systems and strategies.
  • Provider can enforce individual policies in
    generating resource offers.
  • Users receive resource allocations optimized to
    their individual objectives
  • Different higher-level scheduling strategies can
    be applied.
  • Multiple levels of scheduling instances are
    possible
  • Support for fault-tolerant and load-balanced
    services.

98
Negotiation in Grids
  • Multilevel Grid scheduling architecture
  • Lower level: local scheduling instance
  • Implementation of owner policies
  • Higher level: Grid scheduling instance
  • Resource selection and coordination
  • (Static) Interface definition between both
    instances
  • Different types of resources
  • Different local scheduling systems with different
    properties
  • Different owner policies
  • (Dynamic) Communication between both instances
  • Resource discovery
  • Job monitoring

99
Using Service Level Agreements
  • The mapping of jobs to resources can be
    abstracted using the concept of Service Level
    Agreements (SLAs) (Czajkowski, Foster, Kesselman,
    Tuecke)
  • SLA: a contract negotiated between
  • resource provider, e.g. local scheduler
  • resource consumer, e.g., grid scheduler,
    application
  • SLAs provide a uniform approach for the client to
  • specify resource and QoS requirements, while
  • hiding from the client details about the
    resources,
  • such as queue names and current workload

100
Service Level Agreement Types
  • Resource SLA (RSLA)
  • A promise of resource availability
  • Client must utilize promise in subsequent SLAs
  • Advance Reservation is an RSLA
  • Task SLA (TSLA)
  • A promise to perform a task
  • Complex task requirements
  • May reference an RSLA
  • Binding SLA (BSLA)
  • Binds a resource capability to a TSLA
  • May reference an RSLA (i.e. a reservation)
  • May be created lazily to provision the task
  • Allows complex resource arrangements
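The three SLA types and their references can be modeled roughly as follows. The field names are assumptions for illustration, not a standardized schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RSLA:
    """Resource SLA: a promise of resource availability."""
    resource: str
    start: int        # e.g. minutes from midnight
    duration: int

@dataclass
class TSLA:
    """Task SLA: a promise to perform a task; may reference an RSLA."""
    task: str
    rsla: Optional[RSLA] = None

@dataclass
class BSLA:
    """Binding SLA: binds a resource capability (an RSLA, i.e. a
    reservation, possibly created lazily) to a TSLA."""
    tsla: TSLA
    rsla: RSLA

# An advance reservation is an RSLA; the task promise references it.
reservation = RSLA("site-A", start=480, duration=60)
task = TSLA("simulation", rsla=reservation)
binding = BSLA(task, reservation)
```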

101
Agreement-Based Negotiation
  • A client (application) submits a task to a Grid
    scheduler
  • The client negotiates a TSLA for the task with
    the Grid Scheduler
  • In order to provision the TSLA, the Grid
    Scheduler may obtain an RSLA with the Grid
    resource or may use a pre-existing RSLA that the
    Grid scheduler has negotiated speculatively
  • A TSLA that refers to an RSLA assures the job gets
    the reserved resources at a specified time
  • A TSLA without an RSLA tells little about when the
    resources will be available to the job

102
Agreement-Based Negotiation (2)
  • The job starts execution on the resource
    according to the TSLA and the RSLA
  • For an existing TSLA, the Grid Scheduler may
    obtain additional RSLAs
  • An RSLA is negotiated by the Grid Scheduler with
    the Resource
  • A BSLA binds this RSLA to the corresponding TSLA
  • BSLAs allow resources to be provisioned
    dynamically when they are either
  • not needed for the whole duration of the task or
  • not known completely (e.g., the time at which a
    resource will be needed) before submitting the
    task

103
Example of Agreement Mapping
  • The Grid Scheduler receives requests for two
    agreements.
  • It negotiates RSLA1 and RSLA2 with the resources,
    and in parallel negotiates the corresponding TSLA1
    and TSLA2 with the agreement initiators.

User/Application
User/Application
TSLA 1
TSLA 2
Grid Scheduler
RSLA 1
RSLA 2
104
GGF GRAAP-WG
  • Goal: Defining WebService-based protocols for
    negotiation and agreement management
  • WS-Agreement Protocol

105
Towards Grid Scheduling
  • Grid Scheduling Methods
  • Support for individual scheduling objectives and
    policies
  • Multi-criteria scheduling models
  • Economic scheduling methods to Grids
  • Architectural requirements
  • Generic job description
  • Negotiation interface between higher- and
    lower-level scheduler
  • Economic management services
  • Workflow management
  • Integration of data and network management

106
Grid Scheduling Strategies
  • Current approach
  • Extension of job scheduling for parallel
    computers.
  • Resource discovery and load-distribution to a
    remote resource
  • Usually batch job scheduling model on remote
    machine
  • But what is actually required for Grid scheduling is
  • Co-allocation and coordination of different
    resource allocations for a Grid job
  • Instantaneous ad-hoc allocation is not always
    suitable
  • This complex task involves
  • Cooperation between different resource providers
  • Interaction with local resource management
    systems
  • Support for reservations and service level
    agreements
  • Orchestration of coordinated resources
    allocations

107
User Objective
  • Local computing typically has
  • A given scheduling objective, such as minimization
    of response time
  • Use of batch queuing strategies
  • Simple scheduling algorithms: FCFS, backfilling
  • Grid Computing requires
  • Individual scheduling objective
  • better resources
  • faster execution
  • cheaper execution
  • More complex objective functions apply for
    individual Grid jobs!
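For contrast, the local FCFS baseline mentioned above can be sketched in a few lines. This is a deliberately simplified model (jobs start strictly in arrival order once enough nodes are free; backfilling would additionally pull small jobs forward into idle gaps):

```python
def fcfs(jobs, total_nodes):
    """jobs: list of (nodes, runtime) in arrival order -> start times."""
    running = []                    # (end_time, nodes) of running jobs
    now, free, starts = 0, total_nodes, []
    for nodes, runtime in jobs:
        # Advance time until enough nodes are free for the next job.
        while free < nodes:
            running.sort()                  # earliest-ending job first
            end, n = running.pop(0)
            now, free = end, free + n
        starts.append(now)
        running.append((now + runtime, nodes))
        free -= nodes
    return starts

# Three jobs on a 64-node machine: the 32-node job waits behind the
# second 64-node job even though a gap exists (no backfilling).
print(fcfs([(64, 10), (64, 5), (32, 5)], total_nodes=64))  # [0, 10, 15]
```

The single system-wide objective baked into such an algorithm is exactly what individual Grid objectives break.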

108
Provider/Owner Objective
  • Local computing typically has
  • Single scheduling objective for the whole system
  • e.g. minimization of average weighted response
    time or high utilization/job throughput
  • In Grid Computing
  • Individual policies must be considered
  • access policy,
  • priority policy,
  • accounting policy, and other
  • More complex objective functions apply for
    individual resource allocations!
  • User and owner policies/objectives may be subject
    to privacy considerations!

109
Grid Economics: Different Business Models
  • Cost model
  • Use of a resource
  • Reservation of a resource
  • Individual scheduling objective functions
  • User and owner objective functions
  • Formulation of an objective function
  • Integration of the function in a scheduling
    algorithm
  • Resource selection
  • The scheduling instances act as broker
  • Collection and evaluation of resource offers

110
Scheduling Objectives in the Grid
  • In contrast to local computing, there is no
    general scheduling objective anymore
  • minimizing response time
  • minimizing cost
  • tradeoff between quality, cost, response-time
    etc.
  • Cost and different service quality come into play
  • the user will introduce individual objectives
  • the Grid can be seen as a market where resources
    are competing alternatives
  • Similarly, the resource provider has individual
    scheduling policies
  • Problem
  • the different policies and objectives must be
    integrated in the scheduling process
  • different objectives require different scheduling
    strategies
  • some policies may not be suitable for public
    exposure (e.g. different pricing or quality for
    certain user groups)

111
Grid Scheduling Algorithms
  • Given the requirements mentioned above, it is not
    to be expected that a single scheduling algorithm
    or strategy is suitable for all problems in Grids.
  • Therefore, there is need for an infrastructure
    that
  • allows the integration of different scheduling
    algorithms
  • the individual objectives and policies can be
    included
  • resource control stays at the participating
    service providers
  • Transition into a market-oriented Grid scheduling
    model

112
Economic Scheduling
  • Market-oriented approaches are a suitable way to
    implement the interaction of different scheduling
    layers
  • agents in the Grid market can implement different
    policies and strategies
  • negotiations and agreements link the different
    strategies together
  • participating sites stay autonomous
  • Needs for suitable scheduling algorithms and
    strategies for creating and selecting offers
  • need for creating Pareto-optimal scheduling
    solutions
  • Performance relies highly on the available
    information
  • negotiation can be a hard task if many potential
    providers are available.

113
Economic Scheduling (2)
  • Several possibilities for market models
  • auctions of resources/services
  • auctions of jobs
  • Offer-request mechanisms support
  • inclusion of different cost models, price
    determination
  • individual objective/utility functions for
    optimization goals
  • Market-oriented algorithms are considered
  • robust
  • flexible in case of errors
  • simple to adapt
  • but markets can have unforeseeable dynamics

114
Problem: Offer Creation
[Diagram series (slides 114–117): a job arriving at time t0 must be placed
on resources R1–R8; the provider constructs alternative placements of the
job in its schedule over time and returns them as Offer 1, Offer 2, and
Offer 3.]
118
Evaluate Offers
  • Evaluation with utility functions
  • A utility function is a mathematical
    representation of a user's preferences
  • The utility function may be complex and contain
    several different criteria
  • Example: using response time (or delay time) and
    price
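A minimal sketch of such an evaluation, assuming an additive utility over delay and price; the weights and the offers themselves are made-up illustrative values:

```python
def utility(offer, w_time=1.0, w_price=0.5):
    """Higher is better: penalize both delay and price (assumed weights)."""
    return -(w_time * offer["delay"] + w_price * offer["price"])

# Hypothetical offers collected from providers (delay in minutes).
offers = [
    {"name": "offer-1", "delay": 30, "price": 10},
    {"name": "offer-2", "delay": 10, "price": 50},
    {"name": "offer-3", "delay": 20, "price": 20},
]

best = max(offers, key=utility)
print(best["name"])   # offer-3: the best tradeoff under these weights
```

Changing the weights changes the winner, which is precisely how individual user objectives enter the selection.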

119
Optimization Space
[Diagram: the optimization space of offers over criteria such as latency;
moving toward the origin yields improved utility.]