1
Grid Scheduling and Resource Management
based on Chapter 6 of the book The Grid Core Technologies, M. Li and M. Baker, J. Wiley, 2005
2
  • Grid scheduling
  • Mapping Grid jobs to resources over multiple administrative domains
  • A Grid job can be split into many tasks; the scheduler must
  • select resources
  • schedule tasks
  • to meet user/application requirements (global execution time and cost of the used resources)

3
Scheduling paradigms
  • Centralized scheduling
  • Distributed scheduling
  • Hierarchical scheduling

4
Centralized scheduling
  • A central node acts as the resource manager and schedules jobs to all known nodes
  • Practical use in computing centres where
    resources have similar characteristics and usage
    policies
  • Pending jobs wait in a central job queue until
    dispatched by the central scheduler

5
(No Transcript)
6
  • Advantages
  • good scheduling decisions: the scheduler has access to all the needed, up-to-date information about the available resources
  • Disadvantages
  • does not scale well as the resource pool grows
  • the scheduler is a bottleneck
  • single point of failure

7
Distributed scheduling
  • No central scheduler.
  • Multiple local schedulers cooperate to dispatch
    jobs to the nodes
  • Two approaches:
  • direct communication among schedulers
  • indirect communication (e.g., via a job pool)

8
  • Advantages
  • scalability
  • better fault-tolerance and reliability
  • Disadvantages
  • no global scheduler → may lead to sub-optimal scheduling decisions

9
Direct communication
  • Each scheduler has
  • a list of remote schedulers with which it
    communicates for job dispatching
  • Or
  • there is a central information directory with
    information on each scheduler

10
(No Transcript)
11
  • If a job cannot be dispatched via its local job
    queue, other schedulers are contacted to find
    appropriate resources

12
Indirect communication via a job pool
  • Jobs that cannot be executed locally right away are sent to a central job pool.
  • Schedulers can select suitable jobs to run on
    their resources.
  • Policies must ensure all jobs eventually get
    executed.

13
(No Transcript)
14
Hierarchical scheduling
  • A central scheduler acts as a kind of meta-scheduler: it interacts with local schedulers and dispatches submitted jobs to them
  • Can have scalability, bottleneck, and fault-tolerance problems
  • But allows the global and local schedulers to apply different job scheduling policies

15
(No Transcript)
16
Scheduling operations
  • 4 main stages
  • Resource discovery
  • Resource selection
  • Schedule generation
  • Job execution

17
1- Resource discovery
  • Goal: identify a list of authenticated resources available for job submission and execution
  • Must cope with dynamic changes: decisions are based on dynamic state information about the available resources, and are revised online
  • E.g., like a compiler that schedules machine instructions to minimize resource idle time
  • Need to know
  • what resources are accessible
  • how busy they are
  • how long it takes to communicate with them
  • how long it takes to communicate between them
  • to decide on a more efficient resource allocation.
  • Typical models
  • pull
  • push
  • push-pull

18
The pull model
  • A daemon associated with the scheduler is
    responsible for querying Grid resources to get
    state information on
  • CPU loads,
  • available memory,
  • etc.
  • Has small communication overhead
  • But needs frequent querying, otherwise
    information gets out-of-date and can lead to bad
    decisions
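A minimal Python sketch of such a pull-model querying loop; the query_state callable and the attribute names (cpu_load, free_mem_mb) are hypothetical placeholders for whatever query mechanism the Grid middleware actually provides.

import time

# Minimal sketch of a pull-model monitoring cache (hypothetical query API).
class ResourceCache:
    def __init__(self, resources, max_age=30.0):
        self.resources = resources   # resource endpoints known to the scheduler
        self.max_age = max_age       # seconds before cached state counts as stale
        self.state = {}              # resource -> (timestamp, info dict)

    def refresh(self, query_state):
        # Poll every resource for its current CPU load, free memory, etc.
        now = time.time()
        for r in self.resources:
            try:
                self.state[r] = (now, query_state(r))  # e.g. {"cpu_load": 0.4, "free_mem_mb": 2048}
            except OSError:
                self.state.pop(r, None)                # unreachable resource: drop it

    def fresh_view(self):
        # Only entries recent enough to base scheduling decisions on.
        now = time.time()
        return {r: info for r, (t, info) in self.state.items()
                if now - t <= self.max_age}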

19
(No Transcript)
20
The push model
  • Each resource has a daemon that collects local state information and sends it to a central scheduler → saved in a database with records on each resource's activity
  • Frequent updates keep the view more accurate, but add load on the database and network traffic

21
(No Transcript)
22
The push-pull model
  • Each resource has a local daemon that collects state information and sends it to intermediate nodes (aggregators), which merge the state information from multiple sub-systems.
  • The scheduler queries the aggregators for resource information
  • Issues
  • which information is useful
  • how often it must be collected
  • how long it should be kept in the system

23
(No Transcript)
24
2- Resource selection
  • From the available resources, select the resources that best fit the user/application constraints (CPU, memory, disk, etc.) to run a submitted job.
  • Identifies a list of resources satisfying the minimal requirements to run the job

25
3- Schedule generation
  • a) select resources
  • identify the best resources to run the job
  • a resource selection algorithm analyses the
    current state of resources and selects the best
    based on a quantitative evaluation
  • -- random selection?
  • -- e.g., an algorithm based on a quantitative evaluation (see the sketch below):
  • EvalResource = (EvalCPU + EvalRAM) / (WCPU + WRAM)
  • EvalCPU = WCPU * (1 - CPUload) * (CPUspeed / CPUmin)
  • EvalRAM = WRAM * (1 - RAMusage) * (RAMsize / RAMmin)
  • b) select jobs
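A minimal Python sketch of the evaluation above; the weights and the job's minimal requirements (CPUmin = 1.0 GHz, RAMmin = 4.0 GB) are illustrative assumptions, not values from the book.

# Score a resource with the formula above; higher is better.
def eval_resource(cpu_load, cpu_speed, cpu_min,
                  ram_usage, ram_size, ram_min,
                  w_cpu=0.6, w_ram=0.4):
    eval_cpu = w_cpu * (1 - cpu_load) * (cpu_speed / cpu_min)
    eval_ram = w_ram * (1 - ram_usage) * (ram_size / ram_min)
    return (eval_cpu + eval_ram) / (w_cpu + w_ram)

# Pick the best candidate among resources that already satisfy the minimal requirements.
candidates = [
    {"name": "nodeA", "cpu_load": 0.2, "cpu_speed": 2.4, "ram_usage": 0.5, "ram_size": 8.0},
    {"name": "nodeB", "cpu_load": 0.7, "cpu_speed": 3.0, "ram_usage": 0.1, "ram_size": 16.0},
]
best = max(candidates,
           key=lambda r: eval_resource(r["cpu_load"], r["cpu_speed"], 1.0,
                                       r["ram_usage"], r["ram_size"], 4.0))
print(best["name"])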

26
b) select jobs
  • Select a job from a job queue for execution
  • Possible strategies
  • FCFS: follows submission order.
  • if no resource is available for the job at the head of the queue → the scheduler waits → all other jobs wait!
  • -- resources can be badly used
  • -- possibly delays high-priority jobs
  • Random: selects the next job randomly from the job queue.
  • -- can be unfair
  • Priority-based: a job priority can be set at job submission.
  • -- difficult to define criteria for job priorities
  • Backfilling: requires knowledge of the expected execution time of the job to be scheduled; shorter jobs may be moved ahead to fill idle resources as long as they do not delay the job at the head of the queue (see the sketch below).
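A minimal Python sketch of job selection with simple backfilling, under simplifying assumptions: a single pool of identical slots, each job declaring a slot count and an expected runtime, and a reserved start time for the job at the head of the queue.

def select_jobs(queue, free_slots, head_start_time, now):
    # queue: pending jobs in submission order, each {"slots": int, "runtime": float}
    # head_start_time: time reserved for the job at the head of the queue
    started = []
    if queue and queue[0]["slots"] <= free_slots:
        # FCFS: the head of the queue starts as soon as it fits.
        job = queue.pop(0)
        started.append(job)
        free_slots -= job["slots"]
    else:
        # Backfilling: later, shorter jobs may use the idle slots,
        # provided they finish before the head job's reserved start time.
        for job in list(queue[1:]):
            fits = job["slots"] <= free_slots
            harmless = now + job["runtime"] <= head_start_time
            if fits and harmless:
                queue.remove(job)
                started.append(job)
                free_slots -= job["slots"]
    return started, free_slots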

27
4- Job execution
  • Prepare job for execution.

28
Case studies
29
Condor
  • A resource management and job scheduling system
    (from Univ. Wisconsin-Madison, US)
  • Runs on a diversity of hardware/OS platforms
  • - HP with HPUX
  • - Sun SPARC with Solaris
  • - SGI with IRIX
  • - Intel x86 with Linux, Windows
  • - ALPHA with Unix, Linux
  • - PowerPC with Mac OS X and AIX
  • - Itanium with Red Hat Linux
  • Supports a heterogeneous pool of Unix and Windows nodes.
  • A job launched from Unix can run on Unix or Windows nodes, and vice-versa

30
Condor pools
  • Resources organised as Condor pools
  • Pool: an administrative domain of hosts (which can be shared with other execution environments)
  • A system can have multiple pools.
  • Each pool has a flat node organisation.

31
Architecture of a Condor pool
32
Condor pool
  • one central manager (Master Host) manages
    resources and jobs
  • an arbitrary number of execution (worker) hosts
  • - each execution host can be configured as
  • -- a job execution host
  • -- a job submission host
  • -- both
  • On failure of the central manager
  • -- currently executing jobs are not affected
  • -- queued jobs remain in the queue but cannot start until the manager is restarted

33
Daemons in the Condor pool
  • - daemons run in background
  • a) condor_master
  • runs on each host
  • spawns other daemons (condor_startd,
    condor_schedd)
  • periodically checks if any new binaries are
    installed for any of these daemons and restarts
    them if needed
  • if any daemon crashes, the master sends an email to the admin of the Condor pool and tries to restart the daemon
  • also supports management commands that allow the admin to start, stop, and reconfigure daemons remotely

34
b) condor_startd
  • runs on each host
  • advertises information about the node's resources to the condor_collector daemon (running on the Master host) for matching pending resource requests
  • enforces policies imposed by resource owners to
    control
  • conditions to start, suspend, resume, or
    kill remote jobs
  • when it is ready to execute a Condor job on an Execution host, it launches the condor_starter

35
c) condor_starter
  • only runs on Execution hosts
  • it actually spawns a remote Condor job on a
    given host in the pool
  • it sets up the execution environment and
    monitors the job during execution
  • when a job completes
  • the condor_starter sends back status to the
    job submission node and exits

36
d) condor_schedd
  • runs on each host
  • handles resource requests
  • user jobs submitted to a node are stored in a
    local job queue managed by this daemon
  • command-line tools such as
  • condor_submit, condor_q, condor_run
  • interact with this daemon to access information
    on the job queue
  • advertises the job requests with resource
    requirements in its local job queue to the
    condor_collector on the Master host
  • once a job request from a condor_schedd on a submission host is matched with a resource on an Execution host
  • it spawns a condor_shadow on the
    submission host to serve that particular job
    request

37
e) condor_shadow
  • only runs on Submission hosts
  • acts as the resource manager for user job submission requests
  • handles remote system calls and checkpointing for submitted jobs
  • any system call made on a remote Execution host is forwarded to this daemon on the Submission host, which performs it and sends the results back.
  • also decides on
  • where job checkpoint files should be stored
  • how certain job files should be accessed

38
f) condor_collector
  • only runs on the Central Manager host
  • interacts with condor_startd and _schedd on
    other hosts, to collect status info about a
    Condor pool such as
  • job requests and resources available
  • the command-line tool condor_status can query this daemon for status information

39
g) condor_negotiator
  • only runs on the Central Manager host
  • is responsible for matching a resource with a
    specific job request
  • periodically starts a negotiation cycle: it queries the _collector for the current state of all available resources
  • interacts with each _schedd running on
    each submission host that has resource requests
    in a priority order
  • and tries to match available resources
    with such requests
  • can preempt a low priority running user job
    to enable running a higher priority user job

40
h) condor_kbdd
  • only runs on an Execution host
  • detects user console activity (keyboard or mouse)
  • and sends this information to the condor_startd so that it knows the machine's owner is using the machine again
  • allowing policies to decide if the job should be
    stopped.

41
i) condor_ckpt_server
  • runs on a checkpoint server, i.e., an Execution host
  • stores and retrieves checkpointed files
  • if a checkpoint server is down, Condor will send
    the checkpointed files for a given job back to
    the job submission host

42
(No Transcript)
43
Job life cycle in Condor
  • 1. Job submission: a job is submitted from a Submission host with the condor_submit command
  • 2. Job request advertising: on getting the job request, the _schedd on the Submission host advertises it to the _collector on the Central Manager host
  • 3. Resource advertising: each _startd running on an Execution host advertises its available resources to the _collector
  • 4. Resource matching: the _negotiator running on the Central Manager periodically queries the _collector to match a resource to a user job request; it then informs the _schedd on the Submission host about the matched Execution host
  • 5. Job execution: the _schedd informs the _startd on the matched Execution host to spawn a _starter there, and also launches a _shadow on the Submission host to interact with the _starter for job execution control. The _starter then executes the user job

44
Job life cycle in Condor (cont.)
  • 6. Return output: when the job completes, the results are sent back to the Submission host through the interaction between the _shadow and the _starter.

45
(No Transcript)
46
Security management in Condor
  • strong support for authentication, encryption,
    integrity assurance and authorization
  • when installing Condor
  • none of these features is enabled in the default configuration settings
  • an admin uses configuration macros to enable such features
  • a) authorization
  • protects resource usage by granting/denying
    access requests made to the resources
  • defines who is allowed to do what
  • is granted based on specific access levels
  • (e.g., READ permission to view the status of the pool, WRITE permission to submit a job)

47
  • b) authentication
  • provides an assurance of an identity
  • via macros, both a client and a daemon can specify whether authentication is required
  • e.g., if the config file for a daemon has
  • SEC_WRITE_AUTHENTICATION = REQUIRED
  • or SEC_DEFAULT_AUTHENTICATION = REQUIRED
  • If no authentication methods are specified in the configuration, Condor uses a default: GSI (Globus) authentication with X.509 certificates, Kerberos authentication, or file system authentication.

48
  • c) encryption
  • provides privacy support between two
    communicating parties.
  • d) Integrity checks
  • assures that messages between communicating parties have not been modified, by detecting any change.

49
Job management in Condor
  • Job: a work unit submitted to a Condor pool for execution
  • Job types: executable sequential or parallel codes
  • -- may be a long-running job
  • -- a periodically runnable job
  • -- a parallel job on multiple machines
  • Queue
  • a job queue on each Submission host is managed by the _schedd
  • a job in a queue can be removed or put on hold
  • Job status
  • Idle: no activity
  • Busy: running
  • Suspended
  • Vacating: currently checkpointing
  • Killing: currently being killed
  • Benchmarking: benchmarks being run by the _startd

50
Job run-time environments
  • A Condor Universe specifies a Condor execution environment
  • Examples
  • a) Standard Universe (the default): for a job relinked with condor_compile against the Condor libraries (supports checkpointing and remote system calls)
  • b) Vanilla Universe: for jobs not linked with the Condor libraries, e.g., to submit shell scripts to Condor
  • c) PVM Universe: for a parallel job using PVM
  • d) MPI Universe: for MPI jobs using the MPICH implementation
  • e) Java Universe: for Java programs
  • f) Globus Universe: interface for starting Globus jobs from Condor; each job queued in the job submission file is translated into Globus RSL and submitted to Globus via the GRAM protocol
  • g) Scheduler Universe: executes the job on its submission host

51
Job submission with a shared FS
  • If jobs are submitted without using the file
    transfer mechanism
  • Condor must use a shared FS to access input
    and output files
  • Then the job must be able to access the data
    files from any machine on which it could
    potentially run

52
Job submission without a shared FS
  • if a job is submitted using the file transfer
    mechanism in Condor
  • then any needed files will be transferred from
    the submission host to a temp working directory
    on the execution host
  • after execution, output files will be
    transferred back to the submission host
  • the user specifies in the job submission description file
  • which files to transfer
  • and at what point the output files should be copied back to the submission host
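For illustration, a vanilla-universe submit description file using the file transfer mechanism might look like the sketch below; the executable and file names are hypothetical.

universe                = vanilla
executable              = analyze
arguments               = input.dat
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.dat
output                  = analyze.out
error                   = analyze.err
log                     = analyze.log
queue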

53
Job priorities
  • Allows assigning a priority level to each submitted job
  • The priority can also be changed during execution

54
Job flow management
  • Condor uses a DAG to represent a set of tasks in a job submission
  • Condor finds the hosts to execute the tasks but does not schedule the tasks in terms of their dependencies
  • For that purpose there is DAGMan: a meta-scheduler for Condor jobs that submits jobs to Condor in an order consistent with the DAG and processes the results
  • an input file describes the dependencies of the tasks involved in the DAG, and each task in the DAG also has its own description file
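For illustration, a DAGMan input file for a diamond-shaped set of dependencies might look like the sketch below (job names and submit-file names are hypothetical); it would be handed to Condor with condor_submit_dag.

# diamond.dag: A runs first, then B and C in parallel, then D
JOB A a.sub
JOB B b.sub
JOB C c.sub
JOB D d.sub
PARENT A   CHILD B C
PARENT B C CHILD D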

55
Job monitoring
  • Using condor_q one can monitor the status of a job
  • DAG progress can be followed by inspecting the log files managed by DAGMan
  • or by using condor_q -dag

56
Job recovery the Rescue DAG
  • When a node fails, computation of the job DAG
    proceeds until dependencies do not allow it.
  • The uncompleted portion of the DAG (the Rescue DAG) is then saved in a file so that, on restart, completed nodes do not have to be re-executed.

57
Job checkpointing
  • For long-running jobs, provides fault tolerance
  • It takes a snapshot of the current state of a job so that the job can be restarted from that point
  • Allows Condor to reconsider scheduling decisions via preemptive-resume scheduling
  • if the scheduler decides to deallocate a host from a job (e.g., when the host's owner gets back to work)
  • it can checkpoint the job and preempt it without losing the work already done
  • the job can be resumed later when the scheduler allocates it to a new host

58
Computing on demand
  • Extends Condor to run short-term jobs immediately on available resources
  • e.g., for interactive, computation-intensive jobs

59
Flocking
  • a Condor job submitted in one pool can be executed in another pool; via configuration, the _schedd can support job flocking

60
Resource management in Condor
  • Tracking resource usage
  • the _startd on each host reports to the _collector about the resources available on that host
  • User priority
  • Job scheduling policies
  • to prevent large jobs from monopolizing the resources, an up-down strategy changes job priorities inversely to the number of cycles required
  • and uses
  • FCFS by default
  • preemptive scheduling of low-priority jobs
  • dedicated scheduling with no preemption

61
Resource matching in Condor
  • to match an execution host to run a selected job or jobs
  • the _collector receives job request advertisements from the _schedd on each submission host and
  • receives resource advertisements from the _startd on each execution host
  • --- a resource match is done by the _negotiator, which selects a resource based on the job requirements
  • both advertisements are described in the Condor Classified Advertisement (ClassAd) language, which represents the characteristics and constraints of hosts and jobs

62
ClassAd
  • Is a set of uniquely named expressions, each
    called an attribute
  • MyType = "job"
  • TargetType = "machine"
  • ((other.Arch == "Intel" &&
  •   other.OS == "Linux") &&
  •   other.Disk > my.DiskUsage)
  • ...
  • Includes a query language
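A minimal Python sketch of the two-way matchmaking the negotiator performs over such ads; the attribute names and the dictionary-plus-lambda representation are simplifying assumptions, not the actual ClassAd implementation.

# A job ad and machine ads, each with attributes plus a requirements predicate.
job_ad = {"DiskUsage": 500, "Requirements":
          lambda other, my: other["Arch"] == "Intel"
                            and other["OS"] == "Linux"
                            and other["Disk"] > my["DiskUsage"]}

machine_ads = [
    {"Name": "hostA", "Arch": "Intel", "OS": "Linux", "Disk": 2000,
     "Requirements": lambda other, my: True},          # machine accepts any job
    {"Name": "hostB", "Arch": "SPARC", "OS": "Solaris", "Disk": 900,
     "Requirements": lambda other, my: True},
]

def match(job, machines):
    # A match requires both sides' requirements to be satisfied.
    return [m for m in machines
            if job["Requirements"](m, job) and m["Requirements"](job, m)]

print([m["Name"] for m in match(job_ad, machine_ads)])  # -> ['hostA']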

63
Condor support in Globus
  • Jobs can be submitted directly to a Condor pool from a Condor host, or via Globus
  • by configuring the Globus host with the Condor jobmanager included in Globus
  • jobs are submitted to Globus via globus-job-run
  • but are redirected to Condor via condor_submit

64
(No Transcript)
65
Condor-G
  • a version of Condor that interacts with a Globus gatekeeper, submitting and monitoring jobs through Globus
  • allows Condor-style job descriptions to be run on Globus grid resources
  • Condor-G is the job management part of Condor

66
(No Transcript)
67
SGE Sun Grid Engine
  • A distributed resource management and scheduling system for Unix environments
  • finds and manages a pool of resources and schedules jobs
  • Is an open-source project
  • Architecture
  • master host: a single host that handles all user requests, makes job scheduling decisions, and dispatches jobs to execution hosts
  • submit hosts: machines configured to submit, monitor, and manage jobs and the cluster
  • execution hosts: machines with permission to run SGE jobs
  • admin hosts: used for cluster configuration
  • shadow master host: monitors the master and assumes control if the master fails; currently running jobs are not affected by the failure.

68
(No Transcript)
69
Daemons
  • sge_qmaster: the central manager; keeps tables about hosts, queues, jobs, system load, and user permissions
  • gets scheduling decisions from the _schedd
  • requests actions from the _execd on the execution hosts
  • runs on the master host
  • sge_schedd: keeps an up-to-date view of the cluster status
  • makes scheduling decisions on which jobs to dispatch to which queues
  • forwards the decisions to the _qmaster
  • runs on the master host
  • sge_execd: manages the queues on its host and the execution of jobs
  • periodically forwards information on job status and host load to the _qmaster
  • runs on each execution host

70
  • sge_commd: handles communications among SGE components using a well-known TCP port
  • runs on each execution host and on the master host
  • sge_shepherd: started by the _execd, this daemon runs for each job under execution on a host, controls the process execution, and collects accounting data on job completion

71
(No Transcript)
72
Job management in SGE
  • Job types
  • batch, interactive, parallel, and array (an array job is replicated n times with distinct input data sets, e.g., for parameter sweep studies)
  • Submitted jobs are put into job queues
  • An SGE queue is a container for a class of jobs allowed to execute on a specific host concurrently
  • a queue determines certain job attributes, e.g., whether the job can migrate
  • A job is associated with a queue; actions on the queue affect all its jobs, e.g., suspending a queue suspends all jobs in it
  • Job submission by a user only gives the requirements profile (memory, OS, software) → SGE dispatches the job to a suitable queue on a lightly loaded host
  • If a job is submitted to a specific queue, it is bound to that queue and its host

73
Job run-time environments in SGE
  • Three execution modes are supported
  • batch: for sequential programs
  • interactive: gives the user command-line shell access to some suitable host
  • parallel: uses PVM or MPI environments

74
Job selection and RM in SGE
  • Jobs submitted to the master are kept in a spooling area until the _schedd decides that the job is ready to run
  • available resources are matched with the job requirements
  • e.g., available memory, CPU speed, available software licences
  • (info periodically collected by the execution hosts)
  • On successful matching, higher-priority jobs are dispatched first
  • Scheduling criteria (besides others such as urgency and resource reservation)
  • a) job priorities
  • a FIFO rule is used by default, with all pending jobs kept in a list ordered by submission time
  • if a suitable queue is available for the job at the head of the list, it is dispatched
  • independently of that, the scheduler then tries to dispatch the 2nd job, and so on
  • a priority defined by the admin can modify the FIFO order; the pending job list is then ordered by priority
  • b) equal share

75
Equal-share scheduling
  • If a series of jobs is submitted at almost the same time, they would all be put in the same group of queues and would wait long to execute
  • equal-share scheduling tries to avoid this
  • when a user already has a job executing, that user's newly pending jobs of the same priority are placed at the end of the list (see the sketch after this list)
  • -------
  • Jobs can be directly submitted to
  • an SGE cluster
  • or via Globus
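A minimal Python sketch of this reordering; the job record fields and the set of users with running jobs are illustrative assumptions.

def equal_share_order(pending, running_users):
    # Within the same priority, jobs of users who already have a running job
    # are moved to the back, so other users' jobs get dispatched first.
    # Higher "priority" values are assumed to mean more urgent jobs.
    return sorted(pending,
                  key=lambda j: (-j["priority"], j["user"] in running_users))

pending = [{"user": "alice", "priority": 0},
           {"user": "alice", "priority": 0},
           {"user": "bob",   "priority": 0}]
print(equal_share_order(pending, running_users={"alice"}))
# bob's job now precedes alice's new jobs of the same priority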

76
(No Transcript)
77
Conclusions
  • Condor and SGE are resource management and scheduling systems for a single administrative domain
  • but they can be interconnected across administrative boundaries using Globus
  • Common aspects
  • master-worker based
  • one master host (central manager) per system
  • arbitrary number of worker machines used for
    job submission, job execution or both
  • centralized scheduling
  • priority-based job scheduling
  • support batch jobs
  • a diversity of platforms
  • support authentication and authorization

78
  • Availability: both can be freely downloaded
  • Windows support: Condor has partial support; SGE runs only on Unix
  • GUI support: Condor is command-line oriented but has some graphical tools (CondorView: graphical history of the resources in the pool; CondorUserLogViewer: graphical history of a set of submitted jobs); SGE has a GUI
  • Jobs supported: both support batch and parallel jobs using MPI and PVM; Condor does not support interactive jobs, SGE does
  • Resource reservation, job checkpointing and fault recovery: both
  • Job flocking: Condor allows a job to migrate to another cluster
  • Job scheduling: both use preemptive scheduling; SGE also supports deadline-constrained scheduling
  • Resource matching: both
  • Job flow management: both support inter-job dependency descriptions for complex applications

79
Grid scheduling with QoS
  • Condor and SGE lack support for QoS in
    scheduling.
  • Aspects to be considered
  • job characteristics
  • market-based scheduling models
  • planning in scheduling
  • rescheduling
  • scheduling optimization
  • performance prediction
  • E.g., AppLeS: an adaptive application-level scheduling system
  • it measures the performance of the application on a specific site's resources and uses this to make resource selection and scheduling decisions; aimed at master-slave applications.

80
AppLeS
81
AppLeS
  • Components
  • Network Weather Service: dynamically gathers information on the system state and forecasts resource loads
  • User specifications: information about user criteria for performance, execution constraints, and others
  • Models: a repository of default models, derived from similar classes of applications, that can be used for performance estimation, planning, and resource selection
  • Resource selector: chooses and filters different resource combinations
  • Planner: generates a description of a resource-dependent schedule from a given resource combination
  • Performance estimator: generates an estimate for candidate schedules according to the user's performance metric
  • Coordinator: chooses the best schedule
  • Actuator: implements the chosen schedule on the target system

82
Steps in using AppLeS
  • 1. User specifies a Heterogeneous Application Template with information on the structure, characteristics, and constraints of the application
  • 2. Coordinator uses this to filter out impossible
    schedules
  • 3. Resource selector identifies possible sets of
    resources and prioritizes them based on a logical
    distance between resources
  • 4. Planner defines a potential schedule for each
    viable resource configuration
  • 5. Performance estimator evaluates each such schedule in terms of the user's performance metric
  • 6. Coordinator chooses the best schedule and passes it to the actuator.

83
GrADS
84
(No Transcript)