Grid Scheduling - PowerPoint PPT Presentation

1 / 16
About This Presentation
Title:

Grid Scheduling

Description:

... Portable Batch Scheduler and the Maui Scheduler on Linux ... System algorithms e.g. a site scheduler or the broker. Validation ? Scheduling algorithms ... – PowerPoint PPT presentation

Number of Views:67
Avg rating:3.0/5.0
Slides: 17
Provided by: Germ91
Category:

less

Transcript and Presenter's Notes

Title: Grid Scheduling


1
Grid Scheduling
  • Cécile Germain-Renaud

2
Scheduling
  • Job
  • A computation to run on a machine
  • Possibly with network access e.g. input/output
    file (coarse grain) or communication with other
    jobs (the DAG model)
  • Schedule
  • s(J) date to begin execution of task J
  • Alloc(J) machine assigned to J
  • One of the oldest Computer Science problems
  • Principles of classification
  • Graham et al. Optimization and approximation in
    deterministic sequencing and scheduling A
    survey. Ann. Discrete Math. 5, (1979), 287-326
  • Computer-aided classification of complexity
    results (4536 at the time of the paper) Lageweg
    et al. Computer-Aided complexity classification
    of combinational problems. CACM 112, 1892

3
Classical scheduling in HPC
T
  • Context parallel computing/computers
  • Application Direct Acyclic Graph (T, E, w, c)
  • T set of sequential tasks
  • E dependence constraints
  • w(t) computational cost of task t
  • c(t,t) communication cost (data sent from t to
    t)
  • Infrastructure
  • P identical processors
  • With or without preemption, dedicated (no
    sharing)
  • An optimization problem with objective function
  • Makespan Total execution time S(T) max (s(t)
    w(t))
  • Complexity
  • NP-complete for independant tasks and no
    communication E vide, p 2 and c 0
  • NP-complete for UET-UCT graphs (w c 1)
  • Very old without communication, list scheduling
    provides a (2-1/p) approximation

T
4
Scheduling in Institutional Grids
  • Institutional federation of ressources
  • accounted-for fair-share on the medium to long
    time scale is a premium constraint
  • Partially autonomous local policies must be
    allowed
  • Grid
  • Permanent regime on-line decisions
  • Large scale strongly distributed
  • Information system
  • Scheduling services
  • Relevant contexts
  • Autonomous, multi-agents systems
  • Auction algorithms
  • Service Level Agreement (SLA) technology

5
EGEE gLite Scheduling
Site (node)
Proc
Broker
CE
Broker
Local scheduler
Broker
6
EGEE gLite Scheduling
Site (node)
Proc
BDII
Broker
Publish
CE
Broker
Local scheduler
Broker
7
EGEE gLite Scheduling
Site (node)
Proc
BDII
Publish
CE
Query
Rank
Broker
Local scheduler
The information published is Static eg which
type of VO is accepted Dynamic expected
traversal time
8
EGEE gLite Scheduling
Site (node)
Proc
BDII
Publish
CE
Query
Rank
Broker
Local scheduler
Rank may be any user-defined function, e.g.
avoid  bad  machines Default is first locality,
second expected traversal time
9
EGEE gLite Scheduling
Site (node)
Proc
BDII
Publish
CE
Update
Broker
Local scheduler
BDII broker cache
10
Not only academic
Overhead Ratio
  • Long waiting times
  • When EGEE was not so heavily loaded

Execution time (s)
11
Batch scheduling
  • Very complex policies
  • Maximise throughput under constraints
  • Weighted fair-share VOs, type of jobs
  • Priorities
  • Hardware requirements
  • Advance reservations
  • An indication of job duration is given by the
    type of queue infinite, long, medium, short, and
    exotic ones
  • B. Bode et al.The Portable Batch Scheduler and
    the Maui Scheduler on Linux Clusters

12
Classical vs Grid
  • (Relatively) easy
  • Throughput instead of makespan Master-slave
    graph instead of DAG allow for instance to define
    cyclic schedules in polynomial time which are
    asymptotically optimal, but not local
  • Y. Robert A. Rosenberg
  • Moderately difficult information about
  • Applications
  • Infrastructures
  • The same program on different data may run at
    very different speed
  • The network performance is dynamic
  • Really difficult
  • Queues managed by local policies
  • On-line decision

13
Information and Scheduling (I)
  • Considerable work has been done in predicting CPU
    load in shared environments desktops, clusters,
    desktop grids P.A. Dinda, R. Wolski, J. Schopf
  • The basic technique is linear time-series
    analysis
  • Self-similarity and epochal behavior
  • Usual goal is the prediction of the next value
  • Applied to soft real-time scheduling on shared
    clusters
  • Practical application in NWS

14
Information and scheduling (III)
  • Less work on predicting the behavior of dedicated
    systems
  • Papers are on parallel systems, mostly based on
    time-series techniques, but at least one based on
    a genetic algorithm Downey, Foster, Wolski
  • The traces are much more difficult to access
  • No time slice - Irregular time series the
    records are event-driven
  • Which analysis
  • Average waiting time clear but not very useful
    for prediction
  • Fitting a distribution not convincing for //
    systems
  • Predicting an upper bound with a confidence
    interval metric of success?

15
Information and grid
  • We cannot directly log the entire state of the
    system
  • Access rights
  • Size
  • Currently available data
  • The lifecycle of jobs going through certain
    brokers
  • The job ranking at the same brokers
  • The detailed behavior of the queues on certain
    sites
  • Certain LAL possibly other mainstream
  • Easy to get
  • Summary data about the lifecycle of all jobs
  • From which it could be possible to reconstruct
    the detailed state and dynamic of the CE

16
What should we learn ?
  • Learning besides time series make sense in a
    grid massive use of community programs instead
    of (?) sparse runs of a very long and complex
    digital experiment
  • Information as sketched before
  • Beware not be a steady-state system
  • New users, new machines, new software is the
    expected regime for some years from now
  • A community-based resource will tend display
    correlated activity
  • Is there an invariant social graph? Is it a
    feature?
  • System algorithms e.g. a site scheduler or the
    broker
  • Validation ?
  • Scheduling algorithms
  • Validation ?
Write a Comment
User Comments (0)
About PowerShow.com