Scheduling Algorithms or the Process of Choosing the Best Job to Run - PowerPoint PPT Presentation

1 / 36
About This Presentation
Title:

Scheduling Algorithms or the Process of Choosing the Best Job to Run

Description:

Condor implements priority scheduling. Priority preemption can also be used ... openpbs.org/ http://www.cs.wisc.edu/condor. http://www.supercluster.org/maui ... – PowerPoint PPT presentation

Number of Views:197
Avg rating:3.0/5.0
Slides: 37
Provided by: edwardchr
Category:

less

Transcript and Presenter's Notes

Title: Scheduling Algorithms or the Process of Choosing the Best Job to Run


1
Scheduling Algorithms(or the Process of Choosing
the Best Job to Run)
2
Scheduling Algorithms
  • Introduction
  • By default
  • Workload managers, their schedulers, and
    algorithms
  • Future directions

3
Introduction
  • Efficient scheduling across nodes is necessary to
    maximize application performance regardless of
    the efficiency of your parallel algorithms.
  • Dynamic scheduling in a heterogeneous environment
    is significantly more complicated.
  • Programming parallel applications is difficult,
    and not worth the effort unless large performance
    gains can be realized.

4
Scheduling
  • Scheduling is a key part of the workload
    management software which usually perform some
    or all of
  • Queuing
  • Scheduling
  • Monitoring
  • Resource management
  • Accounting

5
Cont
  • The difficult part of scheduling is to balance
    policy enforcement with resource optimization in
    order to pick the best job to run.
  • Essentially one can think of the scheduler
    performing the following loop
  • Select the best job to run, according to policy
    and available resources.
  • Start the job.
  • Stop the job and/or clean up after completion.
  • repeat.

6
Cont
  • Scheduling can be described as either
  • Application level
  • APST, AMWAT, MARS, (AppLeS)
  • Prophet
  • Resource level
  • UNIX, Condor, Maui, PBS

7
By DefaultUnix Scheduling Algorithms
  • UNIX is fundamentally a time-share (TS) operating
    system 1978
  • TS scheduler provides illusion to each user that
    they are the only one accessing system resources
  • Employs a round-robin scheduling algorithm

8
Round-robin
  • Each process is placed in a run-queue
  • Allocated a service quantum of time (commonly set
    to 10 milliseconds)
  • Processes that demand less time run without being
    interrupted
  • Processes exceeding the service quantum are
    interrupted and returned to the back of the
    run-queue to await further processing

9
Round-robin Cont
  • Round-robin favours short process demands
  • Consequently it is biased in favour of a greedy
    user who runs many short-demand processes
  • The UNIX scheduler schedules processes not
    users
  • What was needed was a fair-share (FS) scheduler

10
Fair-share
  • Implemented on UNIX systems circa 1988
  • In context of computer resources, fair is meant
    to imply equity not equality in resource
    consumption

11
Fair-share Cont
  • The FS scheduler has two levels of scheduling
    process and user
  • Process level scheduling same as in standard UNIX
    (priority and nice values act as bias to
    scheduler as it repositions processes in the
    run-queue)
  • User level scheduler relationship can be seen in
    the following (simplified) pseudo-code

12
Fair-share AlgorithmUser Level Scheduling
  • /Every 400 milliseconds, accumulate the CPU
    usage per user/
  • Parameters
  • delta 4 Sec
  • decayUsage
  • For (I0 IltUSERS I)
  • usageI decayUsage
  • usageI costI
  • costI 0

13
Fair-share AlgorithmProcess Level Scheduling
  • Parameters
  • priorityDecay Real number in the range 0..1
  • Every 100 milliseconds, decay ALL process
    priority values
  • for (k0 kltPROCS k)
  • sharepriorityk priorityDecay
  • priorityDecay a p_nicek b
  • /Every 10 milliseconds (or tick) alter the
    active process priorities per user/
  • for (i0 iltUSERS i)
  • sharepriorityi usagei p_activei

14
Fair-share Cont
  • Whereas process-level scheduling still occurs 100
    times a second, user-level scheduling adjustments
    (usage parameter) occur once every 4 seconds
  • Also, once a second, process-level priority
    adjustments that were made in the previous second
    begin to be forgotten. This is to avoid
    starving a process

15
Commercial Products
16
Cluster Managers, Their Schedulers and Algorithms
  • Condor
  • Full featured batch system
  • Used to manage clusters of beowulf nodes
  • Portable Batch Scheduler (PBS)
  • Based on Network Queuing System (NQS) 1986 NASA
    first on Cray and then ported to other
    architectures
  • IEEE POSIX standard (1003.2d)

17
Cluster Managers, Their Schedulers and Algorithms
  • Maui
  • Developed to manage machines at the Maui HPCC
  • External scheduler uses native schedulers such
    as PBS and LoadLeveler
  • Included in cluster building toolkits such as
    Rocks and OSCAR
  • GAs
  • Genetic algorithms

18
Condor
  • DAGman (Directed Acyclic Graph Manager)
  • Meta-scheduler for condor jobs
  • Submits jobs in an order represented by a DAG and
    processes the results

19
Condor Cont
  • An example input file
  • Job A A.condor
  • Job B B.condor
  • Job C C.condor
  • Job D D.condor
  • Parent A child B C
  • Parent B C child D
  • Nodes B and C are run in parallel

A
B
C
D
20
Condor Cont
  • Jobs queues are not strictly FIFO
  • Condor implements priority scheduling
  • Priority preemption can also be used
  • Priority calculations are based on ratios instead
    of absolutes
  • Starvation is prevented by a FS algorithm which
    attempts to give all users the same amount of
    machine allocation time over a specified interval
    (configurable)

21
PBS
  • Includes several built-in schedulers
  • Default is FIFO but not strict (based on queue
    priority)
  • Behaviour is to maximize CPU utilization
  • Starving-jobs mechanism helps

22
PBS Scheduling Policies
  • PBS has seven scheduling algorithms (
    represents default)
  • Decaying FS algorithm
  • Round-robin for jobs and queues
  • Queue priority
  • Job type or job ordering
  • Strict FIFO
  • Load balancing jobs between hosts
  • Dedicated times/nodes

23
Maui
  • Scheduling behaviour is constrained by way of
    throttling policies
  • Both soft and hard limits used
  • Applied to each iteration
  • Three main algorithms used
  • Backfill
  • Priority
  • FS

24
Maui Backfill Algorithm
  • Backfill is scheduling optimization
  • Based on earliest-job-start information
  • Two passes
  • Pass 1 jobs that meet soft policies
  • Pass 2 expands list from pass 1 to include hard
    fairness policies
  • Maui applies the backfill algorithm specified by
    the BACKFILLPOLICY parameter, be it firstfit,
    bestfit, or balfit

25
Backfill Algorithm
  • Assuming bestfit the following steps apply
  • 1. feasible backfill jobs are filtered selecting
    those that actually fit the current backfill
    window
  • 2. base degree-of-fit on scheduling criteria
    (I.e. processors, seconds etc.)
  • 3. job with best fit is started and the backfill
    window size adjusted
  • 4. while backfill jobs and idle resources remain
    repeat step 1

26
Maui Priority Algorithm
  • Default is trivial FIFO but is weighted and
    combined based on service, requested resources,
    fair-share, direct priority, target, and bypass
  • Priority serviceweight servicefactor
  • resourceweight resourcefactor
  • fairshareweight fairsharefactor
  • directspecweight directspecfactor
  • targetweight targetfactor
  • bypassweight bypassfactor
  • Each weight value is a configurable parameter
    and each factor is calculated from
    subcomponents listed above (I.e. user, group,
    priority, QoS etc.)

27
Maui Fare-share Algorithm
  • Composed of several parts which handle historical
    information, fairshare windows, usage, and impact
  • All are site configurable parameters
  • Purpose of a fairshare algorithm is to steer
    existing workload

28
Genetic Algorithms (GA)
  • A typical GA structure
  • Genetic Algorithm ()
  • Initialize population/an initial population of
    strings randomly generated/
  • Evaluate population/check fitness of string
    against fitness function/
  • While termination criterion not reached
  • /genetic operators/
  • Select solution for next population
  • Perform crossover and mutation
  • Evaluate population

29
GAs Cont
  • GAs mimic the evolutionary process.
  • Introduced by J.H Holland 1975.
  • In nature, organisms are always competing for
    scant resources such as food and space.

30
GAs Cont
  • A GA can be analyzed by breaking it up into the
    following sequence
  • Encoding mechanism convert variables into a
    form that can be processed by the GA.
  • Fitness function - each generated string has a
    fitness value (or weight) assigned which
    determines the survival rate of the string.

31
GAs Cont
  • Operators.
  • Reproduction process by which individual
    strings are copied according to their fitness
    values. Higher fitness values are more likely to
    be selected.
  • Crossover all strings in the population are
    randomly paired up and may or may not be
    subjected to a crossover depending on a randomly
    generated outcome.
  • Mutation an occasional alteration to the
    strings in the population and is used to
    safeguard against premature convergence.

32
Future Directions
  • High Performance Grid Schedulers
  • Scheduling holds the key to performance in grid
    environments however, high-performance
    application scheduling on computation grids
    represent a world in which much progress needs to
    be made. Part of the problem lies in the
    inherent difficulty of the scheduling problem,
    whose optimal solution is considered infeasible
    even in the simplest environments.

33
Current High Performance Grid Scheduler Projects
  • AppLeS Application Level Scheduler
  • Best of candidate schedules based on users
    performance criteria
  • MARS Meta-computer Adaptive Runtime Scheduler
  • Determines candidate schedule that minimizes
    execution time
  • Prophet -
  • Determines schedule with the minimal predicted
    execution time
  • VDCE Virtual Distributed Computing Environment
  • List scheduling used to match resources with
    application tasks
  • SEA Scheduling Expert Advisor
  • Ready tasks enabled in program graph are next
    to be scheduled

34
Projects continued
  • I-SOFT
  • Centralized scheduler maintains user queues and
    static capacities (FIFO)
  • IOS
  • Offline genetic algorithm (GA) mappings indexed
    by dynamic parameters used to determine mapping
    for current iteration
  • SPP(X)
  • Determination of performance model for candidate
    schedules with minimal execution time
  • DOME
  • Globally controlled or locally controlled load
    balancing

35
Conclusion
  • Scheduling in a heterogeneous environment is
    hard.
  • Keywords to remember
  • Adaptive
  • Dynamic

36
Links
  • http//www.openpbs.org/
  • http//www.cs.wisc.edu/condor
  • http//www.supercluster.org/maui
Write a Comment
User Comments (0)
About PowerShow.com