Transcript and Presenter's Notes

Title: Workflow Scheduling Optimisation: The case for revisiting DAG scheduling


1
Workflow Scheduling Optimisation: The case for
revisiting DAG scheduling
  • Rizos Sakellariou and Henan Zhao
  • University of Manchester

2
Scheduling
Slide Courtesy: Ewa Deelman, deelman@isi.edu,
www.isi.edu/deelman, pegasus.isi.edu
3
Execution Environment
Slide Courtesy: Ewa Deelman, deelman@isi.edu,
www.isi.edu/deelman, pegasus.isi.edu
4
In this talk, optimisation relates to
performance. What affects performance?
  • Aim to minimise the execution time of the
    workflow
  • How?
  • Exploit task parallelism
  • But, even if there is enough parallelism, can the
    environment guarantee that this parallelism can
    be exploited to improve performance?
  • No!
  • Why?
  • There is interference from the batch job
    schedulers that are traditionally used to submit
    jobs to HPC resources!

5
Example
  • The uncertainty of batch schedulers means that
    any workflow enactment engine must wait for
    components to complete before beginning to
    schedule dependent components.
  • Furthermore, it is not clear if parallelism will
    be fully exploited: e.g., if the three tasks
    above that can be executed in parallel are
    submitted to 3 different queues of different
    length, there is no guarantee that they will
    execute in parallel. Job queues rule!

This execution model fails to hide the latencies
resulting from the length of job queues; these
determine the execution time of the workflows.
6
Then try to get rid of the evil job queues!
  • Advance reservation of resources has been
    proposed to make jobs run at a precise time.
  • However, resources would be wasted if they are
    reserved for the whole execution of the workflow.
  • Can we automatically make advance reservations
    for individual tasks?

7
Assuming that there is no job queue
  • what affects performance?
  • The structure of the workflow
  • number of parallel tasks
  • how long these tasks take to execute
  • The number of the resources
  • typically, much smaller than the parallelism
    available.
  • In addition
  • there are communication costs
  • there is heterogeneity
  • estimating computation/communication is not
    trivial.

8
What does all this imply for mapping?
  • An order by which tasks will be executed needs to
    be established (e.g., red, yellow, or blue first?)
  • Resources need to be chosen for each task (some
    resources are fast, some are not so fast!)
  • The cost of moving data between resources should
    not outweigh the benefits of parallelism.

9
Does the order matter?
  • If task 6 on the right takes comparatively longer
    to run, we'd like to execute task 2 just after
    task 0 finishes and before tasks 1, 3, 4, 5.

Follow the critical path! Is this new? Not
really...
10
Modelling the problem
  • A workflow is a Directed Acyclic Graph (DAG)
  • Scheduling DAGs onto resources is well studied in
    the context of homogeneous systems; less so in
    the context of heterogeneous systems (mostly
    without taking into account any uncertainty).
  • Needless to say, this is an NP-complete
    problem.
  • Are workflows really a general type of DAG or a
    subclass? We don't really know (some are clearly
    not DAGs; only DAGs are considered here)

11
Our approach
  • Revisit the DAG scheduling problem for
    heterogeneous systems
  • Start with simple static scenarios
  • Even this problem is not well understood, despite
    the fact that there have been perhaps more than
    30 heuristics published (check the Heterogeneous
    Computing Workshop proceedings for a start)
  • Try to build on this as we obtain a good
    understanding of each step!

12
Outline
  • Static DAG scheduling onto heterogeneous systems
    (i.e., we know computation and communication costs
    a priori)
  • Introduce uncertainty in computation times.
  • Handle multiple DAGs at the same time.
  • Use the knowledge accumulated above to reserve
    slots for tasks onto resources.

13
Based on
  • [1] Rizos Sakellariou, Henan Zhao. A Hybrid
    Heuristic for DAG Scheduling on Heterogeneous
    Systems. Proceedings of the 13th IEEE
    Heterogeneous Computing Workshop (HCW'04) (in
    conjunction with IPDPS 2004), Santa Fe, April
    2004, IEEE Computer Society Press, 2004.
  • [2] Rizos Sakellariou, Henan Zhao. A low-cost
    rescheduling policy for efficient mapping of
    workflows on grid systems. Scientific
    Programming, 12(4), December 2004, pp. 253-262.
  • [3] Henan Zhao, Rizos Sakellariou. Scheduling
    Multiple DAGs onto Heterogeneous Systems.
    Proceedings of the 15th Heterogeneous Computing
    Workshop (HCW'06) (in conjunction with IPDPS
    2006), Rhodes, April 2006, IEEE Computer Society
    Press.
  • [4] Henan Zhao, Rizos Sakellariou. Advance
    Reservation Policies for Workflows. Proceedings
    of the 12th Workshop on Job Scheduling Strategies
    for Parallel Processing, 2006.

14
How to schedule? Our model: a DAG, 10 tasks, 3
machines (assume we know execution times and
communication costs)
15
A simple idea
  • Assign nodes to the fastest machine!

Communication between nodes 4 and 8 takes way
too long!!!
Heuristics that take into account the whole
structure of the DAG are needed.
Makespan is > 1000!
16
H. Zhao, R. Sakellariou. An experimental study of
the rank function of HEFT. Proceedings of
Euro-Par 2003.
17
Hmm
  • This was a rather well defined problem
  • This was just a small change in the algorithm
  • What about different heuristics?
  • What about more generic problems?

18
DAG scheduling: A Hybrid Heuristic
  • Trying to find out why there were such
    differences in the outcome of HEFT, we observed
    problems with the ordering; to address those
    problems we came up with a Hybrid Heuristic. It
    worked quite well! (A sketch follows the
    reference below.)
  • Phases:
  • Rank (list scheduling)
  • Create groups of independent tasks
  • Schedule independent tasks
  • Can be carried out using any scheduling algorithm
    for independent tasks, e.g. MinMin, MaxMin, ...
  • A novel heuristic (Balanced Minimum Completion
    Time)

R. Sakellariou, H. Zhao. A Hybrid Heuristic for DAG
Scheduling on Heterogeneous Systems. Proceedings
of the IEEE Heterogeneous Computing Workshop
(HCW'04), 2004.
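As a rough illustration of phases 1 and 2, here is a minimal sketch in
Python; the DAG encoding, cost tables, and function names are
illustrative assumptions (and the dependence check is simplified to
direct edges), not the paper's implementation.

# Phase 1: mean-based upward ranking; Phase 2: grouping.
def upward_rank(task, succ, mean_exec, mean_comm, memo):
    # rank_u(i) = mean_exec(i) + max over successors j of
    # (mean_comm(i, j) + rank_u(j)); exit tasks score mean_exec only.
    if task not in memo:
        memo[task] = mean_exec[task] + max(
            (mean_comm[(task, j)]
             + upward_rank(j, succ, mean_exec, mean_comm, memo)
             for j in succ.get(task, [])), default=0.0)
    return memo[task]

def group_independent(ranked_order, succ):
    # Walk tasks in decreasing rank; close the current group whenever
    # the next task directly depends on a task already in the group.
    groups, current = [], []
    for t in ranked_order:
        if any(t in succ.get(p, []) for p in current):
            groups.append(current)
            current = []
        current.append(t)
    if current:
        groups.append(current)
    return groups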
19
An Example
(figure: example DAG with task execution times and
communication costs)
20
  • Mean Upward Ranking Scheme
  • The order is:
    0, 1, 4, 5, 7, 2, 3, 6, 8, 9
21
An Example
  • Phase 1: Rank the nodes
  • Phase 2: Create groups of independent tasks
  • The order is: 0, 1, 4, 5, 7, 2, 3, 6, 8, 9
22
Balanced Minimum Completion Time Algorithm (BMCT)
  • Step I:
  • Assign each task to the machine that gives the
    fastest execution time.
  • Step II:
  • Find the machine M with the maximal finish time.
    Move a task from M to another machine, if it
    minimizes the overall makespan.
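A minimal sketch of the two BMCT steps for one group of independent
tasks, in Python; the machine model is an assumption for illustration
and communication costs are ignored for brevity.

# exec_time[t][m]: execution time of task t on machine m (assumed);
# ready[m]: time machine m becomes free before this group starts.
def bmct_group(tasks, machines, exec_time, ready):
    # Step I: put each task on its fastest machine.
    assign = {t: min(machines, key=lambda m: exec_time[t][m])
              for t in tasks}

    def finish_times(a):
        ft = dict(ready)
        for t in tasks:
            ft[a[t]] += exec_time[t][a[t]]
        return ft

    # Step II: repeatedly try to move a task off the machine with the
    # maximal finish time; keep a move only if it shortens the makespan.
    while True:
        ft = finish_times(assign)
        busiest = max(machines, key=lambda m: ft[m])
        best_move, best_makespan = None, max(ft.values())
        for t in (t for t in tasks if assign[t] == busiest):
            for m in machines:
                if m == busiest:
                    continue
                trial = dict(assign)
                trial[t] = m
                makespan = max(finish_times(trial).values())
                if makespan < best_makespan:
                    best_move, best_makespan = (t, m), makespan
        if best_move is None:
            return assign
        assign[best_move[0]] = best_move[1]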

23
  • Initially assign each task in the group to the
    machine giving the fastest time
  • No movement for the entry task

24
An Example (2)
  • Phase 3: Schedule Independent Tasks in Group 1
  • (Gantt chart: machines M0, M1, M2; time axis 0 to 140)
  • Initially assign each task in the group to the
    machine giving the fastest time

25
  • Initially assign each task in the group to the
    machine giving the fastest time
  • M2 is the machine with the Maximal Finish Time
    (70)

26
  • Task 5 moves to M0 since it can achieve an
    earlier overall finish time
  • Now M0 is the machine with the Maximal Finish
    Time (69)

27
  • Task 1 moves to M2 since it can achieve an
    earlier overall finish time
  • Now M2 is the machine with the Maximal Finish
    Time (59)
  • No task can be moved from M2, so the movement stops.
  • Schedule next group

28
  • Initially assign each task in this group to the
    machine giving the fastest time

29
  • Task 2 moves to M1 since it can achieve an
    earlier overall finish time
  • M2 is the machine with the Maximal Finish Time
  • No movement from M2
  • Schedule next group

30
  • Initially assign each task in this group to the
    machine giving the fastest time

31
  • Task 6 moves to M0 since it can achieve an
    earlier overall finish time
  • M2 is the machine with the Maximal Finish Time

32
  • Task 8 moves to M1 since it can achieve an
    earlier overall finish time
  • M1 is the machine with the Maximal Finish Time
  • No movement from M1
  • Schedule next group

33
  • Initially assign each task in this group to the
    machine giving the fastest time
  • No movement for the exit task

34
(No Transcript)
35
Experiments
  • DAG Scheduling
  • Algorithms
  • Hybrid.BMCT (i.e., the algorithm as presented),
    and
  • Hybrid.MinMin (i.e., MinMin instead of BMCT)
  • Applications
  • Randomly-generated graphs
  • Laplace
  • FFT
  • Fork-join graphs
  • Heterogeneity setting (following an approach by
    Siegel et al.)
  • Consistent
  • Partially-consistent
  • Inconsistent

36
Hybrid Heuristic Comparison
  • NSL (Normalised Schedule Length)
  • Random DAGs, 25-100 tasks with inconsistent
    heterogeneity
  • Average improvement: 25%

37
Hmm
  • Yes, but, so far, you have used static task
    execution times; in practice such times are
    difficult to specify exactly
  • There is an answer for run-time deviations:
    adjust at run-time
  • But...
  • don't we need to understand the static case
    first?

38
Characterise the Schedule
  • Spare time indicates the maximum time that a
    node, i, may delay without affecting the start
    time of an immediate successor, j.
  • A node i with an immediate successor j on the
    DAG: spare(i,j) = Start_Time(j) -
    Data_Arrival_Time(i,j)
  • A node i with an immediate successor j on the
    same machine: spare(i,j) = Start_Time(j) -
    Finish_Time(i)
  • The minimum of the above is the MinSpare for a
    task.

R. Sakellariou, H. Zhao. A low-cost rescheduling
policy for efficient mapping of workflows on
grid systems. Scientific Programming, 12(4),
December 2004, pp. 253-262.
39
Example
  • DAT(4,7) = 40.5, ST(7) = 45.5, hence spare(4,7) = 5
  • FT(3) = 28, ST(5) = 29.5, hence spare(3,5) = 1.5
  • DAT = Data_Arrival_Time, ST = Start_Time, FT =
    Finish_Time

40
Characterise the schedule (cont.)
  • Slack indicates the maximum time that a node, i,
    may delay without affecting the overall makespan.
  • Slack(i) = min(slack(j) + spare(i,j)), for all
    successor nodes j (both on the DAG and the
    machine)
  • The idea: keep track of the values of the slack
    and/or the spare time and reschedule only when
    the delay exceeds the slack. (A sketch follows
    the reference below.)

R. Sakellariou, H. Zhao. A low-cost rescheduling
policy for efficient mapping of workflows on
grid systems. Scientific Programming, 12(4),
December 2004, pp. 253-262.
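The definitions above translate directly into code. A minimal sketch,
assuming a schedule represented by start/finish times, data arrival
times, and successor lists (all the names here are illustrative):

# st[i], ft[i]: start/finish time of task i; dat[(i, j)]: time i's
# output arrives at j; dag_succ[i] / machine_succ[i]: successors of i
# via DAG edges / on the same machine (assumed inputs).
def spare(i, j, st, ft, dat, same_machine):
    if same_machine:
        return st[j] - ft[i]      # spare(i,j) = ST(j) - FT(i)
    return st[j] - dat[(i, j)]    # spare(i,j) = ST(j) - DAT(i,j)

def slack(i, st, ft, dat, dag_succ, machine_succ, memo):
    # slack(i) = min over all successors j of (slack(j) + spare(i,j));
    # a node with no successors (the exit) gets slack 0.
    if i not in memo:
        cands = [slack(j, st, ft, dat, dag_succ, machine_succ, memo)
                 + spare(i, j, st, ft, dat, same_machine=False)
                 for j in dag_succ.get(i, [])]
        cands += [slack(j, st, ft, dat, dag_succ, machine_succ, memo)
                  + spare(i, j, st, ft, dat, same_machine=True)
                  for j in machine_succ.get(i, [])]
        memo[i] = min(cands) if cands else 0.0
    return memo[i]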
41
Lessons Learned (simulation and deviations of up
to 100%)
  • Heuristics that perform better statically
    perform better under uncertainties.
  • By using the metrics on spare time, one can
    provide guarantees for the maximum deviation from
    the static estimate. Then, we can minimise the
    number of times we reschedule while still
    achieving good results.
  • Could lead to orders of magnitude improvement
    with respect to workflow execution using DAGMan
    (would depend on the workflow; only partly true
    with Montage)

42
Challenges still unanswered
  • What are the representative DAGs (workflows) in
    the context of Grid computing?
  • Extensive evaluation / analysis (theoretical too)
    is needed. Not clear what is the best makespan we
    can get (because it is not easy to find the
    critical path)
  • What are the uncertainties involved? How good are
    the estimates that we can obtain for the
    execution time / communication cost? Performance
    prediction is hard
  • How heterogeneous are our Grid resources, really?

43
Moving on to multiple DAGs
  • It is rather idealistic to assume that we have
    exclusive usage of resources
  • In practice, we may have multiple DAGs competing
    for resources at the same time

Henan Zhao, Rizos Sakellariou. Scheduling
Multiple DAGs onto Heterogeneous Systems.
Proceedings of the 15th Heterogeneous Computing
Workshop (HCW'06) (in conjunction with IPDPS
2006), Rhodes, April 2006, IEEE Computer Society
Press.
44
Scheduling Multiple DAGs: Approaches
  • Approach 1: Schedule one DAG after the other with
    existing DAG scheduling algorithms
  • Low resource utilization; long overall makespan
  • Approach 2: Still one after the other, but do
    some backfilling to fill the gaps
  • Which DAG to schedule first? The one with the
    longest makespan or the one with the shortest
    makespan?
  • Approach 3: Merge all DAGs into a single,
    composite DAG. Much better than Approach 1 or 2.

45
Example: Two DAGs to be scheduled together
  • (figure: DAG A and DAG B side by side)

46
Composition Techniques
  • C1: Common Entry and Common Exit Node

47
Composition Techniques
  • C2: Level-Based Ordering

48
Composition Techniques
  • C3: Alternate between DAGs (round robin between
    DAGs)
  • Easy!

49
Composition Techniques
  • C4: Ranking-Based Composition (compute a weight
    for each node and merge accordingly)

(figure: composite DAG; the merged order begins A0, B0)
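As a toy illustration of the simplest composition (C3), the following
Python sketch interleaves the task lists of several DAGs round robin
into one composite ordering; the list representation is an assumption.

from itertools import chain, zip_longest

def compose_round_robin(dag_task_lists):
    # C3: alternate between DAGs, taking one task from each in turn.
    # Each input list is assumed to already be in a valid (e.g. ranked)
    # order for its own DAG.
    interleaved = chain.from_iterable(zip_longest(*dag_task_lists))
    return [t for t in interleaved if t is not None]

# compose_round_robin([["A0", "A1", "A2"], ["B0", "B1"]])
# -> ["A0", "B0", "A1", "B1", "A2"]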
50
  • But, is makespan optimisation a good objective
    when scheduling multiple DAGs?

51
Mission: Fairness
  • In multiple DAGs:
  • User's perspective: I want my DAG to complete
    execution as soon as possible.
  • System's perspective: I would like to keep as many
    users as possible happy; I would like to increase
    resource utilisation.
  • Let's be fair to users!
  • (The system may want to take into account
    different levels of quality of service agreed
    with each user)

52
Slowdown
  • Slowdown: the delay that a DAG would
    experience as a result of sharing the resources
    with other DAGs (as opposed to having the
    resources on its own).
  • The average slowdown is taken over all DAGs.

53
Unfairness
  • Unfairness indicates, for all DAGs, how different
    the slowdown of each DAG is from the average
    slowdown (over all DAGs). (A sketch of both
    metrics follows below.)
  • The higher the difference, the higher the
    unfairness!
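In code, the two metrics might look like the following sketch; the
slowdown definition follows the slide, and the exact formulas in the
paper may differ.

def slowdown(shared_finish, exclusive_finish):
    # Delay factor a DAG experiences from sharing the resources,
    # relative to having the resources on its own.
    return shared_finish / exclusive_finish

def unfairness(slowdowns):
    # Sum, over all DAGs, of how far each DAG's slowdown is from the
    # average slowdown.
    avg = sum(slowdowns) / len(slowdowns)
    return sum(abs(s - avg) for s in slowdowns)

# unfairness([slowdown(120, 100), slowdown(150, 100)]) -> 0.3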

54
Scheduling for Fairness
  • Key idea: at each step (that is, every time a
    task is to be scheduled), select the most
    affected DAG (that is, the DAG with the highest
    slowdown value) to schedule.
  • What is the most affected DAG at any given point
    in time?

55
Fairness Scheduling Policies
  • F1: Based on Latest Finish Time
  • calculates the slowdown value only at the time
    the last task that was scheduled for this DAG
    finishes.
  • F2: Based on Current Time
  • re-calculates the slowdown value for every DAG
    after any task finishes. For tasks still running,
    a proportion of their time is taken into account
    when the calculation is carried out.

56
Lessons Learned / Open questions
  • It is possible to achieve reasonably good
    fairness without affecting makespan.
  • An algorithm with good behaviour in the static
    case appears to make things easier in terms of
    achieving fairness.
  • What is fairness?
  • What is the behaviour when run-time changes occur?
  • What about different notions of Quality of
    Service (SLAs, etc.)?

57
Finally
  • How to automate advance reservations at the task
    level for a workflow, when the user has specified
    a deadline constraint only for the whole workflow?

Henan Zhao, Rizos Sakellariou. Advance
Reservation Policies for Workflows. Proceedings
of the 12th Workshop on Job Scheduling
Strategies for Parallel Processing, 2006.
58
The schedule on the left can be used to plan
reservations. However, if one task fails to
finish within its slot, the remaining tasks have
to be re-negotiated.
59
What we are looking for is...
60
The Idea
  • 1. Obtain an initial assignment using any DAG
    scheduling algorithm (HEFT, HBMCT, ...).
  • 2. Repeat:
  • I. Compute the Application Spare Time (= user-
    specified deadline - DAG finish time).
  • II. Distribute the Application Spare Time among
    the tasks.
  • 3. Until the Application Spare Time is below a
    threshold. (A sketch of this loop follows below.)
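A minimal sketch of this loop in Python; the scheduling and
distribution callables and the plan interface are placeholders (the
two concrete distribution strategies follow on the next slides).

def plan_reservations(dag, resources, deadline, schedule, distribute,
                      threshold=1e-3):
    # 1. initial assignment with any DAG scheduling algorithm
    #    (e.g. HEFT or HBMCT), yielding a plan with a finish time.
    plan = schedule(dag, resources)
    # 2./3. repeatedly spread the remaining application spare time
    # over the tasks until it falls below the threshold.
    while True:
        app_spare = deadline - plan.finish_time
        if app_spare < threshold:
            return plan
        plan = distribute(plan, app_spare)   # stretch task slots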

61
Spare Time
  • The Spare Time indicates the maximum time that a
    node may delay, without affecting the start time
    of any of its immediate successor nodes.
  • A node i with an immediate successor j on the
    DAG: spare(i,j) = Start_Time(j) -
    Data_Arrival_Time(i,j)
  • A node i with an immediate successor j on the
    same machine: spare(i,j) = Start_Time(j) -
    Finish_Time(i)
  • The minimum of the above for all immediate
    successors is the Spare Time of a task.
  • Distributing the Application Spare Time needs to
    take care of the inherently present spare time!

62
Two main strategies
  • Recursive spare time allocation
  • The Application Spare Time is divided among all
    the tasks.
  • This is a repetitive process until the
    Application Spare Time is below a threshold.
  • Critical Path based allocation
  • Divide the Application Spare Time among the tasks
    in the critical path.
  • Balance the Spare Time of all the other tasks.
  • (a total of 6 variants have been studied)

63
An Example
64
Critical Path based allocation
65
Finally
66
Findings
  • Advance reservation of resources for workflows
    can be automatically converted to reservations at
    the task level, thus improving resource
    utilization.
  • If the deadline set for the DAG is such that
    there is enough spare time, then we can reserve
    resources for each individual task so that
    deviations of the same order, for each task, can
    be afforded without any problems.
  • Advance reservation is known to harm resource
    utilization. But this study indicated that if the
    user is prepared to pay for full usage when only
    60% of the slot is used, there is no loss for the
    machine owner.

67
...which leads to pricing!
  • R. Sakellariou, H. Zhao, E. Tsiakkouri,
    M. Dikaiakos. Scheduling workflows under budget
    constraints. To appear as a chapter in a book
    with selected papers from the 1st CoreGrid
    Integration Workshop.
  • The idea:
  • Given a specific budget, what is the best
    schedule you can obtain for your workflow?
  • Multicriteria optimisation is hard!
  • Our approach:
  • Start from a good solution for one objective, and
    try to meet the other! (See the sketch below.)
  • It works! How well? Difficult to tell!
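One way to read "start from a good solution for one objective and try
to meet the other" in code; a greedy Python sketch under assumed
per-machine price and execution-time tables, not the algorithm
evaluated in the chapter.

def meet_budget(tasks, assign, exec_time, price, budget):
    # Start from a makespan-optimised assignment; while the schedule
    # costs more than the budget, re-map the single task whose move to
    # another machine saves the most money.
    def cost(a):
        return sum(price[a[t]] * exec_time[t][a[t]] for t in tasks)
    while cost(assign) > budget:
        moves = [(price[assign[t]] * exec_time[t][assign[t]]
                  - price[m] * exec_time[t][m], t, m)
                 for t in tasks for m in price if m != assign[t]]
        saving, t, m = max(moves)
        if saving <= 0:
            return None   # the budget cannot be met by re-mapping
        assign[t] = m
    return assign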

68
To summarize
  • Understanding the basic static scenarios and
    having robust solutions for those scenarios helps
    the extension to more complex cases
  • Pretty much everything here is addressed by
    heuristics. Their evaluation requires extensive
    experimentation. Still:
  • No agreement about what DAGs (workflows) look
    like.
  • No agreement about how heterogeneous resources
    are.
  • The problems addressed here are perhaps more
    related to what is supposed to be core CS.
  • But we may be talking about lots of work for
    only incremental improvements (10-15%).

69
  • Who cares in Computer Science about performance
    improvements in the order of 10-15%???
  • (yet, if Gordon Brown were to increase our taxes
    by 10-15%, everyone would be so unhappy!)
  • Oh well...

70
Thank you!