Title: Workflow Scheduling Optimisation: The case for revisiting DAG scheduling
1 Workflow Scheduling Optimisation: The case for revisiting DAG scheduling
- Rizos Sakellariou and Henan Zhao
- University of Manchester
2 Scheduling
Slide courtesy Ewa Deelman, deelman_at_isi.edu, www.isi.edu/deelman, pegasus.isi.edu
3 Execution Environment
Slide courtesy Ewa Deelman, deelman_at_isi.edu, www.isi.edu/deelman, pegasus.isi.edu
4 In this talk, optimisation relates to performance. What affects performance?
- Aim to minimise the execution time of the workflow
- How?
- Exploit task parallelism
- But, even if there is enough parallelism, can the environment guarantee that this parallelism can be exploited to improve performance? No!
- Why?
- There is interference from the batch job schedulers that are traditionally used to submit jobs to HPC resources!
5 Example
- The uncertainty of batch schedulers means that any workflow enactment engine must wait for components to complete before beginning to schedule dependent components.
- Furthermore, it is not clear whether parallelism will be fully exploited: e.g., if the three tasks above that can be executed in parallel are submitted to 3 different queues of different length, there is no guarantee that they will execute in parallel; job queues rule!
This execution model fails to hide the latencies resulting from the length of job queues; these determine the execution time of the workflows.
6 Then try to get rid of the evil job queues!
- Advance reservation of resources has been proposed to make jobs run at a precise time.
- However, resources would be wasted if they are reserved for the whole execution of the workflow.
- Can we automatically make advance reservations for individual tasks?
7 Assuming that there is no job queue
- What affects performance?
- The structure of the workflow
- number of parallel tasks
- how long these tasks take to execute
- The number of resources
- typically, much smaller than the parallelism available.
- In addition
- there are communication costs
- there is heterogeneity
- estimating computation/communication costs is not trivial.
8 What does all this imply for mapping?
- An order by which tasks will be executed needs to be established (e.g., red, yellow, or blue first?)
- Resources need to be chosen for each task (some resources are fast, some are not so fast!)
- The cost of moving data between resources should not outweigh the benefits of parallelism.
9 Does the order matter?
- If task 6 on the right takes comparatively longer to run, we'd like to execute task 2 just after task 0 finishes and before tasks 1, 3, 4, 5.
Follow the critical path! Is this new? Not really...
10 Modelling the problem
- A workflow is a Directed Acyclic Graph (DAG).
- Scheduling DAGs onto resources is well studied in the context of homogeneous systems; less so in the context of heterogeneous systems (mostly without taking into account any uncertainty).
- Needless to say, this is an NP-complete problem.
- Are workflows really a general type of DAG or a subclass? We don't really know (some are clearly not DAGs; only DAGs are considered here).
11 Our approach
- Revisit the DAG scheduling problem for heterogeneous systems
- Start with simple static scenarios
- Even this problem is not well understood, despite the fact that there have been perhaps more than 30 heuristics published (check the Heterogeneous Computing Workshop proceedings for a start)
- Try to build on, as we obtain a good understanding of each step!
12 Outline
- Static DAG scheduling onto heterogeneous systems (i.e., we know computation and communication costs a priori)
- Introduce uncertainty in computation times.
- Handle multiple DAGs at the same time.
- Use the knowledge accumulated above to reserve slots for tasks onto resources.
13 Based on
- [1] Rizos Sakellariou, Henan Zhao. A Hybrid Heuristic for DAG Scheduling on Heterogeneous Systems. Proceedings of the 13th IEEE Heterogeneous Computing Workshop (HCW'04) (in conjunction with IPDPS 2004), Santa Fe, April 2004, IEEE Computer Society Press.
- [2] Rizos Sakellariou, Henan Zhao. A low-cost rescheduling policy for efficient mapping of workflows on grid systems. Scientific Programming, 12(4), December 2004, pp. 253-262.
- [3] Henan Zhao, Rizos Sakellariou. Scheduling Multiple DAGs onto Heterogeneous Systems. Proceedings of the 15th Heterogeneous Computing Workshop (HCW'06) (in conjunction with IPDPS 2006), Rhodes, April 2006, IEEE Computer Society Press.
- [4] Henan Zhao, Rizos Sakellariou. Advance Reservation Policies for Workflows. Proceedings of the 12th Workshop on Job Scheduling Strategies for Parallel Processing, 2006.
14 How to schedule? Our model: a DAG, 10 tasks, 3 machines (assume we know execution times and communication costs)
15 A simple idea
- Assign nodes to the fastest machine!
Communication between nodes 4 and 8 takes way too long!!!
Heuristics that take into account the whole structure of the DAG are needed.
Makespan is > 1000!
16 H. Zhao, R. Sakellariou. An experimental study of the rank function of HEFT. Proceedings of Euro-Par '03.
17 Hmm...
- This was a rather well-defined problem
- This was just a small change in the algorithm
- What about different heuristics?
- What about more generic problems?
18 DAG scheduling: A Hybrid Heuristic
- Trying to find out why there were such differences in the outcome of HEFT, we observed problems with the order; to address those problems we came up with a Hybrid Heuristic; it worked quite well!
- Phases:
- Rank (list scheduling)
- Create groups of independent tasks
- Schedule independent tasks
- Can be carried out using any scheduling algorithm for independent tasks, e.g., MinMin, MaxMin
- A novel heuristic (Balanced Minimum Completion Time)
R. Sakellariou, H. Zhao. A Hybrid Heuristic for DAG Scheduling on Heterogeneous Systems. Proceedings of the IEEE Heterogeneous Computing Workshop (HCW'04), 2004.
19 An Example
(Figure: example DAG annotated with computation and communication costs)
20
- Mean Upward Ranking Scheme
- The order is: 0, 1, 4, 5, 7, 2, 3, 6, 8, 9
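The mean upward rank used here is the standard HEFT-style ranking: a task's rank is its mean execution time plus the maximum, over its successors, of the mean communication cost plus the successor's rank; tasks are then ordered by decreasing rank. A minimal sketch, with illustrative input names (`succ`, `mean_exec`, `mean_comm` are not from the papers):

```python
# Sketch of the mean upward ranking scheme (as in HEFT).
# succ maps a task to its successors, mean_exec gives mean execution
# times, mean_comm mean communication costs per edge.
def upward_rank(tasks, succ, mean_exec, mean_comm):
    memo = {}

    def rank(i):
        # rank_u(i) = mean_exec(i) + max over successors j of
        # (mean_comm(i, j) + rank_u(j)); exit tasks get just mean_exec.
        if i not in memo:
            memo[i] = mean_exec[i] + max(
                (mean_comm[(i, j)] + rank(j) for j in succ.get(i, [])),
                default=0.0,
            )
        return memo[i]

    # scheduling order: decreasing upward rank
    return sorted(tasks, key=rank, reverse=True)
```

Sorting by decreasing rank yields a priority order consistent with the precedence constraints, like the order quoted on the slide.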
21 An Example
- Phase 1: Rank the nodes
- Phase 2: Create groups of independent tasks
The order is: 0, 1, 4, 5, 7, 2, 3, 6, 8, 9
22 Balanced Minimum Completion Time Algorithm (BMCT)
- Step I
- Assign each task to the machine that gives the fastest execution time.
- Step II
- Find the machine M with the maximal finish time. Move a task from M to another machine, if it minimizes the overall makespan.
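The two steps can be sketched as follows for a single group of independent tasks. Communication costs and earlier groups are ignored here for brevity, so this illustrates the balancing idea rather than the full algorithm; the `exec_time` table is a hypothetical input:

```python
# Minimal sketch of BMCT for one group of independent tasks,
# ignoring communication costs and previously scheduled groups.
# exec_time[task][machine] is a hypothetical execution-time table.
def bmct_group(tasks, machines, exec_time):
    # Step I: assign each task to its fastest machine
    assign = {t: min(machines, key=lambda m: exec_time[t][m]) for t in tasks}

    def finish_times(a):
        return {m: sum(exec_time[t][m] for t in tasks if a[t] == m)
                for m in machines}

    # Step II: repeatedly move a task off the most loaded machine
    # as long as doing so reduces the overall makespan
    while True:
        ft = finish_times(assign)
        makespan = max(ft.values())
        heavy = max(machines, key=lambda m: ft[m])
        best = None
        for t in [t for t in tasks if assign[t] == heavy]:
            for m in machines:
                if m == heavy:
                    continue
                trial = dict(assign)
                trial[t] = m
                new_ms = max(finish_times(trial).values())
                if new_ms < makespan and (best is None or new_ms < best[0]):
                    best = (new_ms, t, m)
        if best is None:
            return assign  # no improving move exists; stop
        assign[best[1]] = best[2]
```

With three identical tasks that are all fastest on the same machine, Step I piles them up on it and Step II then spreads them out until no move improves the makespan.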
23
- Initially assign each task in the group to the machine giving the fastest time
- No movement for the entry task
24 An Example (2)
- Phase 3: Schedule Independent Tasks in Group 1
(Figure: Gantt chart for machines M0, M1, M2; time axis 0 to 140)
- Initially assign each task in the group to the machine giving the fastest time
25
- Initially assign each task in the group to the machine giving the fastest time
- M2 is the machine with the maximal finish time (70)
26
- Task 5 moves to M0 since it can achieve an earlier overall finish time
- Now M0 is the machine with the maximal finish time (69)
27
- Task 1 moves to M2 since it can achieve an earlier overall finish time
- Now M2 is the machine with the maximal finish time (59)
- No task can be moved from M2; the movement stops.
- Schedule next group
28
- Initially assign each task in this group to the machine giving the fastest time
29
- Task 2 moves to M1 since it can achieve an earlier overall finish time
- M2 is the machine with the maximal finish time
- No movement from M2
- Schedule next group
30
- Initially assign each task in this group to the machine giving the fastest time
31
- Task 6 moves to M0 since it can achieve an earlier overall finish time
- M2 is the machine with the maximal finish time
32
- Task 8 moves to M1 since it can achieve an earlier overall finish time
- M1 is the machine with the maximal finish time
- No movement from M1
- Schedule next group
33
- Initially assign each task in this group to the machine giving the fastest time
- No movement for the exit task
34 (No Transcript)
35 Experiments
- DAG Scheduling
- Algorithms
- Hybrid.BMCT (i.e., the algorithm as presented), and
- Hybrid.MinMin (i.e., MinMin instead of BMCT)
- Applications
- Randomly generated graphs
- Laplace
- FFT
- Fork-join graphs
- Heterogeneity setting (following an approach by Siegel et al.)
- Consistent
- Partially consistent
- Inconsistent
36 Hybrid Heuristic Comparison
- NSL (normalised schedule length)
- Random DAGs, 25-100 tasks with inconsistent heterogeneity
- Average improvement: 25%
37 Hmm...
- Yes, but, so far, you have used static task execution times; in practice such times are difficult to specify exactly
- There is an answer for run-time deviations: adjust at run-time
- But...
- don't we need to understand the static case first?
38 Characterise the Schedule
- Spare time indicates the maximum time that a node, i, may delay without affecting the start time of an immediate successor, j.
- A node i with an immediate successor j on the DAG: spare(i,j) = Start_Time(j) - Data_Arrival_Time(i,j)
- A node i with an immediate successor j on the same machine: spare(i,j) = Start_Time(j) - Finish_Time(i)
- The minimum of the above is MinSpare for a task.
R. Sakellariou, H. Zhao. A low-cost rescheduling policy for efficient mapping of workflows on grid systems. Scientific Programming, 12(4), December 2004, pp. 253-262.
39 Example
- DAT(4,7) = 40.5, ST(7) = 45.5, hence spare(4,7) = 5
- FT(3) = 28, ST(5) = 29.5, hence spare(3,5) = 1.5
- DAT = Data_Arrival_Time, ST = Start_Time, FT = Finish_Time
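Both spare-time cases reduce to one small function. The dictionaries `start`, `finish`, and `dat` are hypothetical stand-ins for values read off an existing schedule:

```python
# Spare time as defined on the slides; start, finish and dat
# (Data_Arrival_Time) are hypothetical dictionaries taken from an
# existing schedule.
def spare(i, j, start, finish, dat, same_machine):
    if same_machine:
        # i and j scheduled back-to-back on the same machine
        return start[j] - finish[i]
    # j is a DAG successor fed by data transferred from i
    return start[j] - dat[(i, j)]
```

With the numbers from the example slide: spare(4,7) = 45.5 - 40.5 = 5 and spare(3,5) = 29.5 - 28 = 1.5.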
40 Characterise the schedule (cont.)
- Slack indicates the maximum time that a node, i, may delay without affecting the overall makespan.
- slack(i) = min(slack(j) + spare(i,j)), for all successor nodes j (both on the DAG and the machine)
- The idea: keep track of the values of the slack and/or the spare time and reschedule only when the delay exceeds the slack
R. Sakellariou, H. Zhao. A low-cost rescheduling policy for efficient mapping of workflows on grid systems. Scientific Programming, 12(4), December 2004, pp. 253-262.
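The slack recursion can be computed bottom-up over the successors. Here `succ` and `spare` are assumed precomputed from the schedule, and a task with no successors (the exit task) gets slack 0, since delaying it delays the makespan directly:

```python
# Bottom-up computation of slack, assuming succ lists each task's
# immediate successors (on the DAG and on the machine) and spare
# holds the precomputed spare times per edge.
def compute_slack(tasks, succ, spare):
    slack = {}

    def rec(i):
        if i not in slack:
            # slack(i) = min over successors j of (slack(j) + spare(i, j));
            # the exit task gets 0: any delay to it delays the makespan
            slack[i] = min(
                (rec(j) + spare[(i, j)] for j in succ.get(i, [])),
                default=0.0,
            )
        return slack[i]

    for t in tasks:
        rec(t)
    return slack
```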
41 Lessons Learned (simulation and deviations of up to 100%)
- Heuristics that perform better statically perform better under uncertainties.
- By using the metrics on spare time, one can provide guarantees for the maximum deviation from the static estimate. Then, we can minimise the number of times we reschedule while still achieving good results.
- Could lead to orders-of-magnitude improvement with respect to workflow execution using DAGMan (would depend on the workflow; only partly true with Montage)
42 Challenges still unanswered
- What are the representative DAGs (workflows) in the context of Grid computing?
- Extensive evaluation/analysis (theoretical too) is needed. It is not clear what the best makespan we can get is (because it is not easy to find the critical path)
- What are the uncertainties involved? How good are the estimates that we can obtain for the execution time / communication cost? Performance prediction is hard
- How heterogeneous are our Grid resources really?
43 Moving on to multiple DAGs
- It is really idealistic to assume that we have exclusive usage of resources
- In practice, we may have multiple DAGs competing for resources at the same time
Henan Zhao, Rizos Sakellariou. Scheduling Multiple DAGs onto Heterogeneous Systems. Proceedings of the 15th Heterogeneous Computing Workshop (HCW'06) (in conjunction with IPDPS 2006), Rhodes, April 2006, IEEE Computer Society Press.
44 Scheduling Multiple DAGs: Approaches
- Approach 1: Schedule one DAG after the other with existing DAG scheduling algorithms
- Low resource utilization; long overall makespan
- Approach 2: Still one after the other, but do some backfilling and fill the gaps
- Which DAG to schedule first? The one with the longest makespan or the one with the shortest makespan?
- Approach 3: Merge all DAGs into a single, composite DAG. Much better than Approach 1 or 2.
45 Example: Two DAGs to be scheduled together
46 Composition Techniques
- C1: Common Entry and Common Exit Node
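Composition C1 can be sketched as hanging every DAG under one zero-cost common entry node and connecting all sinks to one common exit node. The dict-of-successor-lists representation and the `Dk.` renaming are illustrative choices, not taken from the paper:

```python
# Sketch of composition C1: one common entry node feeds the sources
# of every DAG, and every sink feeds one common exit node.
def compose_c1(dags):
    entry, exit_ = "ENTRY", "EXIT"
    composite = {entry: [], exit_: []}
    for k, dag in enumerate(dags):
        prefix = f"D{k}."
        # copy each DAG in, with prefixed task names to avoid clashes
        for t, succs in dag.items():
            composite[prefix + t] = [prefix + s for s in succs]
        preds = {s for succs in dag.values() for s in succs}
        # source tasks (no predecessors) hang off the common entry
        composite[entry] += [prefix + t for t in dag if t not in preds]
        # sink tasks (no successors) feed the common exit
        for t, succs in dag.items():
            if not succs:
                composite[prefix + t] = [exit_]
    return composite
```

The composite DAG can then be handed to any single-DAG scheduling algorithm unchanged.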
47 Composition Techniques
48 Composition Techniques
- C3: Alternate between DAGs (round robin between DAGs)
- Easy!
49 Composition Techniques
- C4: Ranking-Based Composition (compute a weight for each node and merge accordingly)
(Figure: composition starting from nodes A0 and B0)
50
- But is makespan optimisation a good objective when scheduling multiple DAGs?
51 Mission: Fairness
- In multiple DAGs:
- User's perspective: I want my DAG to complete execution as soon as possible.
- System perspective: I would like to keep as many users as possible happy; I would like to increase resource utilisation.
- Let's be fair to users!
- (The system may want to take into account different levels of quality of service agreed with each user)
52 Slowdown
- Slowdown: what is the delay that a DAG would experience as a result of sharing the resources with other DAGs (as opposed to having the resources on its own)?
- Average slowdown for all DAGs
53 Unfairness
- Unfairness indicates, for all DAGs, how different the slowdown of each DAG is from the average slowdown (over all DAGs).
- The higher the difference, the higher the unfairness!
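The slides do not give the formulas, so the following uses one common way to express the two metrics: slowdown as the ratio of a DAG's shared-resource finish time to its exclusive-resource makespan, and unfairness as the total deviation of the per-DAG slowdowns from the average. `own` and `shared` are hypothetical inputs:

```python
# own[d]: DAG d's makespan with exclusive use of the resources.
# shared[d]: DAG d's finish time when sharing with other DAGs.
def slowdown(own, shared):
    return {d: shared[d] / own[d] for d in own}

def unfairness(own, shared):
    s = slowdown(own, shared)
    avg = sum(s.values()) / len(s)
    # total deviation of each DAG's slowdown from the average slowdown
    return sum(abs(v - avg) for v in s.values())
```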
54 Scheduling for Fairness
- Key idea: at each step (that is, every time a task is to be scheduled), select the most affected DAG (that is, the DAG with the highest slowdown value) to schedule.
- What is the most affected DAG at any given point in time?
55 Fairness Scheduling Policies
- F1: Based on Latest Finish Time
- calculates the slowdown value only at the time the last task that was scheduled for this DAG finishes.
- F2: Based on Current Time
- re-calculates the slowdown value for every DAG after any task finishes. A proportion of time, for running tasks, is taken when the calculation is carried out.
56 Lessons Learned / Open questions
- It is possible to achieve reasonably good fairness without affecting makespan.
- An algorithm with good behaviour in the static case appears to make things easier in terms of achieving fairness
- What is fairness?
- What is the behaviour when run-time changes occur?
- What about different notions of Quality of Service (SLAs, etc.)?
57 Finally...
- How to automate advance reservations at the task level for a workflow, when the user has specified a deadline constraint only for the whole workflow?
Henan Zhao, Rizos Sakellariou. Advance Reservation Policies for Workflows. Proceedings of the 12th Workshop on Job Scheduling Strategies for Parallel Processing, 2006.
58 The schedule on the left can be used to plan reservations. However, if one task fails to finish within its slot, the remaining tasks have to be re-negotiated.
59 What we are looking for is...
60 The Idea
- 1. Obtain an initial assignment using any DAG scheduling algorithm (HEFT, HBMCT, ...).
- 2. Repeat:
- I. Compute the Application Spare Time (= user-specified deadline - DAG finish time).
- II. Distribute the Application Spare Time among the tasks.
- 3. Until the Application Spare Time is below a threshold.
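As a toy illustration of step II, assuming (purely for simplicity) a single chain of back-to-back tasks rather than a general DAG, the application spare time can be split evenly and each task's reservation slot stretched by its share, pushing later slots back:

```python
# Toy illustration of spare-time distribution over a single chain of
# back-to-back tasks; start and finish are hypothetical dicts from an
# initial schedule and are mutated in place.
def allocate_spare(start, finish, deadline, threshold=1.0):
    order = sorted(start, key=start.get)  # chain order by start time
    while deadline - finish[order[-1]] >= threshold:
        share = (deadline - finish[order[-1]]) / len(order)
        shift = 0.0
        for t in order:
            start[t] += shift   # pushed back by earlier stretches
            shift += share      # this task's slot grows by `share`
            finish[t] += shift
    return start, finish
```

Each pass consumes the whole remaining application spare time, so every task's reserved slot ends up larger than its estimated execution time, which is what allows run-time deviations without re-negotiation.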
61 Spare Time
- The Spare Time indicates the maximum time that a node may delay without affecting the start time of any of its immediate successor nodes.
- A node i with an immediate successor j on the DAG: spare(i,j) = Start_Time(j) - Data_Arrival_Time(i,j)
- A node i with an immediate successor j on the same machine: spare(i,j) = Start_Time(j) - Finish_Time(i)
- The minimum of the above for all immediate successors is the Spare Time of a task.
- Distributing the Application Spare Time needs to take care of the inherently present spare time!
62 Two main strategies
- Recursive spare time allocation
- The Application Spare Time is divided among all the tasks.
- This is a repetitive process until the Application Spare Time is below a threshold.
- Critical Path based allocation
- Divide the Application Spare Time among the tasks on the critical path.
- Balance the Spare Time of all the other tasks.
- (a total of 6 variants have been studied)
63 An Example
64 Critical Path based allocation
65 Finally...
66 Findings
- Advance reservation of resources for workflows can be automatically converted to reservations at the task level, thus improving resource utilization.
- If the deadline set for the DAG is such that there is enough spare time, then we can reserve resources for each individual task so that deviations of the same order, for each task, can be afforded without any problems.
- Advance reservation is known to harm resource utilization. But this study indicated that if the user is prepared to pay for full usage when only 60% of the slot is used, there is no loss for the machine owner.
67 ...which leads to pricing!
- R. Sakellariou, H. Zhao, E. Tsiakkouri, M. Dikaiakos. Scheduling workflows under budget constraints. To appear as a chapter in a book with selected papers from the 1st CoreGrid Integration Workshop.
- The idea:
- Given a specific budget, what is the best schedule you can obtain for your workflow?
- Multicriteria optimisation is hard!
- Our approach:
- Start from a good solution for one objective, and try to meet the other!
- It works! How well? Difficult to tell!
68 To summarize
- Understanding the basic static scenarios and having robust solutions for those scenarios helps the extension to more complex cases
- Pretty much everything here is addressed by heuristics. Their evaluation requires extensive experimentation. Still:
- No agreement about what DAGs (workflows) look like.
- No agreement about how heterogeneous resources are
- The problems addressed here are perhaps more related to what is supposed to be core CS
- But we may be talking about lots of work for only incremental improvements (10-15%)
69
- Who cares in Computer Science about performance improvements in the order of 10-15%?
- (yet, if Gordon Brown were to increase our taxes by 10-15%, everyone would be so unhappy!)
- Oh well...
70 Thank you!