Application of Methods of Queuing Theory to Scheduling in GRID

Slides: 28

Provided by: arts60

Transcript and Presenter's Notes

1
Application of Methods of Queuing Theory to
Scheduling in GRID

A Queuing Theory-based mathematical model is
presented, and an explicit form of the optimal
control procedure is obtained as the solution to
the problem of maximizing the system throughput.

2
Why Queuing Theory?

Indeed, there are queues in real GRIDs
The services GRIDs offer to end users closely
resemble the services offered by telephone
networks, the typical subject of study in Queuing
Theory
The complexity of the associated processes leaves
little option but to use probabilistic
techniques

3
Complexity: The Principal Limiting Factor to
Modeling

GRIDs are very complicated systems themselves
GRIDs are composed of smaller complicated systems
Computer hardware
Networks
Software
GRIDs are embedded in larger complicated
systems
Scientific organizations
R&D activities
Globalization processes

4
Stopping Decomposition as Soon as Possible to
Avoid Unnecessary Complexity

Separate the phenomena specific to scheduling in
GRID from the generic phenomena
Model complicated behavior of the components with
probabilistic techniques
Find the most general expression of the effects

5
Ultimate Stopper of Decomposition

No entity in the modeled system should be
decomposed, if the system persists when that
entity is replaced with another similar one.

6
Implications

There is no need to develop detailed models of
computers, networks, software or interactions
external to the GRID
There is no need to model intra-GRID
interactions that do not directly affect
scheduling
Information about how long it will take to
process a demand on each node is all we need to
know about the demand.

7
Mathematical Concepts Involved

Probability
Poisson Process
Multivariate Distribution
Linear Programming
Convergence in Law

8
Simplified Model

There is a finite number of classes of demands
(all demands from the same class have equal
complexity)
Sub-Model of Structure
Set of N nodes with queues
Sub-Model of Flow of Demands
Poisson process of arrivals with intensity $\lambda$
M classes of demands
Sub-Model of Scheduling Procedure
Recognizes distinct classes of demands and routes
the demands to the nodes it chooses

9
Sub-Model: Structure

10
Sub-Model: Flow of Demands

Demands from class j arrive with intensity
$\lambda_j = \lambda p_j$ ($\lambda_1 + \dots + \lambda_M = \lambda$)
Upon arrival, a demand from class j is routed to
node i with probability $s_{i,j}$
A demand from class j requires $\tau_{i,j}$ units of
processing time if routed to node i
The computing time is incompressible:
processing two demands with complexities $T_1$ and
$T_2$ at a particular node requires $T_1 + T_2$ time units,
independently of the order (or level of
parallelism) in which they are processed
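
The probabilistic routing rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' scheduler: the function name `route`, the routing matrix values and the seed are assumptions made for the example. Row i of `s` holds the probabilities with which node i receives each class, so every column sums to 1.

```python
import random

def route(class_index, s, rng):
    """Pick a node for a demand of the given class.

    Node i is chosen with probability s[i][class_index].
    """
    weights = [row[class_index] for row in s]
    return rng.choices(range(len(s)), weights=weights, k=1)[0]

# Hypothetical routing matrix for 2 nodes and 2 classes (not from the slides).
s = [[1.0, 0.25],
     [0.0, 0.75]]

rng = random.Random(0)  # fixed seed so the run is repeatable
picks = [route(1, s, rng) for _ in range(10_000)]
share_node_1 = picks.count(1) / len(picks)  # close to s[1][1] = 0.75
```

Class 0 demands always go to node 0 here, while class 1 demands are spread over both nodes at random, which is the non-deterministic behavior the model assumes.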

11
Two Important Facts About Poisson Processes

Let $X_1$ and $X_2$ be independent Poisson processes
with intensities $\lambda_1$ and $\lambda_2$. Then $X_1 + X_2$ is a Poisson
process with intensity $\lambda_1 + \lambda_2$.
Suppose a Poisson process $X$ with intensity $\lambda$ is
split into $X_1$ and $X_2$: each event is passed,
independently of the others, to $X_1$ with probability $p$
and otherwise to $X_2$. Then $X_1$ and
$X_2$ are Poisson processes with intensities $p\lambda$ and
$(1-p)\lambda$.
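
Both facts can be checked numerically. The sketch below simulates Poisson processes through exponential inter-arrival times and compares empirical rates with the predicted intensities. The intensities 2, 3 and 5, the splitting probability 0.3, the horizon and the seed are arbitrary choices for illustration. The check covers only the rates, not the full Poisson property.

```python
import random

def poisson_arrivals(rate, horizon, rng):
    """Arrival times of a Poisson process with the given rate on [0, horizon)."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def split(times, p, rng):
    """Send each event to the first stream with probability p, else to the second."""
    first, second = [], []
    for t in times:
        (first if rng.random() < p else second).append(t)
    return first, second

rng = random.Random(42)
horizon = 10_000.0

# Superposition: merge two independent processes with intensities 2 and 3.
merged = sorted(poisson_arrivals(2.0, horizon, rng)
                + poisson_arrivals(3.0, horizon, rng))
merged_rate = len(merged) / horizon   # expected to be close to 2 + 3 = 5

# Splitting: thin a process of intensity 5 with p = 0.3.
x1, x2 = split(poisson_arrivals(5.0, horizon, rng), 0.3, rng)
rate_x1 = len(x1) / horizon           # expected to be close to 0.3 * 5 = 1.5
rate_x2 = len(x2) / horizon           # expected to be close to 0.7 * 5 = 3.5
```

These two properties are what allow the flow arriving at each node to be treated as a Poisson process again after the scheduler splits the class flows and the node merges them.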

12
Flow of Demands Scheduling Procedure
13
Sub-Model: Scheduling Procedure

The GRID operates in a stable environment
Routing of any demand at each moment depends on
the current state of the system only
For all nodes, the load $\rho_i < 1$
$\Rightarrow$
The system can operate in the stationary mode
The stationary mode is stable

14
Stationary Mode
15
Implications of Stationary Operation

Incoming demands of class j are routed to node i
with stationary probability $s_{i,j}$
Load of node i has the form
$\rho_i = \lambda \sum_{j=1}^{M} s_{i,j}\,\tau_{i,j}\,p_j < 1$
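
The load formula is easy to evaluate directly. A minimal sketch follows, assuming the notation lam for the total arrival intensity, `s[i][j]` for the routing probabilities, `tau[i][j]` for the processing times and `p[j]` for the class probabilities. The function names and all numbers are illustrative and do not come from the slides.

```python
def node_loads(lam, s, tau, p):
    """Load of every node: rho_i = lam * sum_j s[i][j] * tau[i][j] * p[j]."""
    n_classes = len(p)
    return [
        lam * sum(s[i][j] * tau[i][j] * p[j] for j in range(n_classes))
        for i in range(len(s))
    ]

def is_stable(loads):
    """The stationary mode requires every node load to stay below 1."""
    return all(rho < 1.0 for rho in loads)

# Illustrative data: 2 nodes, 2 equally likely classes.
tau = [[1.0, 4.0],
       [4.0, 1.0]]
p = [0.5, 0.5]
uniform = [[0.5, 0.5],
           [0.5, 0.5]]

loads = node_loads(0.6, uniform, tau, p)        # both loads equal 0.6 * 1.25 = 0.75
overloaded = node_loads(1.0, uniform, tau, p)   # both loads equal 1.25, not stable
```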

16
Optimization Problem
17
Linear Programming

It is possible to rewrite the constraints in the
following form
$\theta_i = \sum_{j=1}^{M} s_{i,j}\,\tau_{i,j}\,p_j$
$\theta_i \le \theta$
$\theta \to \min$
Now it is an LP problem
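
A small numerical sketch of such an LP is given below. It is not the authors' implementation: it uses SciPy's `linprog` rather than the GNU Linear Programming Kit listed in the references, and it assumes the usual side constraints that each column of the routing matrix sums to 1 with entries in [0, 1]. The bound theta on the per-unit-intensity node loads is minimized. Since each load equals lambda times that quantity, the loads stay below 1 exactly when lambda < 1/theta, so 1/theta is the throughput limit. The function name `optimal_routing` and the data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_routing(tau, p):
    """Minimize theta subject to sum_j s[i,j]*tau[i,j]*p[j] <= theta for every node i.

    Returns the routing matrix s (nodes x classes) and the throughput limit 1/theta.
    """
    tau = np.asarray(tau, dtype=float)
    p = np.asarray(p, dtype=float)
    n_nodes, n_classes = tau.shape
    n_s = n_nodes * n_classes

    # Variables: s flattened row by row, followed by theta.
    c = np.zeros(n_s + 1)
    c[-1] = 1.0

    # One inequality per node: weighted row sum minus theta <= 0.
    A_ub = np.zeros((n_nodes, n_s + 1))
    for i in range(n_nodes):
        A_ub[i, i * n_classes:(i + 1) * n_classes] = tau[i] * p
        A_ub[i, -1] = -1.0
    b_ub = np.zeros(n_nodes)

    # One equality per class: routing probabilities over the nodes sum to 1.
    A_eq = np.zeros((n_classes, n_s + 1))
    for j in range(n_classes):
        A_eq[j, j:n_s:n_classes] = 1.0
    b_eq = np.ones(n_classes)

    bounds = [(0.0, 1.0)] * n_s + [(0.0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    s = res.x[:n_s].reshape(n_nodes, n_classes)
    return s, 1.0 / res.x[-1]

# Illustrative data: each node is four times faster on "its own" class.
s_opt, lam_max = optimal_routing([[1.0, 4.0], [4.0, 1.0]], [0.5, 0.5])
```

For this data the solver sends each class entirely to the node that handles it fastest, giving a throughput limit of 2 demands per time unit.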

18
From Simplified to Real-World Model

How to handle non-discrete distributions of
demands?
How to handle errors in classification (imperfect
information)?
What about non-stationary modes?
Short-term excesses are not fatal because of
stability
Long-term changes in the distribution of demands can
render the scheduling procedure non-optimal

19
Approximating Actual Distribution of Demands with
A Discrete Distribution
20
A Better Approximation
21
What Happens When $M \to \infty$?

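Simplified
s is a matrix
$s: \{1,\dots,N\} \times \{1,\dots,M\} \to [0,1]$
$\tau: \{1,\dots,N\} \times \{1,\dots,M\} \to [0,\infty)$
$\rho_i = \lambda \sum_{j=1}^{M} s_{i,j}\,\tau_{i,j}\,p_j$

Marginal
s is a function
$s_i: \mathbb{R}^M \to [0,1]$
$\tau$ is a multivariate random value (in $\mathbb{R}^M$)
$\rho_i = \lambda\, E[\tau_i\, s_i(\tau)]$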

22
Handling Imperfect Information

Average values of $\tau_{i,j}$ can be used
The scheduling procedure should be iteratively
re-evaluated when more information becomes
available
In real-world applications, the exact
distribution of demands is unknown, but it can be
approximated from the history of the system
operation
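
One way to approximate the model inputs from history is sketched below. The record format (class index, node index, observed processing time) and the function name `estimate_parameters` are assumptions made for this example. The slides do not specify how the history is stored. Class probabilities are estimated by relative frequencies and processing times by per-node, per-class averages.

```python
def estimate_parameters(history, n_nodes, n_classes):
    """Estimate class probabilities p[j] and average processing times tau[i][j].

    history is an iterable of (class_index, node_index, processing_time) records.
    tau[i][j] is None when class j has never been observed on node i.
    """
    class_counts = [0] * n_classes
    time_sums = [[0.0] * n_classes for _ in range(n_nodes)]
    time_counts = [[0] * n_classes for _ in range(n_nodes)]

    for cls, node, duration in history:
        class_counts[cls] += 1
        time_sums[node][cls] += duration
        time_counts[node][cls] += 1

    total = sum(class_counts)
    if total == 0:
        raise ValueError("history is empty")

    p = [count / total for count in class_counts]
    tau = [
        [time_sums[i][j] / time_counts[i][j] if time_counts[i][j] else None
         for j in range(n_classes)]
        for i in range(n_nodes)
    ]
    return p, tau

# Hypothetical log of four processed demands.
log = [(0, 0, 1.0), (0, 0, 3.0), (1, 1, 2.0), (0, 1, 5.0)]
p_hat, tau_hat = estimate_parameters(log, n_nodes=2, n_classes=2)
```

Re-running the estimation as the log grows, and re-solving for the routing probabilities, is one reading of the iterative re-evaluation mentioned above.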

23
A Comparison

Let $\xi$ be an exponentially distributed random
value with average 1
$\tau_{i,j} = 1 + \xi$
The trivial procedure routes each demand to any
node with equal probability
An optimized procedure is obtained as shown
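
For the trivial procedure the throughput limit follows directly from the load condition. The derivation below is a sketch. It assumes the notation $\lambda$ for the total arrival intensity, $\tau_{i,j}$ for the processing time of a class j demand on node i, $p_j$ for the class probability, $\rho_i$ for the load of node i, and N nodes with M classes.

```latex
s_{i,j} = \frac{1}{N}
\quad\Longrightarrow\quad
\rho_i = \frac{\lambda}{N} \sum_{j=1}^{M} \tau_{i,j}\, p_j < 1 \ \text{ for all } i
\quad\Longleftrightarrow\quad
\lambda < \lambda^{\mathrm{trivial}}_{\max}
        = \frac{N}{\max_{i} \sum_{j=1}^{M} \tau_{i,j}\, p_j}
```

As an illustration with made-up numbers: for two nodes, two equally likely classes and processing times $\tau = \begin{pmatrix} 1 & 4 \\ 4 & 1 \end{pmatrix}$, the bound is $2 / 2.5 = 0.8$ demands per time unit.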

24
Scheduling: Trivial vs. Optimized
[Figure: maximum throughput versus number of nodes for the optimized and trivial procedures]
25
Conclusions

The exact upper bound of throughput for a given
GRID can be estimated
A scheduling procedure which achieves this limit
can be constructed from a solution of an LP
problem
The optimal scheduling procedure should be
non-deterministic
Trivial and deterministic schedulers are
generally unlikely to achieve the theoretical
limit

26
References

L. Kleinrock, Queueing Systems, 1976
Andrei Dorokhov, Simulation simple models and comparison with queueing theory
http://csdl.computer.org/comp/proceedings/hpdc/2003/1965/00/19650034abs.htm
Atsuko Takefusa, Osamu Tatebe, Satoshi Matsuoka, Youhei Morita, Performance Analysis of Scheduling and Replication Algorithms on Grid Datafarm Architecture for High-Energy Physics Applications
GNU Linear Programming Kit, http://www.fsf.org

27
My Special Thanks To