Information and Scheduling: What's available and how does it change


1
Information and Scheduling What's available and
how does it change

Jennifer M. Schopf
Argonne National Lab

2
Information and Scheduling

How a scheduler works is closely tied to the
information available
Choice of algorithm dependent on accessible data

3
This Talk

What approaches expect from information
What data is actually available, and some open
questions
How data changes
What to do about changing data

4
NB

I'm speaking (pessimistically) from my own
background
We've heard some talks earlier today (for example
PACE) which address some of these problems
I still think these are interesting open issues
to think about

5
Information systems (NOTE: taken from my standard
MDS2 talk)

Information is always old
Time of flight, changing system state
Need to provide quality metrics
Distributed system state is hard to obtain
Information is not contemporaneous (thanks j.g.)
Complexity of global snapshot
Components will fail
Scalability and overhead
Approaches are changed for scalability, this will
affect the information available

6
Scheduling approaches assume

A lot of data is available
All information is accurate
Values don't change

7
Example System data

1. The bandwidth $b_{ij}$: the maximum data rate in
bits per second.
2. The flow $f_{ij}$: the effective data rate in bits
per second on the link.
3. The utilization $u_{ij}$: the ratio of the effective
flow to bandwidth, $u_{ij} = f_{ij} / b_{ij}$.
4. The length $l_{ij}$: the Euclidean distance
between its end peers.
5. The cost $C_{ij}$: the cost can be defined as a
function of the link length and its bandwidth,
$C_{ij} = S (l_{ij} / b_{ij})$, where $S$ is a constant value.
6. $T_i$: the processor speed of the peer, which is
the number of work units that the peer can
execute per unit of time.
etc.
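
The derived link quantities in this list are plain arithmetic on the measured ones. A minimal sketch in Python, assuming the cost is read as the product S * (l_ij / b_ij) and using made-up values; the `Link` class and its field names are hypothetical and do not come from the talk:

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One link between peers i and j (field names are illustrative)."""
    bandwidth: float  # b_ij, maximum data rate in bits per second
    flow: float       # f_ij, effective data rate in bits per second
    length: float     # l_ij, Euclidean distance between the end peers

    def utilization(self) -> float:
        # u_ij = f_ij / b_ij
        return self.flow / self.bandwidth

    def cost(self, s: float = 1.0) -> float:
        # C_ij = S * (l_ij / b_ij), assuming S is a multiplicative constant
        return s * (self.length / self.bandwidth)


# Made-up values, purely for illustration.
link = Link(bandwidth=100e6, flow=25e6, length=50.0)
print(link.utilization())  # 0.25
print(link.cost(s=1e6))
```

Every input here (bandwidth, flow, length) is treated as exactly known, which is the assumption the talk questions.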

8
Example Application information

1. $B_i$ is the number of work units (in terms of
computations) in the task. So, the number of time
units that the task $t_i$ needs in order to be
executed on peer $v_k$ is $B_i / T_k$.
2. $u_i$ is the number of packets required to
transfer the task. Thus, the task $t_i$ needs
$u_i w / b_{ij}$ work units to be transferred from peer $v_i$
to the peer $v_j$, assuming that these two peers
are direct neighbors and the condition of the
network is ideal.
3. Implicit exact mapping of tasks and data in a
DAG
etc.
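
The two task estimates above can be sketched as follows. This is only an illustration: the slide does not define w, so the code assumes it is the packet size in bits, and all numbers and function names are hypothetical.

```python
def execution_time(work_units: float, peer_speed: float) -> float:
    """B_i / T_k: time units to run a task of B_i work units on a peer of speed T_k."""
    return work_units / peer_speed


def transfer_time(packets: float, packet_size_bits: float, bandwidth_bps: float) -> float:
    """u_i * w / b_ij, assuming w is the packet size in bits and an ideal network."""
    return packets * packet_size_bits / bandwidth_bps


# Made-up numbers, purely for illustration.
run = execution_time(work_units=6000, peer_speed=200)
move = transfer_time(packets=1000, packet_size_bits=8000, bandwidth_bps=1e6)
print(run, move, run + move)  # 30.0 8.0 38.0
```

Both functions return a single number with no error bars, so they are only as good as the work-unit count, peer speed, and bandwidth fed into them.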

9
What some people expect

Perfect bandwidth info
Number of operations in an application
Scalar value of computer power
Mapping of power to applications
Perfect load information

10
Bandwidth data

Network Weather Service (Wolski, UCSB)
64k probe BW data
Latency data
Predictions
PingER (Les Cottrell, SLAC)
Create long term baselines for expectations on
means/medians and variability for response time,
throughput, packet loss
Predicting TCP performance
Allen Downey
http://allendowney.com/research/tcp/
But what do Grid applications need?

11
Perfect Bandwidth Data

64k probes don't look like large file transfers

12
Predicting Large File Transfers

Vazhkudai and Schopf use GridFTP logs and some
background data - NWS, iostat (HPDC 2002)
Error rate of 15%
M. Faerman, A. Su, R. Wolski, and F. Berman (HPDC
'99)
Similar results for SARA data
Hu and Schopf use an AI learning technique on
GridFTP log files only (not published yet)
Picks the best place to get a file from 60-80% of
the time; using averages only gives you the best
choice 50% of the time
This topic needs much more study!
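
For reference, an averages-only baseline for choosing a source could look roughly like the sketch below. It is not the authors' code: the log format, site names, and throughput values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log records: (source site, observed throughput in MB/s).
transfer_log = [
    ("site_a", 4.0), ("site_a", 6.0),
    ("site_b", 7.0), ("site_b", 2.0),
    ("site_c", 5.5),
]


def best_source_by_average(log):
    """Return the source with the highest mean observed throughput."""
    by_source = defaultdict(list)
    for source, throughput in log:
        by_source[source].append(throughput)
    return max(by_source, key=lambda s: mean(by_source[s]))


print(best_source_by_average(transfer_log))  # site_c
```

A plain mean ignores file size, time of day, and recent trends, which may be part of why a baseline like this picks the best source less often than a learned model.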

13
Data Generally Available From an Application

What some scheduling approaches want
Number of ops in an application
Exact execution time on a platform
Perfect models of applications

14
Application Data Currently Available

Bad models of applications
No models of applications
Some work (Prophesy, Taylor at Texas A&M) does
logging to create models
Many interesting applications have
non-deterministic run times
User estimates of application run time
(historically) off by 20%
We need to be able to figure out ways to do
predictions of application run times WITHOUT
models

15
Scalar value of computer power

MDS2 gives me
CPU vendor, model and version
CPU speed
OS name, release and version
RAM size
Node count
CPU count
Where is compute power in this data?

16
What is compute power

I could get benchmark data, but what's the right
benchmark(s) to use?
Computer power simply isn't scalar, especially
in a Grid environment
Goal is really to understand how an application
will run on a machine
Given three different benchmarks, 3 different
platforms will perform very differently: one
best on BM1, another best on BM2
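
A small sketch of why a single scalar "power" value breaks down: with the hypothetical scores below (invented for illustration, not measured data), each benchmark crowns a different platform.

```python
# Hypothetical benchmark scores (higher is better); not measured data.
scores = {
    "platform_1": {"BM1": 90, "BM2": 40, "BM3": 60},
    "platform_2": {"BM1": 50, "BM2": 95, "BM3": 55},
    "platform_3": {"BM1": 60, "BM2": 50, "BM3": 85},
}


def best_platform_per_benchmark(scores):
    """Map each benchmark name to the platform that scores highest on it."""
    benchmarks = next(iter(scores.values())).keys()
    return {
        bm: max(scores, key=lambda platform: scores[platform][bm])
        for bm in benchmarks
    }


winners = best_platform_per_benchmark(scores)
print(winners)
# A single scalar "power" would imply the same winner on every benchmark.
print(len(set(winners.values())) == 1)  # False
```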

17
Mapping power to applications

Many scheduling approaches assume power is a
scalar: just multiply it by the set application
time and we're set
Only problem:
Power isn't a scalar
No one knows absolute application run times
Mapping will NOT be straightforward
We need a way to estimate application time on a
contended system

18
Perfect Load Information

MDS2 gives me
Basic queue data
Host load 5/10/15 min avg
Last value only

19
Load Predictions

Network Weather Service
12 prediction techniques
Work on any time series
Expect regularly arriving data
Only a prediction of the next value
I want to know what load is going to be like in
20 mins
Or the AVERAGE over the next 20 mins?
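
One possible way to predict an interval average rather than the next value is to collapse the series into per-interval means and predict the next mean. The sketch below uses a deliberately simple predictor (the average of the last few interval means); it is not one of the NWS techniques, and the samples and function names are hypothetical.

```python
from statistics import mean


def interval_means(samples, samples_per_interval):
    """Collapse a load series into means over consecutive full intervals."""
    n = len(samples) // samples_per_interval
    return [
        mean(samples[i * samples_per_interval:(i + 1) * samples_per_interval])
        for i in range(n)
    ]


def predict_next_interval_mean(samples, samples_per_interval, history=3):
    """Predict the mean load over the next interval as the average of the
    last `history` interval means (a deliberately simple predictor)."""
    means = interval_means(samples, samples_per_interval)
    if not means:
        raise ValueError("need at least one full interval of samples")
    return mean(means[-history:])


# Hypothetical load samples taken every 5 minutes; 4 samples = 20 minutes.
load = [0.4, 0.6, 0.5, 0.5, 0.8, 1.0, 0.9, 0.9, 0.6, 0.6, 0.7, 0.5]
print(interval_means(load, 4))
print(predict_next_interval_mean(load, 4))
```

Any one-step predictor could be swapped in for the final averaging step, since the aggregated means are just another time series.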

20
Information and Scheduling

What approaches expect us to have
What we actually have access to
How it changes
What to do about changing data

21
Dedicated SOR Experiments

Platform: 2 Sparc 2s, 1 Sparc 5, 1 Sparc 10
10 Mbit Ethernet connection
Quiescent machines and network
Prediction within 3% before memory spill

22
Non-dedicated SOR results

Available CPU on workstations varied from 0.43 to
0.53

23
SOR with Higher Variance in CPU Availability

24
Improving predictions

Available CPU has range of 0.48 +/- 0.05
Prediction should also have a range
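
A minimal sketch of carrying a range through a prediction, assuming run time scales as dedicated time divided by the available CPU fraction. That model is a simplifying assumption, not necessarily the one used in the experiments, and the 100 s dedicated time is made up.

```python
def runtime_range(dedicated_time, cpu_mean, cpu_spread):
    """Return (best, expected, worst) run time, assuming run time scales as
    dedicated_time / available_cpu (a simplifying assumption)."""
    low_cpu = cpu_mean - cpu_spread
    high_cpu = cpu_mean + cpu_spread
    if low_cpu <= 0:
        raise ValueError("available CPU must stay positive")
    return (dedicated_time / high_cpu,
            dedicated_time / cpu_mean,
            dedicated_time / low_cpu)


# 0.48 +/- 0.05 comes from the slide; the 100 s dedicated time is made up.
best, expected, worst = runtime_range(100.0, 0.48, 0.05)
print(f"{best:.1f} s to {worst:.1f} s (expected {expected:.1f} s)")
```

The resulting interval is not symmetric around the expected value, because dividing by a smaller CPU fraction stretches the slow end more than the fast end shrinks.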

25
Scheduling needs to consider variance

Conservative Scheduling: Using Predicted Variance
to Improve Scheduling Decisions in Dynamic
Environments
Lingyun Yang, Jennifer M. Schopf, Ian Foster
To appear at SC'03, November 15-21, 2003,
Phoenix, Arizona, USA
www.mcs.anl.gov/jms/Pubs/lingyun-SC-scheduling.pdf

26
Scheduling with Variance

Summary: scheduling with variance can give better
mean performance and less variance in overall
execution time
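
As a rough illustration of the idea, and not the algorithm from the SC'03 paper, a scheduler could penalize machines with variable load by using mean load plus one standard deviation when splitting work. The effective-speed formula, machine names, and load histories below are all assumptions made for this sketch.

```python
from statistics import mean, pstdev


def conservative_shares(total_work, machines):
    """Split total_work across machines in proportion to a conservative
    effective speed: speed / (1 + mean_load + stdev_load)."""
    effective = {}
    for name, (speed, load_history) in machines.items():
        conservative_load = mean(load_history) + pstdev(load_history)
        effective[name] = speed / (1.0 + conservative_load)
    total = sum(effective.values())
    return {name: total_work * e / total for name, e in effective.items()}


# Hypothetical machines: same speed, same mean load, different variability.
machines = {
    "steady": (1.0, [1.0, 1.0, 1.0, 1.0]),
    "bursty": (1.0, [0.0, 2.0, 0.0, 2.0]),
}
print(conservative_shares(1000, machines))
```

A mean-only scheduler would split this work evenly; the conservative version gives the steadier machine the larger share.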

27
Lessons

We need work predicting large file transfers,
NOT bandwidth
We need to be able to figure out ways to do
predictions of application run times WITHOUT
models
We need predictions over time periods, not just
a next value
We need a way to represent power of a machine,
that takes variance into account
We need a way to map power to application
behavior
We need better scheduling approaches that take
variance into account

28
Contact Information

Jennifer M. Schopf
jms_at_mcs.anl.gov
www.mcs.anl.gov/jms
Links to some of the publications mentioned
Links to the co-edited book Grid Resource
Management: State of the Art and Future Trends
