Information and Scheduling: What's available and how does it change

1
Information and Scheduling: What's available and
how does it change
  • Jennifer M. Schopf
  • Argonne National Lab

2
Information and Scheduling
  • How a scheduler works is closely tied to the
    information available
  • Choice of algorithm depends on the accessible data

3
This Talk
  • What approaches expect from information
  • What data is actually available, and some open
    questions
  • How data changes
  • What to do about changing data

4
NB
  • I'm speaking (pessimistically) from my own
    background
  • We've heard some talks earlier today (for example
    PACE) which address some of these problems
  • I still think these are interesting open issues
    to think about

5
Information systems (NOTE: taken from my standard
MDS2 talk)
  • Information is always old
  • Time of flight, changing system state
  • Need to provide quality metrics
  • Distributed system state is hard to obtain
  • Information is not contemporaneous (thanks j.g.)
  • Complexity of global snapshot
  • Components will fail
  • Scalability and overhead
  • Approaches are changed for scalability, this will
    affect the information available

6
Scheduling approaches assume
  • A lot of data is available
  • All information is accurate
  • Values don't change

7
Example System data
  • 1. The bandwidth $b_{ij}$: the maximum data rate in
    bits per second.
  • 2. The flow $f_{ij}$: the effective data rate in bits
    per second on the link.
  • 3. The utilization $u_{ij}$: the ratio of the
    effective flow to bandwidth,
    $u_{ij} = f_{ij} / b_{ij}$.
  • 4. The length $l_{ij}$: the Euclidean distance
    between its end peers.
  • 5. The cost $C_{ij}$: the cost can be defined as a
    function of the link length and its bandwidth,
    $C_{ij} = S\,(l_{ij}/b_{ij})$, where $S$ is a constant value.
  • 6. $T_i$: the processor speed of the peer, which is
    the number of work units that the peer can
    execute per unit of time.
  • etc.
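
The link quantities on this slide can be sketched in a few lines of Python. This is only an illustration of the definitions above: the class and function names are hypothetical, and the value chosen for the constant S is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Link:
    bandwidth_bps: float  # b_ij: maximum data rate, bits per second
    flow_bps: float       # f_ij: effective data rate, bits per second
    length: float         # l_ij: Euclidean distance between the end peers


def utilization(link: Link) -> float:
    """u_ij = f_ij / b_ij."""
    return link.flow_bps / link.bandwidth_bps


def cost(link: Link, s: float = 1.0) -> float:
    """C_ij = S * (l_ij / b_ij); the default S = 1.0 is an arbitrary assumption."""
    return s * (link.length / link.bandwidth_bps)


# Hypothetical link: 100 Mbit/s capacity carrying 25 Mbit/s over distance 50.
link = Link(bandwidth_bps=100e6, flow_bps=25e6, length=50.0)
print(utilization(link))
print(cost(link, s=1e6))
```

Every input here is treated as exact and static, which is the assumption the rest of the talk questions.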

8
Example Application information
  • 1. $B_i$ is the number of work units (in terms of
    computations) in the task. So, the number of time
    units that the task $t_i$ needs in order to be
    executed on peer $v_k$ is $B_i/T_k$.
  • 2. $u_i$ is the number of packets required to
    transfer the task. Thus, the task $t_i$ needs
    $u_i w / b_{ij}$ work units to be transferred from peer $v_i$
    to the peer $v_j$, assuming that these two peers
    are direct neighbors and the condition of the
    network is ideal.
  • 3. Implicit exact mapping of tasks and data in a
    DAG
  • etc
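
As a rough sketch of how a scheduler might use these two estimates, the Python below computes B_i/T_k and u_i w / b_ij and picks the peer with the smallest execution estimate. The slide does not define w; treating it as a per-packet size in bits is an assumption, and the function names and peer speeds are hypothetical.

```python
def execution_time(work_units: float, peer_speed: float) -> float:
    """B_i / T_k: time units task t_i needs on peer v_k."""
    return work_units / peer_speed


def transfer_estimate(packets: float, w: float, bandwidth_bps: float) -> float:
    """u_i * w / b_ij for direct neighbours on an ideal network.

    w is assumed here to be the packet size in bits; the slide leaves it undefined.
    """
    return packets * w / bandwidth_bps


def fastest_peer(work_units: float, peer_speeds: dict) -> str:
    """Pick the peer with the smallest B_i / T_k estimate."""
    return min(peer_speeds, key=lambda p: execution_time(work_units, peer_speeds[p]))


# Hypothetical peers and their speeds in work units per time unit.
speeds = {"v1": 250.0, "v2": 400.0, "v3": 100.0}
print(execution_time(1000.0, speeds["v1"]))
print(transfer_estimate(100, 8000, 1e6))
print(fastest_peer(1000.0, speeds))
```

The choice is only as good as the speed and bandwidth values fed in, and both are treated as fixed scalars here.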

9
What some people expect
  • Perfect bandwidth info
  • Number of operations in an application
  • Scalar value of computer power
  • Mapping of power to applications
  • Perfect load information

10
Bandwidth data
  • Network Weather Service (Wolski, UCSB)
  • 64k probe BW data
  • Latency data
  • Predictions
  • PingER (Les Cottrell, SLAC)
  • Create long term baselines for expectations on
    means/medians and variability for response time,
    throughput, packet loss
  • Predicting TCP performance
  • Allen Downey
  • http://allendowney.com/research/tcp/
  • But what do Grid applications need?

11
Perfect Bandwidth Data
  • 64k probes don't look like large file transfers

12
Predicting Large File Transfers
  • Vazhkudai and Schopf use GridFTP logs and some
    background data - NWS, iostat (HPDC 2002)
  • Error rate of 15%
  • M. Faerman, A. Su, R. Wolski, and F. Berman (HPDC
    '99)
  • Similar results for SARA data
  • Hu and Schopf use an AI learning technique on
    GridFTP log files only (not published yet)
  • Picks the best place to get a file from 60-80% of
    the time; using averages only picks the best
    source 50% of the time
  • This topic needs much more study!
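
For reference, the "averages only" baseline mentioned above might look like the following sketch: compute the mean observed throughput per source from past transfer records and pick the highest. The log format, site names, and numbers are hypothetical, and this is not the learning technique from the work cited on this slide.

```python
from collections import defaultdict
from statistics import mean


def mean_throughput_by_source(log):
    """log: iterable of (source, bytes_transferred, seconds) records."""
    samples = defaultdict(list)
    for source, nbytes, seconds in log:
        if seconds > 0:  # skip records with no usable duration
            samples[source].append(nbytes / seconds)
    return {src: mean(vals) for src, vals in samples.items()}


def pick_source(log):
    """Choose the source with the highest mean past throughput."""
    rates = mean_throughput_by_source(log)
    return max(rates, key=rates.get)


# Hypothetical transfer history.
log = [
    ("siteA", 1_000_000_000, 100.0),
    ("siteA", 500_000_000, 100.0),
    ("siteB", 1_000_000_000, 125.0),
]
print(pick_source(log))
```

A plain average ignores file size, time of day, and how old each record is, which is one plausible reason this baseline is often wrong.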

13
Data Generally Available From an Application
  • What some scheduling approaches want
  • Number of ops in an application
  • Exact execution time on a platform
  • Perfect models of applications

14
Application Data Currently Available
  • Bad models of applications
  • No models of applications
  • Some work (Prophesy, Taylor at Texas A&M) does
    logging to create models
  • Many interesting applications have
    non-deterministic run times
  • User estimates of application run time
    (historically) off by 20%
  • We need to be able to figure out ways to do
    predictions of application run times WITHOUT
    models

15
Scalar value of computer power
  • MDS2 gives me
  • CPU vendor, model and version
  • CPU speed
  • OS name, release and version
  • RAM size
  • Node count
  • CPU count
  • Where is compute power in this data?

16
What is compute power
  • I could get benchmark data, but what's the right
    benchmark(s) to use?
  • Computer power simply isn't scalar, especially
    in a Grid environment
  • Goal is really to understand how an application
    will run on a machine
  • Given three different benchmarks, three different
    platforms will perform very differently: one
    best on BM1, another best on BM2
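
A small sketch makes the point concrete. The scores below are invented purely for illustration (higher is better) and do not come from any real benchmark or machine; the platform and benchmark names are placeholders.

```python
# Invented scores, higher is better; for illustration only.
scores = {
    "platformA": {"BM1": 9.0, "BM2": 4.0, "BM3": 6.0},
    "platformB": {"BM1": 5.0, "BM2": 8.0, "BM3": 6.5},
    "platformC": {"BM1": 6.0, "BM2": 5.0, "BM3": 9.0},
}


def best_per_benchmark(scores):
    """Return the top-scoring platform for each benchmark."""
    benchmarks = sorted({bm for per_platform in scores.values() for bm in per_platform})
    return {bm: max(scores, key=lambda p: scores[p][bm]) for bm in benchmarks}


print(best_per_benchmark(scores))
```

With these numbers each benchmark crowns a different platform, so no single scalar ordering of "power" reproduces all three rankings.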

17
Mapping power to applications
  • Many scheduling approaches assume power is a
    scalar: just multiply it by the set application
    time and we're set
  • Only problem:
  • Power isn't a scalar
  • No one knows absolute application run times
  • Mapping will NOT be straightforward
  • We need a way to estimate application time on a
    contended system

18
Perfect Load Information
  • MDS2 gives me
  • Basic queue data
  • Host load 5/10/15 min avg
  • Last value only

19
Load Predictions
  • Network Weather Service
  • 12 prediction techniques
  • Work on any time series
  • Expect regularly arriving data
  • Only a prediction of the next value
  • I want to know what load is going to be like in
    20 mins
  • Or the AVERAGE over the next 20 mins?
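
One possible way to get at "the average over the next 20 minutes" is to collapse the measurement series into window means first and then predict the next window mean instead of the next raw value. The sketch below does this with a deliberately simple predictor (the mean of the last few window means). It is not one of the Network Weather Service techniques, it assumes regularly spaced samples, and the function names are hypothetical.

```python
from statistics import mean


def window_means(samples, window):
    """Means of consecutive, non-overlapping windows aligned to the newest sample.

    Leading samples that do not fill a whole window are dropped.
    """
    start = len(samples) % window
    return [mean(samples[i:i + window]) for i in range(start, len(samples), window)]


def predict_next_window_mean(samples, window, history=3):
    """Predict the mean over the next window as the mean of recent window means."""
    means = window_means(samples, window)
    if not means:
        raise ValueError("need at least one full window of samples")
    return mean(means[-history:])


# Hypothetical load series, one sample per minute; predict the next 20-minute average.
samples = [9.0] * 5 + [1.0] * 20 + [3.0] * 20
print(predict_next_window_mean(samples, window=20, history=2))
```

Any one-step-ahead predictor could replace the final averaging step, since the aggregated series is itself a time series.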

20
Information and Scheduling
  • What approaches expect us to have
  • What we actually have access to
  • How it changes
  • What to do about changing data

21
Dedicated SOR Experiments
  • Platform: 2 Sparc 2s, 1 Sparc 5, 1 Sparc 10
  • 10 Mbit Ethernet connection
  • Quiescent machines and network
  • Prediction within 3% before memory spill

22
Non-dedicated SOR results
  • Available CPU on workstations varied from .43 to
    .53

23
SOR with Higher Variance in CPU Availability

24
Improving predictions
  • Available CPU has range of 0.48 +/- 0.05
  • Prediction should also have a range
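
A minimal sketch of a ranged prediction, assuming run time scales inversely with the fraction of CPU available (an assumption of this example, not a model taken from the talk). The 100-second dedicated run time is a hypothetical input; the 0.48 +/- 0.05 availability is the range quoted above.

```python
def runtime_range(dedicated_time, cpu_mean, cpu_halfwidth):
    """Return (best, nominal, worst) run-time estimates.

    Assumes run time = dedicated_time / available CPU fraction.
    """
    low_cpu = cpu_mean - cpu_halfwidth
    high_cpu = cpu_mean + cpu_halfwidth
    if low_cpu <= 0:
        raise ValueError("CPU availability range must stay above zero")
    return (dedicated_time / high_cpu,   # most CPU available: shortest run
            dedicated_time / cpu_mean,   # point estimate
            dedicated_time / low_cpu)    # least CPU available: longest run


best, nominal, worst = runtime_range(100.0, cpu_mean=0.48, cpu_halfwidth=0.05)
print(f"{best:.1f} s .. {worst:.1f} s (nominal {nominal:.1f} s)")
```

The interval is not symmetric around the point estimate, because the run time depends on the reciprocal of the availability.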

25
Scheduling needs to consider variance
  • Conservative Scheduling: Using Predicted Variance
    to Improve Scheduling Decisions in Dynamic
    Environments
  • Lingyun Yang, Jennifer M. Schopf, Ian Foster
  • To appear at SC'03, November 15-21, 2003,
    Phoenix, Arizona, USA
  • www.mcs.anl.gov/jms/Pubs/lingyun-SC-scheduling.pdf

26
Scheduling with Variance
  • Summary: Scheduling with variance can give better
    mean performance and less variance in overall
    execution time
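
One simple reading of scheduling with variance is to penalize machines whose load fluctuates: use the mean load plus one standard deviation as a conservative load estimate, then divide work in proportion to the CPU share each machine is expected to offer. The sketch below is an illustration under those assumptions only. The 1/(1 + load) share model, the function names, and the load histories are hypothetical, and this is not claimed to be the algorithm from the paper cited on the previous slide.

```python
from statistics import mean, pstdev


def conservative_load(history):
    """Mean load plus one (population) standard deviation."""
    return mean(history) + pstdev(history)


def split_work(total_units, load_histories):
    """Divide work in proportion to each machine's expected CPU share.

    Assumes a new process gets a 1 / (1 + load) share of the CPU.
    """
    weights = {m: 1.0 / (1.0 + conservative_load(h)) for m, h in load_histories.items()}
    total = sum(weights.values())
    return {m: total_units * w / total for m, w in weights.items()}


# Two hypothetical machines with the same mean load but different variance.
histories = {
    "steady": [1.0, 1.0, 1.0, 1.0],
    "bursty": [0.0, 2.0, 0.0, 2.0],
}
print(split_work(100.0, histories))
```

A mean-only scheduler would split this work evenly; adding the variance term shifts work toward the machine whose load is more predictable.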

27
Lessons
  • We need work predicting large file transfers,
    NOT bandwidth
  • We need to be able to figure out ways to do
    predictions of application run times WITHOUT
    models
  • We need predictions over time periods, not just
    a next value
  • We need a way to represent power of a machine,
    that takes variance into account
  • We need a way to map power to application
    behavior
  • We need better scheduling approaches that take
    variance into account

28
Contact Information
  • Jennifer M. Schopf
  • jms_at_mcs.anl.gov
  • www.mcs.anl.gov/jms
  • Links to some of the publications mentioned
  • Links to the co-edited book Grid Resource
    Management: State of the Art and Future Trends