
Transcript and Presenter's Notes

Title: Designing Parallel Operating Systems using Modern Interconnects


1
Designing Parallel Operating Systems using Modern Interconnects
Toward Realistic Evaluation of Job Scheduling Strategies
Eitan Frachtenberg, with Dror Feitelson, Fabrizio Petrini, and Juan Fernandez
Computer and Computational Sciences Division, Los Alamos National Laboratory
Ideas that change the world
2
Outline
  • The challenges of parallel job scheduling
    evaluation
  • Emulation: rationale, strengths, and weaknesses
  • Experimental results and analysis
  • How do different algorithms react to increasing
    load?
  • Can knowing the future help?
  • What is the effect of multiprogramming?
  • What applications is it good for?

3
Parallel Job Scheduling
  • The task: assign compute resources to parallel
    jobs
  • The computers: clusters and MPPs
  • Range from 100s of processors to 10,000 and more
  • Typically homogeneous and connected by a fast
    interconnect
  • Jobs arrive dynamically, with different sizes and
    runtimes, requiring online scheduling
  • Mostly fine-grained communication, lots of memory
  • Mix of serial, parallel, short and long jobs

4
Scheduling Taxonomy
  • Rectangle packing
  • Main dimensions: space sharing and time sharing
  • Additional queue dimension: backfilling,
    priorities

5
Backfilling
  • Backfilling is a technique to move jobs forward
    in the queue
  • Requires advance knowledge of run times (or
    reservations)
  • Reduces external fragmentation and improves
    utilization, responsiveness, and throughput
  • Has several variations
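The mechanics can be sketched as a conservative EASY-style backfill, one common variation: the head job gets a reservation, and later jobs may jump ahead only if they would finish before it. This is a sketch under stated assumptions; the `Job` fields, the queue representation, and the single-reservation policy are illustrative, not the exact variant evaluated in the talk.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    size: int        # processors requested
    estimate: float  # user-supplied runtime estimate (seconds)

def easy_backfill(queue, free_procs, running, now):
    """Return the jobs to start now.

    queue:   FIFO list of waiting Jobs (mutated in place)
    running: maps job id -> (end_time, size) for running jobs
    """
    started = []
    # Start jobs from the head of the queue while they fit.
    while queue and queue[0].size <= free_procs:
        job = queue.pop(0)
        started.append(job)
        free_procs -= job.size
    if not queue:
        return started
    # Reservation for the blocked head job: the earliest time at
    # which enough processors will have been released.
    head = queue[0]
    avail, shadow = free_procs, now
    for end, size in sorted(running.values()):
        avail += size
        if avail >= head.size:
            shadow = end
            break
    # Backfill: a later job may jump ahead only if it fits the free
    # processors and is estimated to finish before the reservation.
    for job in list(queue[1:]):
        if job.size <= free_procs and now + job.estimate <= shadow:
            queue.remove(job)
            started.append(job)
            free_procs -= job.size
    return started
```

With 4 free processors, a running 4-processor job ending at t=100, and a queue of A(8 procs), B(2 procs, 50 s), C(4 procs, 200 s): A must wait for the release at t=100, B backfills ahead of it, and C does not fit the remaining processors.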

6
Time Sharing
  • Does not require reservation times, but can be
    combined with backfilling
  • Further reduces external fragmentation, and
    possibly even internal fragmentation, resulting
    in improved utilization, responsiveness, and
    throughput
  • But also challenging
  • Memory pressure
  • Context-switch overheads
  • Process synchronization tradeoffs
  • Tightly-coupled processes must be coscheduled
  • Coordination can incur overhead and fragmentation

7
Time Sharing Spectrum
[Figure: a spectrum of coordination, from none (local) through implicit
(DCS, ICS/SB, CC, PB) and hybrid (FCS, BCS) to explicit (GS).]
  • No coordination: local UNIX scheduling
  • Explicit coordination:
  • Global clock (centralized)
  • Global context-switches to known job
  • Implicit coordination: infer synchronization
    information at sender side, receiver side, or
    both
  • Hybrid: global coordination with local autonomy

8
Without Timesharing
  • Short processes wait for long periods in the
    queue
  • External fragmentation creates many holes

9
Time Sharing - GS
  • Gang Scheduling multiprograms several jobs
  • Reduces response time and fills holes
  • Incurs more overhead and memory pressure (time
    quantum)
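Gang scheduling is commonly pictured as an Ousterhout matrix: each time slot is a row, each processor a column, and all of a job's processes occupy one row so they are context-switched together. A toy sketch of filling the matrix (job names, sizes, and the first-fit row policy are illustrative assumptions):

```python
def build_matrix(jobs, procs, mpl):
    """Place jobs into an Ousterhout matrix.

    jobs:  list of (name, size) pairs
    procs: number of processors (columns)
    mpl:   multiprogramming level (rows / time slots)
    Returns a list of rows; each slot holds a job name or None.
    """
    rows = [[None] * procs for _ in range(mpl)]
    for name, size in jobs:
        for row in rows:
            free = [i for i, slot in enumerate(row) if slot is None]
            if len(free) >= size:
                for i in free[:size]:   # all processes share one row,
                    row[i] = name       # so they run coscheduled
                break
        else:
            raise RuntimeError(f"no time slot fits job {name}")
    return rows
```

Empty slots in a row are the holes that multiprogramming fills; a higher MPL adds rows (more jobs resident) at the cost of memory pressure.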

10
Time Sharing - SB
  • Spin-Block (ICS) is a sender-side coordination
    heuristic
  • Reduces overhead, increases scalability
  • Performs poorly with fine-grained communication
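The heuristic itself is simple: a process waiting for a message polls for a bounded interval and yields the processor only if the message has not arrived, so a well-coscheduled peer costs no context switch. A minimal sketch, where `ready` and `yield_cpu` are hypothetical stand-ins for an MPI completion test and a scheduler yield:

```python
import time

def spin_block_wait(ready, yield_cpu, spin_ns=100_000):
    """Two-phase wait: spin up to spin_ns nanoseconds, then block.

    ready:     callable returning True once the message has arrived
    yield_cpu: callable that gives the processor to a competing process
    """
    deadline = time.monotonic_ns() + spin_ns
    while time.monotonic_ns() < deadline:
        if ready():          # arrived while spinning: no context switch
            return "spun"
    while not ready():       # give up the CPU until the message arrives
        yield_cpu()
    return "blocked"
```

Fine-grained applications pay here: messages usually arrive just after the spin window expires, so nearly every wait takes the expensive blocking path.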

11
Time Sharing - FCS
  • Combine global synchronization with local
    information
  • Rely on scalable primitives for global
    coordination and information exchange
  • Measure communication characteristics, such as
    granularity and wait times
  • Classify processes based on synchronization
    requirements
  • Schedule processes based on class
  • Preferential to short jobs

12
FCS Classification
[Figure: classification by granularity (fine vs. coarse) and block
times (short vs. long):]
  • CS (fine-grained, short block times): always gang-scheduled
  • F (fine-grained, long block times): preferably gang-scheduled
  • DC (coarse-grained): locally scheduled
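Reading the classification above as two threshold tests gives a sketch of the class decision; the threshold values and parameter names here are illustrative placeholders, not the ones FCS actually measures or uses:

```python
def fcs_class(granularity_s, block_time_s,
              fine_thresh=0.001, long_block_thresh=0.010):
    """Map a process to an FCS class from its measured behavior.

    granularity_s: mean time between communication events (seconds)
    block_time_s:  mean time spent waiting per event (seconds)
    """
    if granularity_s >= fine_thresh:
        return "DC"   # coarse-grained: locally scheduled
    if block_time_s >= long_block_thresh:
        return "F"    # fine-grained but waits long: prefer gang
    return "CS"       # fine-grained, short waits: always gang
```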
13
Evaluation Challenges
  • Theoretical Analysis (queuing theory)
  • Not applicable to time-sharing due to unknown
    parameters, application structure, and feedbacks
  • Simulation
  • Many assumptions, not all known/reported
  • Hard to reproduce: many studies provide
    contradictory results, often showing theirs is
    best
  • Rarely factors in application characteristics
  • Experiments with real sites and workloads
  • Largely impractical and irreproducible
  • Emulation

14
Emulation Methodology
  • Framework for studying scheduling algorithms
  • Runs any MPI application in a cluster
  • Implemented several scheduling algorithms
  • Allows control over input parameters
  • Provides detailed logs and analysis tools
  • Testing in a repeatable dynamic environment
  • Dynamic job arrivals, with varying time and space
    requirements
  • Complex, longer and more realistic workloads

15
Evaluation by Emulation
  • Pros
  • Real: no hidden assumptions or overheads
  • Configurable: choice of parameters and workloads
  • Repeatable: same experiment, same results
  • Portable: allows the isolation of HW factors
  • Cons
  • Slow
  • Requires more resources than analysis/simulation
  • GIGO: results are only as representative as the
    input

16
Experimental Environment
  • Implemented on top of STORM, a scalable resource
    management system for clusters
  • Algorithms: FCFS, GS, SB, FCS, using backfilling
  • MPI synthetic (BSP) and LANL applications
  • Different granularities and communication
    patterns
  • Flexible workload model, 1000 jobs
  • Time shrinking
  • Three clusters, using QsNet
  • Pentium III: 32 nodes x 2 CPUs, 1 GB/node
  • Itanium II: 32 nodes x 2 CPUs, 2 GB/node
  • Alpha EV6: 64 nodes x 4 CPUs, 8 GB/node

17
Experiments Overview
  • Use synthetic applications for basic insights
  • Effect of multiprogramming level
  • Effect of backfilling
  • Effect of time quantum
  • Effect of load
  • Use LANL's Sage/Sweep3D for an application study
  • Caveat emptor
  • Only LANL applications
  • Does not follow input workload closely
  • Limited set of inputs
  • Different architecture (Alpha)

18
Effect of MPL
  • Questions
  • What is the effect of preemptive multiprogramming
    compared to FCFS (batch) scheduling?
  • Does a higher MPL mean higher performance?
  • Parameters
  • GS with MPL values 1-6 (1 = batch)
  • Input load: 75%
  • Bounded slowdown, cutoff at 10 s
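Bounded slowdown is the standard metric max(1, (wait + run) / max(run, threshold)); the 10 s cutoff keeps very short jobs from inflating the average. A one-function sketch:

```python
def bounded_slowdown(wait, run, threshold=10.0):
    """Bounded slowdown of a job (times in seconds).

    wait: time spent in the queue; run: actual run time.
    Jobs shorter than `threshold` are treated as threshold-long,
    and the result never drops below 1.
    """
    return max(1.0, (wait + run) / max(run, threshold))
```

For example, a 10 s job that waited 90 s has bounded slowdown 10, while a 1 s job that waited 5 s reports 1 rather than 6.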

19
MPL - Response Time
20
MPL - Bounded Slowdown
21
Effect of Backfilling
  • Does adding backfilling (knowledge of the
    future) to GS/batch scheduling help?

22
Backfilling - Response Time
  • Backfilling helps short jobs, harms long jobs

23
Effect of Time Quantum
  • Shorter time quantum pros:
  • System is more responsive
  • Less external fragmentation
  • Longer time quantum pros:
  • Less cache/memory pressure
  • Less synchronization overhead
  • Setup
  • GS at 75% load
  • Compare Pentium III to Itanium II

24
Time Quantum - Response Time
25
Effect of Load
  • Comparing FCFS, GS, SB, and FCS with backfilling
  • Varying offered load by increasing run times
  • Load values: 40%-90%
  • No measurements after the saturation point
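Offered load here is the usual ratio of requested work to machine capacity over the arrival window, and scaling every run time by a constant moves it to a target value. A sketch of both steps (job representation and field names are illustrative assumptions):

```python
def offered_load(jobs, procs, span):
    """Offered load of a workload.

    jobs:  list of (size, runtime) pairs
    procs: total processors in the machine
    span:  length of the arrival window (seconds)
    """
    work = sum(size * runtime for size, runtime in jobs)
    return work / (procs * span)

def scale_to_load(jobs, procs, span, target):
    """Scale all run times by one factor to hit a target load."""
    factor = target / offered_load(jobs, procs, span)
    return [(size, runtime * factor) for size, runtime in jobs]
```

Keeping arrivals and sizes fixed while stretching run times is what lets a single workload be replayed at 40% through 90% load.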

26
Load - Response Time
27
Load - Bounded Slowdown
28
Response Time Median
29
Bounded Slowdown Median
30
500 Shortest jobs CDF
31
500 Longest jobs CDF
32
Scientific Applications
  • Sage and Sweep3D
  • Hydrodynamics codes
  • approx. 50-80% of LANL cycles
  • Memory-constrained
  • Mostly operating out of cache
  • Relatively load-balanced
  • Parameters
  • MPL 2, 100ms time quantum
  • 1000 jobs, modeled arrival times, random run
    times
  • Realistic inputs, biased toward short runs

33
Response Time
34
Bounded Slowdown
35
Conclusions - methodology
  • A more realistic evaluation of job scheduling
  • Repeatable experiments
  • Allows isolation of factors
  • Direct comparison of platforms on the
    applications you care most about

36
Conclusions - experiments
  • Significant improvement over FCFS can be achieved
    with multiprogramming, even at MPL 2
  • Backfilling can also make a difference
  • Batch scheduling discriminates against short
    jobs
  • Multiprogramming for scientific apps pays off,
    even with MPL 2
  • FCS can outperform explicit/implicit coscheduling
  • For more information: eitanf@lanl.gov

37
Time Quantum - Slowdown