Simulation for Grid Computing - PowerPoint PPT Presentation


1
Simulation for Grid Computing
  • Henri Casanova
  • Univ. of California, San Diego
  • casanova_at_cs.ucsd.edu

2
Grid Research
  • Grid researchers often ask questions in the broad
    area of distributed computing
  • Which scheduling algorithm is best for this
    application on a given Grid?
  • Which design is best for implementing a
    distributed Grid resource information service?
  • Which caching strategy is best for enabling a
    community of users that do distributed data
    analysis?
  • What are the properties of several resource
    management policies in terms of fairness and
    throughput?
  • What is the scalability of my publish-subscribe
    system under various levels of failures?
  • ...

3
Grid Research
  • Analytical or Experimental?
  • Analytical Grid Computing research
  • Some have developed purely analytical /
    mathematical models for Grid computing
  • makes it possible to prove interesting theorems
  • often too simplistic to convince practitioners
  • but sometimes useful for understanding principles
    in spite of dubious applicability
  • One often uncovers NP-complete problems anyway
  • e.g., routing, partitioning, scheduling problems
  • And one must run experiments
  • Grid computing research based on experiments
  • Most published works

4
Grid Computing as Science?
  • You can tell you are a scientific discipline if
  • You can read a paper, easily reproduce (at least
    a subset of) its results, and improve
  • You can tell a grad student: "Here are the
    standard tools, go learn how to use them and come
    back in one month"
  • You can give a 1-hour seminar on widely accepted
    tools that are the basis for doing research in
    the area
  • We are not there today
  • But perhaps I can give a 1-hour seminar on
    emerging tools that could be the basis for doing
    research in the area, provided a few open
    questions are addressed
  • Need for standard ways to run Grid experiments

5
Grid Experiments
  • Real-world experiments are good
  • Eminently believable
  • Demonstrates that proposed approach can be
    implemented in practice
  • But...
  • Can be time-intensive
  • Execution of applications for hours, days,
    months, ...
  • Can be labor-intensive
  • Entire application needs to be built and
    functional
  • including all design / algorithms alternatives
  • include all hooks for deployment
  • Is it a bad engineering practice to build many
    full-fledged solutions to find out which ones work
    best?

6
Grid Experiments (2)
  • What experimental testbed?
  • My own little testbed
  • well-behaved, controlled, stable
  • often not representative of real Grids
  • Real grid platforms
  • (Still) challenging for many grid researchers to
    obtain
  • Not built as a computer scientist's playpen
  • other users may disrupt experiments
  • other users may find experiments disruptive
  • Platform will experience failures that may
    disrupt the experiments
  • Platform configuration may change drastically
    while experiments are being conducted
  • Experiments are uncontrolled and unrepeatable
  • even if disruption from other users is part of
    the experiments, it prevents back-to-back runs of
    competitor designs / algorithms

7
Grid Experiments (3)
  • Difficult to obtain statistically significant
    results on an appropriate testbed
  • And to make things worse...
  • Experiments are limited to the testbed
  • What part of the results are due to
    idiosyncrasies of the testbed?
  • Extrapolations are possible, but rarely
    convincing
  • Must use a collection of testbeds...
  • Still limited exploration of "what if" scenarios
  • what if the network were different?
  • what if we were in 2015?
  • Difficult for others to reproduce results
  • This is the basis for scientific advances!

8
Simulation
  • Simulation can solve many (all) of these
    difficulties
  • No need to build a real system
  • Conduct controlled / repeatable experiments
  • In principle, no limits to experimental scenarios
  • Possible for anybody to reproduce results
  • ...
  • Simulation
  • representation of the operation of one system
    (A) through the use of another (B)
  • Computer simulation: B is a computer program
  • Key question: validation
  • correspondence between simulation and real-world

9
Simulation in Computer Science
  • Microprocessor Design
  • A few standard cycle-accurate simulators are
    used extensively
  • http://www.cs.wisc.edu/arch/www/tools.html
  • Possible to reproduce simulation results
  • Networking
  • A few standard packet-level simulators
  • NS-2, DaSSF, OMNeT++
  • Well-known datasets for network topologies
  • Well-known generators of synthetic topologies
  • SSF standard: http://www.ssfnet.org/
  • Possible to reproduce simulation results
  • Grid Computing?
  • None of the above up until a few years ago
  • Most people built their own ad-hoc solutions
  • Promising recent developments

10
Simulation of Distributed Computing Platforms?
  • Simulation of parallel platforms used for decades
  • Most simulations have made drastic assumptions
  • Simplistic platform model
  • Processors perform work at some fixed rate
    (Flops)
  • Network links send data at some fixed rate
  • Topology is fully connected (no communication
    interference) or a bus (simple communication
    interference)
  • Communication and computation are perfectly
    overlappable
  • Simplistic application model
  • All computation is CPU intensive
  • Clear-cut communication and computation phases
  • Application is deterministic
  • Straightforward simulation in most cases
  • Just fill in a Gantt chart with a computer rather
    than by hand
  • No need for a simulation standard

11
Grid Simulations?
  • Simple models
  • perhaps justifiable for a switched dedicated
    cluster running a matrix multiply application
  • Hardly justifiable for grid platforms
  • Complex and wide-reaching network topologies
  • multi-hop networks, heterogeneous bandwidths and
    latencies
  • non-negligible latencies
  • complex bandwidth sharing behaviors
  • contention with other traffic
  • Overhead of middleware
  • Complex resource access/management policies
  • Interference of communication and computation

12
Grid Simulations
  • Recognized as a critical area
  • Grid eXplorer (GdX) project (INRIA)
  • Build an actual scientific instrument
  • Databases of experimental conditions
  • 1000-node cluster for simulation/emulation
  • Visualization tools
  • Still in planning
  • What simulation technology???
  • Two main questions
  • What does a representative Grid look like?
  • How does one do simulation on a synthetic
    representative Grid?

13
Presentation Outline
  • Introduction
  • Simulation for Grid Computing?
  • Generating Synthetic Grids
  • Simulating Applications on Synthetic Grids
  • Current Work and Future Directions

14
Why Synthetic Platforms?
  • Two goals of simulations
  • Simulate platforms beyond the ones at hand
  • Perform sensitivity analyses
  • Need Synthetic platforms
  • Examine real platforms
  • Discover principles
  • Implement platform generators
  • What simulation results go in my paper?
  • Results for a few real platforms
  • Results for many synthetic platforms

20
Generation of Synthetic Grids
  • Three main elements
  • Network Topology
  • Graph
  • Bandwidth and Latencies
  • Compute Resources
  • And other resources
  • Background Conditions
  • Load and unavailability
  • Failures
  • What is Representative and Tractable?

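[Figure: example synthetic Grid in which backbone and LAN network links connect a data store, a workstation, a server, and a cluster; X marks failed resources]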
21
Synthetic Network Topologies
  • The network community has wondered about the
    properties of the Internet topology for years
  • The Internet grows in a decentralized fashion
    with seemingly complex rules and incentives
  • Could it have a mathematical structure?
  • Could we then have generative models?
  • Three generations of graph generators
  • Random
  • Structural
  • Degree-based

22
Random Topology Generators
  • Simplistic
  • Place N nodes randomly in a square
  • Vertices u,v connected with prob. $P(u,v) = \alpha$
  • Waxman [JSAC'88]
  • Place N nodes randomly in a CxC square
  • $P(u,v) = \alpha\, e^{-d / (\beta C \sqrt{2})}$, $0 < \alpha, \beta \le 1$
  • d: Euclidean distance between u and v
  • First model to be widely used for network
    simulations
  • Others
  • Exponential, Locality-based, ...
  • Shortcoming: real networks have a non-random
    structure with some hierarchy
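
The Waxman model above can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the cited generators: the function name `waxman_graph` and the parameter values in the usage line are assumptions chosen for the example.

```python
import math
import random

def waxman_graph(n, c, alpha, beta, seed=0):
    """Return (positions, edges) for a Waxman-style random topology."""
    rng = random.Random(seed)
    # Place n nodes uniformly at random in a c-by-c square.
    pos = [(rng.uniform(0, c), rng.uniform(0, c)) for _ in range(n)]
    max_dist = c * math.sqrt(2)  # diagonal of the square
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(pos[u], pos[v])  # Euclidean distance
            # Connection probability decays exponentially with distance.
            p = alpha * math.exp(-d / (beta * max_dist))
            if rng.random() < p:
                edges.append((u, v))
    return pos, edges

# Illustrative parameters only (not taken from the talk).
positions, edges = waxman_graph(n=50, c=100.0, alpha=0.4, beta=0.2)
```

Larger alpha gives a denser graph; larger beta makes long links relatively more likely. Nothing in the construction produces hierarchy, which is the shortcoming noted above.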

23
Structural Generators
  • "... the primary structural characteristic
    affecting the paths between nodes in the Internet
    is the distribution between stub and transit
    domains... In other words, there is a hierarchy
    imposed on nodes..." [Zegura et al., 1997]
  • Both at the AS level (peering relationships)
    and at the router level (Layer 2)
  • Quickly became accepted wisdom
  • Transit-Stub [Calvert et al., 1997]
  • Tiers [Doar, 1996]
  • GT-ITM [Zegura et al., 1997]

24
Power-Laws!
  • In 1999, Faloutsos et al. [SIGCOMM'99] rocked
    topology generation with power laws
  • Results obtained both for AS-level and
    router-level information from real networks
  • Outdegree (number of edges leaving a node)
  • For each node v, compute its outdegree $d_v$
  • Node rank $r_v$: the index of v in the order
    of decreasing degree
  • Nodes can have the same rank
  • Law: $d_v \propto r_v^{R}$

[Figure panel: Measured]
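
One way to check a topology against the rank law is to fit the exponent R by least squares on log-log data. The sketch below makes several assumptions that are not from the talk: the graph is an undirected edge list, so "outdegree" is plain node degree; nodes with equal degree share the rank of the first of them; and the name `rank_exponent` is illustrative.

```python
import math
from collections import Counter

def rank_exponent(edges):
    """Fit R in d_v ~ r_v**R by least squares on (log rank, log degree)."""
    degree = Counter()
    for u, v in edges:  # undirected edge: counts at both endpoints
        degree[u] += 1
        degree[v] += 1
    points = []
    for d in degree.values():
        # Rank = 1 + number of nodes with strictly larger degree,
        # so nodes with the same degree share a rank.
        rank = 1 + sum(1 for other in degree.values() if other > d)
        points.append((math.log(rank), math.log(d)))
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx  # undefined when every node has the same rank

# A 5-node star: the hub has degree 4 (rank 1), the leaves degree 1 (rank 2).
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
slope = rank_exponent(star)
```

The quadratic-time ranking keeps the sketch short; sorting the degrees once would be the natural choice for large graphs.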
25
Power-Laws!
  • Random Generators do not agree with
    Power-laws
  • Structural Generators do not agree with Power-laws

[Figure panels: Waxman; GT-ITM, flat; GT-ITM, Transit-Stub]
26
Degree-based Generators
  • New common wisdom: a topology that does not agree
    with power-laws cannot be representative
  • Flurry of development of power-law generators
    after the Faloutsos paper
  • CMU power law graph generator [Palmer et al.,
    2000]
  • Inet [Jin et al., 2000]
  • BRITE [Medina et al., 2001]
  • PLRG [Aiello et al., 2000]

[Figure panel: BRITE]
27
Structure vs. Power-law
  • We know networks have structure AND power laws
  • Combine both?
  • GridG project [Dinda et al., SC'03]
  • Use a Tiers-generated topology
  • Add random links to satisfy the power-laws
  • How about just using power-laws?
  • Comparison paper [Tangmunarunkit et al.,
    SIGCOMM'02]
  • Degree-based generators capture the large-scale
    structure of real networks very well, better than
    structural generators!
  • structural generators impose too strict a
    hierarchy
  • Hierarchy arises naturally from the degree-based
    generators
  • e.g., backbone links
  • Works both for AS-level and router-level

28
Short story
  • What generator?
  • To model large networks (e.g., > 500 routers)
  • use degree-based generators
  • To model small networks (e.g., < 100 routers)
  • use structural generators
  • degree-based will not create any hierarchy

29
Bandwidths, latencies, traffic, etc.
  • Topology generators only produce a graph
  • We need link characteristics as well
  • Option 1: Model physical characteristics
  • Some models in topology generators
  • Need to simulate background traffic
  • No accepted model for generating background
    traffic
  • Simulation can be very costly
  • Option 2: Model end-to-end performance
  • Models [Lee, HCW'01] or Measurements (NWS, ...)
  • Go from path modeling to link modeling?
  • Turns out to be a difficult question
  • DARPA workshop on network modeling

30
Bandwidths, latencies, traffic, etc.
  • Maybe none of this matters?
  • Fiber inside the network mostly unused
  • Communication bottleneck is the local link
  • Appropriate tuning of TCP or better protocols
    should saturate the local link
  • Don't care about topology at all!
  • Or maybe none of this matters for my application
  • No network contention

31
Compute Resources
  • What resources do we put at the end-points?
  • Option 1: ad-hoc generalization
  • Look at the TeraGrid
  • Generate new sites based on existing sites
  • Option 2: Statistical modeling
  • Examine many production resources
  • Identify key statistical characteristics
  • Come up with a generative/predictive model

32
Synthetic Clusters?
  • Many Grid resources are clusters
  • What is the typical distribution of clusters?
  • Commodity Cluster synthesizer [Kee et al.,
    SC'04]
  • Examined 114 production clusters (10K procs)
  • Came up with statistical models
  • Validated model against a set of 191 clusters
    (10K procs)
  • Models allow extrapolation for future
    configurations
  • Models implemented in a resource generator

33
Architecture/Clock Models
Processor   Fraction (%)
Pentium2     1.4
Celeron      4.1
Pentium3    40.3
Pentium4    34.6
Itanium      3.9
Athlon       0.0
AthlonMP    12.4
AthlonXP     1.3
Opteron      2.0
  • Current distribution of proc families
  • Linear fit between clock-rate and release year
    within a processor family
  • Quadratic fit for the fraction of processors
    released in a given year
  • Model future distributions and speeds

34
Other models?
  • Other models
  • Cache size grows logarithmically
  • Number of processors per node: log2-normal
  • Memory size: log2-normal
  • Number of nodes per cluster: log2-normal
  • Models were validated against a snapshot of ROCKS
    clusters
  • These clusters have been added to the training
    set
  • More clusters are added every month
  • GridG
  • Provide a generic framework in which such laws
    and other correlations can be encoded
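
A resource generator built on such laws can be sketched as follows, assuming that "log2-normal" means the base-2 logarithm of the quantity is normally distributed. Every (mu, sigma) pair below is a placeholder chosen for illustration, not a fitted value from the cited work, and the function names are hypothetical.

```python
import random

def sample_log2_normal(mu, sigma, rng):
    """Draw a value whose base-2 logarithm is Normal(mu, sigma)."""
    return 2.0 ** rng.gauss(mu, sigma)

def synth_cluster(rng):
    """Generate one synthetic cluster description."""
    # Placeholder parameters: NOT the fitted values from the paper.
    return {
        "nodes": max(1, round(sample_log2_normal(5.0, 1.5, rng))),
        "procs_per_node": max(1, round(sample_log2_normal(1.0, 0.5, rng))),
        "memory_mb": max(1, round(sample_log2_normal(10.0, 1.0, rng))),
    }

rng = random.Random(42)  # fixed seed so that runs are repeatable
clusters = [synth_cluster(rng) for _ in range(5)]
```

Seeding the generator matters here: a synthetic platform is only useful for reproducible experiments if others can regenerate exactly the same one.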

35
Resource Availability / Workload
  • Probabilistic models
  • Naive: exponentially distributed availability and
    unavailability intervals
  • Sophisticated: Weibull distributions [Wolski et
    al.]
  • Traces
  • NWS, etc.
  • Desktop Grid resources [Kondo, SC'04]
  • Workload models
  • e.g., Batch schedulers
  • Traces
  • Models [Feitelson, JPDC'03]
  • job inter-arrival times: Gamma
  • amount of work requested: Hyper-Gamma
  • number of processors requested: Compounded (2,
    1, ...)
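
The probabilistic availability models above amount to alternating "up" and "down" intervals drawn from some distribution. A minimal sketch using Python's standard `random` module follows; the function name `availability_trace` and all distribution parameters are assumptions made for the example, not values from the cited studies.

```python
import random

def availability_trace(horizon, draw_up, draw_down, seed=0):
    """Return a list of (start, end, is_up) intervals covering [0, horizon]."""
    rng = random.Random(seed)
    t, up, trace = 0.0, True, []
    while t < horizon:
        length = draw_up(rng) if up else draw_down(rng)
        end = min(t + length, horizon)  # clip the last interval
        trace.append((t, end, up))
        t, up = end, not up             # alternate up / down
    return trace

# Naive model: exponential up-times (mean 100) and down-times (mean 10).
naive = availability_trace(
    1000.0,
    lambda r: r.expovariate(1 / 100),
    lambda r: r.expovariate(1 / 10),
)

# Weibull up-times (scale 100, shape 0.7), exponential down-times.
weibull = availability_trace(
    1000.0,
    lambda r: r.weibullvariate(100.0, 0.7),
    lambda r: r.expovariate(1 / 10),
)
```

Passing the distributions in as callables keeps the trace logic unchanged when a measured trace or a different fitted model replaces the placeholders.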

36
A Sample Synthetic Grid
  • Generate 5,000 routers with BRITE
  • Annotate latency according to BRITE's Euclidean
    distance method (scaling to obtain the desired
    network diameter)
  • Annotate bandwidth based on a set of end-to-end
    NWS measurements
  • Pick 30 of the end-points
  • Generate a cluster at each end-point according to
    Kee's synthesizer for Year 2006
  • Model cluster load with Feitelson's model with a
    range of parameters for the random distributions
  • Model resource failures based on Inca
    measurements on TeraGrid

37
Synthetic Grid Generation
  • Still far from widely accepted standards
  • Many ongoing, promising efforts
  • Researchers have recognized this as an issue
  • Tools from networking can be reused
  • A few Grid tools are available
  • What is really needed
  • Repository of Grid Measurements
  • Repository of Synthetic/Generated Grids

38
Presentation Outline
  • Introduction
  • Simulation for Grid Computing?
  • Generating Synthetic Grids
  • Simulating Applications on Synthetic Grids
  • Current Work and Future Directions

39
Simulation Levels
  • Simulating applications on a synthetic platform?
  • Spectrum of simulation levels

  • Mathematical simulation (more abstract): based
    solely on equations
  • Discrete-event simulation: abstraction of the
    system as a set of dependent actions and events
    (fine- or coarse-grain)
  • Emulation (less abstract): trapping and
    virtualization of low-level application/system
    actions
  • Boundaries above are blurred (e.g., between d.e.
    simulation and emulation)
  • A simulation can combine all paradigms at
    different levels

40
Simulation Options
  • Network
  • Macroscopic: flows in pipes
  • coarse-grain d.e. simulation + mathematical
    simulation
  • Microscopic: packet-level simulation
  • fine-grain d.e. simulation
  • Actual flows go through some network
  • emulation
  • CPU
  • Macroscopic: flows in a pipe
  • coarse-grain d.e. simulation + mathematical
    simulation
  • Microscopic: cycle-accurate simulation
  • fine-grain d.e. simulation
  • Virtualization via another CPU / Virtual Machine
  • emulation

[Figure annotation: within each list, options run from more abstract to less abstract]
41
Simulation Options
  • Application
  • Macroscopic: application as an analytical flow
  • Less macroscopic: sets of abstract tasks
    with resource needs and dependencies
  • coarse-grain d.e. simulation
  • Application specification or pseudo-code API
  • Virtualization
  • emulation of actual code with trapping of
    application-generated events
  • Two projects
  • MicroGrid (UCSD)
  • SimGrid (UCSD, IMAG, Univ. Nancy)

42
MicroGrid
  • Set of simulation tools for evaluating
    middleware, applications, and network services
    for Grid systems
  • Applications are supported by emulation and
    virtualization
  • Actual application code is executed on
    virtualized resources
  • MicroGrid accounts for
  • CPU
  • network
  • Virtualization

[Figure: layered stack, top to bottom: Application, Virtual Resources, MicroGrid, Physical Resources]
43
MicroGrid Virtualization
  • Resource virtualization
  • resource names are virtualized
  • gethostname, sockets, Globus GIS, MDS, NWS, etc.
  • Time virtualization
  • Simulating the TeraGrid on a 4 node cluster
  • Simulating a 4 node cluster on the TeraGrid
  • CPU virtualization
  • Direct execution on (a fraction of) a physical
    resource
  • No application modification
  • Main challenge
  • Synchronization between real time and virtual
    time

44
MicroGrid Network
  • Packet-level simulation
  • Network calls are intercepted
  • Are sent to a network simulator that has been
    configured with the virtual network topology
  • Implemented with MaSSF
  • An MPI version of DaSSF [Liu et al., 2001]
  • Configured via SSF standard (MDL, etc.)
  • Protocol stacks implemented on top of MaSSF
  • TCP, UDP, BGP, etc.
  • Socket calls are trapped without application
    modification
  • real data is sent
  • delays are scaled to match the virtual time
  • Main challenges
  • scaling (20,000 routers on a 128-node cluster)
  • synchronization with computation

45
MicroGrid in a Nutshell
  • Application: virtualization via trapping of
    application events (emulation)
  • Can have high overhead
  • But captures the overhead!
  • CPU: virtualization via another CPU (emulation)
  • Can be really slow
  • But hopefully accurate
  • Network: microscopic packet-level simulation
    (fine-grain discrete-event simulation)
  • Can be really slow for long transfers
  • But hopefully accurate

[Figure: Application, CPU, and Network placed on a spectrum from emulation (less abstract) through discrete-event simulation to mathematical simulation (more abstract)]
46
SimGrid
  • Originally developed for scheduling research
  • Must be fast to allow for thousands of simulations
  • Application
  • No real application code is executed
  • Consists of tasks that have
  • dependencies
  • resource consumptions
  • Resources
  • No virtualization
  • A resource is defined by
  • a rate at which it does work
  • a fixed overhead that must be paid by each task
  • traces of the above if needed, plus failures
  • A task can use multiple resources
  • data transfer over multiple links, computation
    that uses a disk and a CPU
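
The task and resource model on this slide can be illustrated with a short sketch. It assumes one resource per task and no contention between tasks (resource sharing is a separate topic in the talk); the class and function names are hypothetical and are not SimGrid's API.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    rate: float      # work units processed per second
    overhead: float  # fixed cost in seconds paid by every task

@dataclass
class Task:
    name: str        # assumed unique
    work: float      # amount of work (e.g., flops or bytes)
    resource: Resource
    deps: list = field(default_factory=list)  # tasks that must finish first

def finish_times(tasks):
    """Earliest finish time of each task, ignoring resource contention."""
    done = {}

    def finish(t):
        if t.name not in done:
            start = max((finish(d) for d in t.deps), default=0.0)
            done[t.name] = start + t.resource.overhead + t.work / t.resource.rate
        return done[t.name]

    for t in tasks:
        finish(t)
    return done

cpu = Resource(rate=100.0, overhead=1.0)
link = Resource(rate=10.0, overhead=0.5)
a = Task("compute_a", work=200.0, resource=cpu)
x = Task("transfer", work=50.0, resource=link, deps=[a])
b = Task("compute_b", work=100.0, resource=cpu, deps=[x])
times = finish_times([a, x, b])
```

In this example `compute_a` ends at 3.0 s, the transfer at 8.5 s, and `compute_b` at 10.5 s: each task starts when its dependencies finish and then pays the overhead plus work divided by rate.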

47
SimGrid
  • Uses a combination of mathematical simulation and
    coarse-grain discrete event simulation
  • Simple API to specify an application rather
    than having it already implemented
  • Fast simulation
  • Key issue: resource sharing
  • In MicroGrid, resource sharing emerges out of
    the low-level emulation and simulation
  • Packets of different connections interleaved by
    routers
  • CPU cycles of different processes get slices of
    the CPU
  • Drawback: slow simulation
  • How can one do something faster that is still
    reasonable?
  • Come up with macroscopic models of resource
    sharing

48
Resource Sharing in SimGrid
  • Macroscopic resource sharing can be easy
  • A CPU: CPU-bound processes/threads get a fair
    share of the CPU in steady state
  • Why go through the trouble of emulating CPU-bound
    processes?
  • Just say how many cycles they need, and compute
    how many cycles they get per second
  • Macroscopic resource sharing can be not so easy
  • The Network
  • Many end-points, routers, and links
  • Many end-to-end TCP flows?
  • How much bandwidth does each flow receive?
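
For the easy CPU case, the macroscopic approach reduces to a few lines of arithmetic. The sketch below assumes a single CPU shared equally among CPU-bound processes that all start at time zero; the function name `ps_completion_times` is illustrative.

```python
def ps_completion_times(cycles_needed, cpu_rate):
    """Completion time of each process under equal sharing of one CPU.

    cycles_needed: dict mapping process name -> cycles required
    cpu_rate: cycles per second delivered by the CPU
    """
    by_need = sorted(cycles_needed.items(), key=lambda kv: kv[1])
    finish, now, done_cycles = {}, 0.0, 0.0
    n = len(by_need)
    for i, (name, need) in enumerate(by_need):
        active = n - i  # processes still running
        # Each active process gets cpu_rate / active cycles per second,
        # so the next-smallest one needs this much more time to finish.
        now += (need - done_cycles) * active / cpu_rate
        done_cycles = need
        finish[name] = now
    return finish

finish = ps_completion_times({"a": 100, "b": 200, "c": 300}, cpu_rate=100.0)
```

With three processes sharing a 100 cycles/s CPU, `a` ends at 3 s, `b` at 5 s, and `c` at 6 s, which matches the total of 600 cycles. No per-cycle emulation is required.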

49
Bandwidth Sharing
  • Macroscopic TCP modeling is a field
  • Fluid in Pipe analogy
  • Rule of thumb: the share a flow gets on its
    bottleneck link is inversely proportional to its
    round-trip time
  • Turns out TCP in steady-state implements a type
    of resource sharing called Max-Min Fairness
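
The rule of thumb can be made concrete for the simplest case. This sketch assumes that all flows share one bottleneck link and that the link is fully used; the function name `rtt_shares` and the numbers are illustrative only.

```python
def rtt_shares(capacity, rtts):
    """Split a bottleneck link's capacity in inverse proportion to RTT.

    capacity: link capacity (any bandwidth unit)
    rtts: dict mapping flow name -> round-trip time in seconds
    """
    weights = {flow: 1.0 / rtt for flow, rtt in rtts.items()}
    total = sum(weights.values())
    return {flow: capacity * w / total for flow, w in weights.items()}

# Two flows on a 30 Mb/sec bottleneck with 10 ms and 20 ms round-trip times.
shares = rtt_shares(30.0, {"near": 0.010, "far": 0.020})
```

The flow with half the round-trip time receives twice the bandwidth (about 20 versus 10 Mb/sec here).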

50
Max-Min Fairness
  • Principle
  • Consider the set of all network links, $L$
  • $c_l$ is the capacity of link $l$
  • Consider the set of all flows, $R$
  • a flow is a subset of $L$
  • $x_r$ is the bandwidth allocated to flow $r$
  • Bandwidth capacities are respected
  • $\forall l \in L,\ \sum_{r \in R,\ l \in r} x_r \le c_l$
  • TCP in steady-state is such that
  • $\min_{r \in R} x_r$ is maximized
  • The above can be solved efficiently (with
    appropriate data structures)
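
A max-min fair allocation can be computed with the classic progressive-filling procedure: repeatedly find the most constrained link, give each of its unfrozen flows an equal share, and freeze them. The sketch below is a plain dictionary-based version written for clarity; it is not claimed to be SimGrid's implementation, and it assumes every flow crosses at least one link.

```python
def max_min_fair(capacity, flows):
    """Max-min fair bandwidth allocation by progressive filling.

    capacity: dict mapping link -> capacity
    flows: dict mapping flow -> set of links it crosses (non-empty)
    Returns a dict mapping flow -> allocated bandwidth.
    """
    remaining = dict(capacity)
    active = {f: set(links) for f, links in flows.items()}
    alloc = {}
    while active:
        # Equal share each link could still offer its unfrozen flows.
        share = {}
        for link, cap in remaining.items():
            n = sum(1 for links in active.values() if link in links)
            if n:
                share[link] = cap / n
        bottleneck = min(share, key=share.get)
        rate = share[bottleneck]
        # Freeze every flow crossing the bottleneck at that rate.
        for f in [f for f, links in active.items() if bottleneck in links]:
            alloc[f] = rate
            for link in active[f]:
                remaining[link] -= rate
            del active[f]
    return alloc

alloc = max_min_fair(
    {"A": 10.0, "B": 20.0},
    {"f1": {"A"}, "f2": {"A", "B"}, "f3": {"B"}},
)
```

Here link A is the first bottleneck, so f1 and f2 each get 5; f3 then takes the 15 left on link B. The rescan on every iteration keeps the sketch short, and the "appropriate data structures" mentioned above are what would remove it.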

51
SimGrid
  • Uses the Max-Min fairness principle for all
    resource sharing
  • fast
  • validated in the real-world for CPUs
  • validated with NS-2 for networks
  • Limitation
  • Max-Min fairness is for steady-state
  • e.g., no TCP slow-start
  • e.g., no process priority boosts
  • Unclear when it starts breaking down
  • Is justified for long enough transfers and
    computations
  • reasonable for scientific applications
  • not so much for applications such as a Grid
    information service

52
SimGrid in a Nutshell
  • Network: macroscopic flows in a pipe
    (mathematical simulation + coarse-grain d.e.
    simulation)
  • Very fast
  • Not accurate for short transfers
  • CPU: macroscopic flows in a pipe
    (mathematical simulation + coarse-grain d.e.
    simulation)
  • Very fast
  • Abstract application model
  • Application: macroscopic abstract tasks with
    resource needs and dependencies
    (coarse-grain d.e. simulation)
  • Very fast
  • Abstract application model

[Figure: Application, CPU, and Network placed on a spectrum from emulation (less abstract) through discrete-event simulation to mathematical simulation (more abstract)]
53
Other Projects
  • ModelNet
  • Network emulation
  • unmodified application
  • packets routed through a core cluster
  • GigaBit-switched nodes running a modified kernel
  • Emulates router queues
  • More emulation than MicroGrid
  • Only for networking, but plans to add support for
    computation
  • Still many in-house simulators that may aspire
    to become widely-used tools
  • EmuLab/DummyNet
  • ChicagoSim
  • OptorSim
  • EDGSim
  • GridSim
  • ...

54
So what should I use?
  • It really depends on your goal / resources
  • SimGrid's network model has clear limitations,
    e.g. for short transfers
  • SimGrid simulations are easy to set up
  • MicroGrid simulations take a lot of time
    (although they can be parallelized)
  • ModelNet requires some hardware setup
  • SimGrid does not require a full application
    to be written
  • MicroGrid models overhead of system calls
    implicitly
  • ...
  • Key trade-off: accuracy vs. speed
  • The more abstract the simulation, the faster
  • The less abstract the simulation, the more
    accurate
  • Does this trade-off really hold?

55
Simulation Validation
  • The crux of most simulation work in most domains
    of computer science
  • Validation is difficult and almost never done
    convincingly
  • Provide justification that the model is plausible
  • Convince people that the simulator implements the
    model (verification)
  • Provide a few graphs that show that it's
    reasonable
  • validation in a few special cases, at best
  • validation against another validated simulator
  • Argue that although absolute values are off, the
    trends are respected
  • Conclude that the simulator is useful to compare
    algorithms/designs
  • Obtain scientific results?????

56
FLASH vs. FLASH
  • "FLASH vs. (Simulated) FLASH: Closing the
    Simulation Loop" [Gibson et al., ASPLOS'00]
  • FLASH project at Stanford
  • building large-scale shared-memory
    multiprocessors
  • Went from conception, to design, to actual
    hardware (32-node)
  • Used simulation heavily over 6 years
  • The authors went back and compared simulation to
    the real world!
  • Simulation error is unavoidable
  • 30% error in their case was not rare
  • Negating the impact of "we got 1.5% improvement"
  • One should focus on simulating the important
    things
  • A more complex simulator does not ensure better
    simulation
  • simple simulators worked better than
    sophisticated ones, which were unstable
  • simple simulators predicted trends as well as
    slower, sophisticated ones
  • It is key to use the real-world to tune/calibrate
    the simulator
  • Conclusion for FLASH, the simple simulator was
    all that was needed

57
Presentation Outline
  • Introduction
  • Simulation for Grid Computing?
  • Generating Synthetic Grids
  • Simulating Applications on Synthetic Grids
  • Current Work and Future Directions

58
Grid Simulation Accuracy vs. Speed
  • Comparing simulators and validating them is a
    gigantic amount of work
  • It will never be clear-cut
  • identify simulation regimes
  • It doesn't lead to many papers
  • It's eminently politically incorrect
  • Results depend on what the simulation is used for
  • Therefore nobody does it
  • The story one would like to tell is, e.g.
  • Start with SimGrid simulations at first to
    identify promising approaches
  • Move to MicroGrid emulations to precisely
    quantify the trade-offs
  • How can we substantiate this story?

59
Current Work in SimGrid
  • SimGrid uses a simulation engine called SURF
  • currently SURF performs a blend of mathematical
    simulation and discrete event simulation
  • Current work
  • Adding a MaSSF back-end (i.e., MicroGrid)
  • Adding a ModelNet back-end
  • Goal: evaluate the speed-accuracy trade-off for
    simulation of the network
  • Expected result
  • SimGrid sufficient and fast for large-enough
    messages
  • e.g., Good for scientific applications
  • MaSSF/ModelNet required for small/frequent
    messages
  • e.g., Good for middleware applications (e.g.,
    NWS), p2p applications
  • But beware FLASH vs. FLASH!

60
What about Validation?
  • The GRAS project [Quinson et al.]
  • Idea: provide a way to compile code into
    real-world code and into simulation code
  • Write the application using the GRAS API
  • Compile it into a SimGrid simulation
  • Compile it into a real-world code
  • currently provides its own back-end and
    deployment
  • could use Globus as a back-end
  • Run and compare
  • Goals
  • smooth transition from design to prototyping to
    production
  • easy validation and simulation calibration
  • Plan
  • Use GRAS to easily compare SimGrid, MaSSF,
    ModelNet, and real world networks
  • Move on to full applications

61
Conclusion
  • Simulation is difficult
  • Eternal question: what really matters?
  • Grid researchers are actively working on it
  • Usable tools exist
  • Grid simulation today should not re-invent the
    wheel
  • Two crucial next steps
  • Repository of synthetic Grids, Grid measurement
    datasets, and Grid simulation software
  • Scientifically sound validation experiments
  • Validate simulators
  • Understand what matters and what does not
  • Only then will we have a scientific discipline
    with a standard way to conduct experiments and a
    way for researchers to reproduce each other's
    results.

62
Questions?
64
A Simple Experiment
  • Sent out files of 500MB up to 1.5GB with TTCP
  • Using from 1 up to 16 simultaneous connections
  • Recorded the data transfer rate per connection

65
Experimental Results
[Figure: normalized data rate per connection vs. number of concurrent TCP connections]
66
Bandwidths, latencies, traffic, etc.
  • Option 2: Model end-to-end performance
  • Sub-option A: Just scale physical link
    characteristics
  • Sub-option B: Model end-to-end performance
  • Use a few simple laws to generate perceived
    network performance at the application level
  • e.g., distribution of end-to-end transfer rates
    in Grid platforms is Normal [Lee, HCW'01]
  • Use application-level measurements collected on
    real-platforms
  • e.g., NWS measurements, Inca measurements
  • Pick real values randomly and assign them to
    end-to-end paths

67
Bandwidths, latencies, traffic, etc.
  • Problem with Option 2: mapping between
    end-to-end network path characteristics and
    individual links
  • We need link-level information for the simulation
  • e.g., for bandwidth-sharing between flows
  • We often only have path-level information
  • Heavy-duty tools exist for obtaining link-level
    information
  • So far only ad-hoc approaches for this mapping
  • e.g., assume bottleneck is inside the network
  • Subversive thought: maybe none of this matters?
  • Fiber inside the network mostly unused
  • Each communication bottleneck is the local link
  • Appropriate tuning of TCP or better protocols can
    saturate the local link?
  • Bottom line: still a lot of open questions
  • Upcoming DARPA workshop on large-scale network
    modeling

68
Bandwidths, latencies, traffic, etc.
  • It gets worse
  • Network performance fluctuations?
  • No accepted model
  • One option
  • Use traces from measurement tools (e.g., NWS)
  • Conduct trace-driven simulations (which may be
    expensive)
  • Network failures?
  • No accepted model
  • Use arbitrary probabilistic models
  • Use failure data from measurement tools (e.g.,
    Inca)
  • For now, a combination of the above approaches
    with reasonable assumptions is the
    state-of-the-art

69
Bandwidth Sharing
  • Naïve assumption: fair sharing

[Figure: a 20 Mb/sec link shared by two flows that receive 10 Mb/sec each]
  • Good approximation for LANs
  • But what about WANs?
  • Different characteristics
  • massive bandwidth, massive traffic inside the
    network
  • Multi-hop network paths

70
Max-Min Fairness
  • Captures other resource sharing beyond networks!

  • Interference of communication and computation
    [Kreaseck et al., IJHPCA'05]
  • CPU sharing