In Search for Simplicity: A Self-Organizing, Multi-Source Multicast Overlay

1
In Search for Simplicity: A Self-Organizing,
Multi-Source Multicast Overlay
  • Matei Ripeanu
  • The University of Chicago

2
Long-term trend
  • Increasingly rewarding to aggregate the
    capabilities of small sites/machines
  • A virtual machine aggregating the last 10 in
    Top500 would rank 63rd in '94 but 17th in '04
  • Grids and P2P: consequences of this trend
  • Increasingly more power in the tail
    of these distributions

3
  • Grids: infrastructure to support federated
    resource sharing.
  • Early Grid work
    • Focus on defining service interfaces and
      individual behaviors
    • Implementations often based on centralized
      components
  • Today's challenge
    • Self-organizing, adaptive services.
  • Self-organization: system behavior emerges as a
    result of independent decisions made by system
    components using incomplete information about the
    entire system.

4

Group communication (multicast) functionality
  • Applications
    • Conferencing
    • White-board type applications
    • Resource monitoring and discovery
    • Data distribution
  • Setting
    • Multiple senders and receivers
    • Medium scale
  • Challenge
    • Build/maintain support structures (overlays)
      • that map well onto a heterogeneous set of end
        nodes and network paths
      • when facing node and link volatility
      • using decentralized, adaptive algorithms.

5
Roadmap
  • Introduction
  • UMM: An Unstructured Multi-source Multicast
  • Application study: Replica Location Service
  • Conclusions

6
Existing approaches
  • Shared Tree (NICE, ALMI, Overcast)
    • No need for an explicit routing protocol
    • But: fragile to failures, does not exploit all
      available capacity, larger delays (for
      multi-source)
  • Unstructured Mesh
    • Random overlay, flooding (Gnutella)
      • Simple, resilient.
      • But: inefficient resource usage (duplicate
        traffic).
    • Measurement-based overlays (Narada,
      Scattercast)
      • Unstructured overlay mesh, initially random
      • Routing protocol to extract distribution trees
        for each source.
      • But: (1) overhead to build/maintain routing
        tables, (2) tree extraction and overlay
        optimization are coupled.
  • Structured Overlay (M-CAN, Bayeux, Pastry)
    • Scalable. Promises structure reusability.
    • But: (1) complex protocols to maintain structure,
      (2) difficult to adapt to heterogeneous node/link
      capacities (Bharambe05)

7
UMM design guidelines
  • Use unstructured overlays for their
    • Low construction/maintenance costs
    • Ability to map well to heterogeneous node and
      link characteristics
    → Adaptiveness
  • Prefer a simple design over a highly optimized
    complex one.
    • We attempt to preserve the simplicity of flooding
      and try to reduce its overheads.
  • Soft-state protocols and passive state collection
    are preferable to active state maintenance
    → Deployability, reliability.
  • Keep design layers independent
    → Ability to optimize and evolve layers
      individually
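
The soft-state guideline can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the class name `SoftStateTable`, the `ttl_seconds` parameter, and the explicit `now` argument are hypothetical. Entries expire on their own unless something refreshes them, so no explicit teardown messages are needed.

```python
class SoftStateTable:
    """Hypothetical soft-state table: entries vanish unless refreshed."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._expires = {}  # key -> time at which the entry stops being valid

    def refresh(self, key, now):
        """Create the entry or extend its lifetime (e.g. when traffic is seen)."""
        self._expires[key] = now + self.ttl

    def is_live(self, key, now):
        """True while the entry has been refreshed within the last ttl seconds."""
        return self._expires.get(key, float("-inf")) > now

    def purge(self, now):
        """Drop expired entries and return their keys."""
        dead = [key for key, expiry in self._expires.items() if expiry <= now]
        for key in dead:
            del self._expires[key]
        return dead


# Example: a routing entry learned at t=0 with a 30-second lifetime.
table = SoftStateTable(ttl_seconds=30)
table.refresh(("source-1", "neighbor-7"), now=0)
```

Passing `now` explicitly keeps the sketch deterministic; a real node would presumably read a clock instead.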

8
UMM in one slide
  • Build a reasonably good unstructured mesh
    • Short-long heuristics (similar to Saxons,
      NSDI'04, and M-CAN)
    • ½ of the links short: optimized for delay
    • ½ of the links long: optimized for throughput
    • Any other mesh-building technique would work
  • Extract distribution trees
    • Start with flooding
    • Use the implicit information from duplicate
      messages to extract source-specific
      distribution trees
  • Deal with failures, adaptation
    • Acquired routing info is soft-state
    • Link failure info is flooded with low TTL

[Figure: example with nodes A, B, C, D]
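
The tree-extraction step can be illustrated with a small simulation. This is one plausible reading of the slide, not the UMM protocol itself: the function name `flood`, the `pruned` set, and the four-node mesh with its delays are hypothetical. A node treats the neighbor that delivered a message first as its upstream link for that source, and every link that delivered a duplicate is dropped from that source's tree.

```python
import heapq


def flood(graph, source, pruned=frozenset()):
    """Simulate one flood from `source`.

    graph:  {node: {neighbor: link_delay}}
    pruned: directed links (sender, receiver) that no longer carry this
            source's messages
    Returns (parent, duplicates): the first-arrival upstream neighbor of
    each node, and the directed links that delivered a duplicate copy.
    """
    parent = {source: None}
    duplicates = set()
    events = []  # (arrival_time, receiver, sender)
    for nbr, delay in graph[source].items():
        if (source, nbr) not in pruned:
            heapq.heappush(events, (delay, nbr, source))
    while events:
        time, node, sender = heapq.heappop(events)
        if node in parent:
            # Already seen: this copy is a duplicate, so the link is redundant.
            duplicates.add((sender, node))
            continue
        parent[node] = sender
        for nbr, delay in graph[node].items():
            if nbr != sender and (node, nbr) not in pruned:
                heapq.heappush(events, (time + delay, nbr, node))
    return parent, duplicates


# Hypothetical mesh; the numbers are link delays.
mesh = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}

first_parent, duplicate_links = flood(mesh, "A")                   # plain flooding
tree_parent, leftover = flood(mesh, "A", pruned=duplicate_links)   # after pruning
```

In this example the second flood reaches every node over the same upstream links while producing no duplicates, which is the effect the slide describes.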
9
Success metrics
  • Overheads compared to IP multicast
    • Relative Delay Penalty (RDP): overlay delay vs.
      IP delay
    • Stress: number of duplicate packets on a physical
      link
  • Flexibility in adapting to changes in the
    set of participating nodes and the underlying
    network characteristics
  • Reliable delivery
    • Number of lost messages
  • Protocol complexity
    • Number of explicit messages to build state, state
      size

[Figure: IP multicast vs. overlay delivery among hosts MIT1, MIT2, Berkeley, UCSD, CMU1, CMU2]
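
The two overhead metrics can be made concrete with a short Python sketch. The topology, host and router names, and the uniform 10 ms link delay are invented for illustration; only the definitions (overlay delay relative to IP delay, and copies of a packet per physical link) come from the slide.

```python
from collections import Counter


def relative_delay_penalty(overlay_delay, ip_delay):
    """Delay along the overlay path divided by the direct IP delay."""
    return overlay_delay / ip_delay


def link_stress(overlay_edges, physical_path):
    """Count how many copies of one packet cross each physical link."""
    stress = Counter()
    for edge in overlay_edges:
        for a, b in physical_path[edge]:
            stress[frozenset((a, b))] += 1  # links are treated as undirected
    return stress


# Hypothetical topology: hosts h1..h3, routers r1 and r2, 10 ms per link.
physical_path = {
    ("h1", "h2"): [("h1", "r1"), ("r1", "h2")],
    ("h2", "h3"): [("h2", "r1"), ("r1", "r2"), ("r2", "h3")],
}
overlay_tree = [("h1", "h2"), ("h2", "h3")]  # h1 is the source

stress = link_stress(overlay_tree, physical_path)

# h3 receives via h2, crossing 2 + 3 physical links; the direct IP route
# h1-r1-r2-h3 crosses 3.
overlay_delay_h3 = 10 * sum(len(physical_path[edge]) for edge in overlay_tree)
ip_delay_h3 = 10 * 3
rdp_h3 = relative_delay_penalty(overlay_delay_h3, ip_delay_h3)
```

Here the access link between r1 and h2 carries the packet twice (once in, once out), which is exactly the kind of duplication the stress metric is meant to expose.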
10
Evaluation Methodology
  • Design and implementation evaluated in two
    contexts
  • ModelNet: emulated wide-area network testbed
    • Application runs unmodified
    • Controlled network characteristics
    • IP multicast used as a common base to compare with
      published results
    • Topologies: Inet-generated, 4040 routers with
      end nodes randomly attached
  • PlanetLab deployment
    • Evaluate performance when facing real network
      dynamics

11
Evaluation: ModelNet
  • Estimate
    • relative delay penalty
    • stress
  • Graph min, max, and average values over 2
    topologies, 10 runs, 20% of all trees

[Figures: 90th-percentile Relative Delay Penalty; maximum link stress]
Data source for systems other than UMM: Jain et
al., USITS'03
12
Evaluation: PlanetLab - Adaptation
  • Bootstrap, handling node failures
  • RDP and stress evolution: 100 nodes start, then
    10 nodes crash and rejoin 900s later
  • Note: few messages lost
  • Visualization tool (with Dinoj Sunrendan)

[Figure: OverViz]
13
Why is this solution appealing?
  • Low-overhead, reliable, simple
    • Low-overhead: uses passive data collection to
      build distribution trees (simply inspecting the
      message flow).
    • Reliable: distributed structure to support
      routing; all routing state is soft-state
    • Simple: preserves the simplicity of
      flooding-based solutions
  • Self-organizing: nodes make independent decisions
    based only on local information
  • Scales with the number of sources, not with the
    number of passive participants

14
Roadmap
  • Introduction
  • UMM: An Unstructured Multi-source Multicast
  • Application study: Replica Location Service
  • Conclusions

15
Application: Replica Location Service (RLS)
  • Replication is often used to improve reliability,
    access latency, or availability.
  • Need an efficient mechanism to locate replicas
    • Map a logical ID to replica location(s).
    • Common to cooperative proxy caches, distributed
      object systems.
  • Data Grid operations
    • Client presents an identifier (LFN: logical file
      name) and asks for one, many, or all replica
      locations (PFN: physical file name).
    • Client presents a set of LFNs and asks for a site
      where (most) replicas of these files are already
      present
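
The two Data Grid operations above can be sketched over a plain in-memory mapping. Everything concrete here is an assumption for illustration: the function names `locate` and `best_site`, the `lfn://` and `gsiftp://` name formats, the example hosts, and the idea that a PFN's host name identifies its site.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical catalog: logical file name -> set of physical file names.
catalog = {
    "lfn://exp/run1.dat": {
        "gsiftp://site-a.example.org/data/run1.dat",
        "gsiftp://site-b.example.org/store/run1.dat",
    },
    "lfn://exp/run2.dat": {"gsiftp://site-a.example.org/data/run2.dat"},
    "lfn://exp/run3.dat": {"gsiftp://site-c.example.org/d/run3.dat"},
}


def locate(catalog, lfn, limit=None):
    """Return one, many, or all replica locations for a logical file name."""
    pfns = sorted(catalog.get(lfn, ()))
    return pfns if limit is None else pfns[:limit]


def best_site(catalog, lfns):
    """Return (site, count) for the site already holding the most of `lfns`."""
    counts = Counter()
    for lfn in lfns:
        # Count each site at most once per file, even with several copies.
        sites = {urlparse(pfn).hostname for pfn in catalog.get(lfn, ())}
        counts.update(sites)
    return counts.most_common(1)[0] if counts else None


all_run1 = locate(catalog, "lfn://exp/run1.dat")
preferred = best_site(catalog, list(catalog))
```

A distributed service would answer the same queries from disseminated summaries rather than a single dictionary; the sketch only pins down what the queries return.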

16
RLS: Application Requirements
  • Requirements of data-intensive, scientific
    applications
    • Target scale
      • up to 500 million replicas by 2008,
      • 100s of sites.
    • Decoupling: sites able to operate independently
    • Support efficient location of collocated sets of
      replicas
  • (requirements and trace analysis)
    • Lookup rates order(s) of magnitude higher than
      update rates
    • Flat popularity distribution for files (non-Zipf)
    • Most add/update operations are done in batches.

17
RLS: The pieces
  • Replica Location Nodes
    • Local state: set of (LFN, PFN) mappings
    • Compressed representation: Bloom filters
  • State dissemination
    • UMM overlay
    • Soft-state messages
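
A Bloom filter summarizing a node's set of LFNs might look like the following minimal Python sketch. It is not the prototype's implementation: the class layout, the SHA-256 based double hashing, the filter size, and the `lfn://` names are assumptions chosen for brevity. Deletion is omitted because a plain bit-array filter cannot support it; a counting variant would be needed.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative parameters)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from two 64-bit halves of one digest
        # (double hashing); forcing h2 odd avoids a zero stride.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return True for an item never added (false positive),
        # but never False for an item that was added.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# A node summarizes the LFNs it knows about; only `summary.bits` would
# need to travel to other nodes.
summary = BloomFilter(num_bits=8192, num_hashes=5)
for lfn in ("lfn://exp/run1.dat", "lfn://exp/run2.dat"):
    summary.add(lfn)
```

A remote node holding this summary can test `"lfn://exp/run1.dat" in summary` locally and contact the owning node only on a positive answer.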

18
Discussion
  • Tradeoffs
    • Accuracy (false positive rate) vs. memory and
      bandwidth cost
    • Accuracy (false negative rate) vs. bandwidth cost
    • Extra CPU and memory vs. bandwidth cost with
      compressed Bloom filters.
  • Workload and deployment characteristics reduce
    the appeal of DHT-based solutions
    • Existing users prefer low-latency queries over
      low memory usage
  • Current RLS implementation offers
    a deployment choice
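
The first trade-off (false positive rate vs. memory and bandwidth) can be quantified with the standard Bloom filter approximation p ≈ (1 - e^(-kn/m))^k for m bits, n items, and k hash functions. The sketch below uses that textbook formula; the function names and the example numbers are illustrative and are not taken from the RLS evaluation.

```python
import math


def false_positive_rate(num_bits, num_items, num_hashes):
    """Standard approximation of a Bloom filter's false positive rate."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


def bits_per_item(target_fp_rate):
    """Bits per stored item needed for a target rate with the optimal k."""
    return -math.log(target_fp_rate) / (math.log(2) ** 2)


def optimal_num_hashes(num_bits, num_items):
    """Number of hash functions that minimizes the false positive rate."""
    return max(1, round(num_bits / num_items * math.log(2)))


# Example: one million mappings at 10 bits each (about 1.25 MB per filter).
n = 1_000_000
k = optimal_num_hashes(10 * n, n)
rate = false_positive_rate(10 * n, n, k)
```

Under this approximation, each halving of the target false positive rate costs about 1.44 extra bits per item, which is the memory and bandwidth side of the trade-off.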

19
RLS: Prototype implementation and evaluation
  • Bloom filters
    • Fast lookup (7-10µs); add, delete operations
      (14µs)
  • (Isolated) Replica Location Node
    • Lookup rates up to 24,000 lookups/sec.
    • Add, delete: about half of lookup performance
  • Overlay performance
    • Deployed and tested on PlanetLab
    • Meets user-defined accuracy
    • Alternatively: caps for generated traffic.
  • Congestion management
    • Reduced accuracy for faraway nodes

20
Contributions: UMM, RLS
  • (UMM) Unstructured multi-source multicast
    overlay
    • As efficient as solutions based on structured
      overlays or on running full routing protocols at
      the overlay level
    • Simple protocol based on flooding and passive
      data collection.
    • Self-organizing
  • (RLS) Replica Location Service
    • Application layered on top of UMM
    • Proposed design
      • Meets requirements of data-intensive, scientific
        applications.
      • Is simple, decentralized, adaptive.
  • Matei Ripeanu and Ian Foster, "A Decentralized,
    Adaptive, Replica Location Service", HPDC 2002.
  • A. Chervenak, E. Deelman, I. Foster, A.
    Iamnitchi, C. Kesselman, W. Hoschek, P. Kunst, M.
    Ripeanu, B. Schwartzkopf, H. Stockinger, K.
    Stockinger, and Brian Tierney, "Giggle: A
    Framework for Constructing Scalable Replica
    Location Services", SC2002.
  • Matei Ripeanu, I. Foster, A. Iamnitchi, A.
    Rogers, "UMM-An unstructured multisource overlay",
    2005 (submitted)

21
Open questions
  • Geometries to organize interactions between
    resources
    • Random graphs
      • Power-law graphs present in a number of instances
        (Internet, power-grid, Gnutella).
      • Low cost to build and maintain
    • Structured geometries
      • Structure can be exploited by applications
      • High maintenance cost when nodes are volatile,
        heterogeneous
    • Where is the tipping point? (application
      requirements?)
  • Large systems and self-organization
    • Are there generic building blocks for
      self-organization?

22
Thank you
  • More information, links to papers:
    http://www.cs.uchicago.edu/matei
  • Questions?