Divide and Conquer Algorithms for Pub/Sub Overlay Design



1
Divide and Conquer Algorithms for Pub/Sub Overlay Design
  • Chen Chen (1)
  • joint work with Hans-Arno Jacobsen (1,2) and Roman
    Vitenberg (3)
  • (1) Department of Electrical and Computer
    Engineering, University of Toronto
  • (2) Department of Computer Science,
    University of Toronto
  • (3) Department of Informatics,
    University of Oslo

2
Example Pub/Sub
[Figure: example pub/sub system with the topics "boy" and "girl"; each participant is labeled with its interests.]
3
Pub/Sub
  • A communication paradigm
  • Subscribers express their interests
  • Publishers disseminate messages
  • Many applications and industry standards
  • Application integration, financial data
    dissemination,
  • RSS feed distribution, business process
    management
  • WS-Notification, WS-Eventing
  • OMG's Real-time Data Dissemination Service
  • Topic-based pub/sub
  • TIBCO RV
  • Google's GooPS

4
Two components in pub/sub implementation
  • Design of routing protocols
  • The design of protocols so that publications and
    subscriptions are sent most efficiently across
    the overlay network.
  • G. Li et al., ICDCS '08
  • M. Castro et al., JSAC '02
  • Construction of overlay
  • The construction of the overlay topology such
    that network traffic is minimized.
  • Chockler et al., PODC '07
  • Onus et al., INFOCOM '09

5
Desirable properties for overlays
  • Low average node degree
  • Low fan-out of a node
  • Low diameter
  • Topic-connectivity
  • Efficiency to construct
  • Adaptability to churn
  • Ease of distributed implementation

6
Our contributions
Previous algorithm: GM
  • High running time cost
  • Full knowledge requirement
  • Centralized operation (difficult to decentralize)
  • No support for dynamic changes
  • Constructing from scratch only (no support for
    incremental addition)
Our algorithms
  • Low running time cost
  • Partial knowledge requirement
  • Centralized operation (easy to decentralize)
  • No direct support for dynamic changes
  • Constructing both from scratch and incrementally
7
Topic-connectivity
[Figure: an overlay G over nodes V1 to V5, each labeled with its interests among topics a, b, c, d, shown with its sub-overlays Ga and Gb for topics a and b.]
Suboverlay Ga is topic-connected
Suboverlay Gb is NOT topic-connected
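
The topic-connectivity property on this slide can be checked mechanically. The sketch below assumes the usual definition: an overlay is topic-connected if, for every topic, the nodes interested in that topic form a connected subgraph. The function names, the five-node interest assignment, and the edge list are illustrative assumptions, not data read off the figure.

```python
from collections import defaultdict, deque

def is_topic_connected(interests, edges, topic):
    """True if the nodes interested in `topic` induce a connected subgraph."""
    members = {v for v, topics in interests.items() if topic in topics}
    if len(members) <= 1:
        return True
    # Adjacency lists restricted to nodes interested in the topic.
    adj = defaultdict(set)
    for u, v in edges:
        if u in members and v in members:
            adj[u].add(v)
            adj[v].add(u)
    # Breadth-first search from an arbitrary interested node.
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == members

def is_tco(interests, edges):
    """True if the overlay is topic-connected for every topic."""
    topics = set().union(*interests.values())
    return all(is_topic_connected(interests, edges, t) for t in topics)

# Hypothetical example data (assumed, not taken from the slide figure).
interests = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
edges = [("v2", "v4"), ("v4", "v5"), ("v1", "v5"), ("v1", "v3")]

print(is_topic_connected(interests, edges, "a"))  # sub-overlay for topic a
print(is_topic_connected(interests, edges, "b"))  # sub-overlay for topic b
```

With these assumed edges, topic a is connected but topic b is not, because v4 has no link to v1 or v3. Adding the edge (v3, v4) makes the whole overlay a TCO.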
8
MinAvg-TCO problem
[Figure: two topic-connected overlays, TCO1 and TCO2, over the same nodes V1 to V5 and topics a, b, c, d.]
TCO1 has 5 edges
TCO2 has 10 edges
9
MinAvg-TCO problem
  • A high-quality overlay
  • Topic-connectivity
  • Total number of edges
  • Input
  • a set of nodes V,
  • a set of topics T,
  • the interest function Int
  • MinAvg-TCO(V,T,Int) (optimization version)
  • Construct a TCO(V,T,Int,E) such that |E| is
    minimum.
  • Avg-TCO(V,T,Int,k) (decision version)
  • Is there a TCO(V,T,Int,E) such that |E| ≤ k?
  • Theorem: MinAvg-TCO is NP-complete

[Figure: example overlay over nodes V1 to V5, each labeled with its interests among topics a, b, c, d.]
10
Greedy-Merge (GM) algorithm
  • Greedy
  • always making the choice that looks best at the
    moment
  • GM for MinAvg-TCO
  • always adding an edge with maximum link
    contribution
  • Running time: O(|V|^2·|T|)
  • Approximation ratio: O(log(|V|·|T|))
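
The greedy rule above can be sketched in a few lines. This is a naive illustration, not the authors' implementation. It assumes that the link contribution of an edge is the number of topics for which the edge joins two previously disconnected groups of interested nodes, and it rescans all node pairs on every step, so it does not achieve the running time quoted above. The names `DisjointSet` and `greedy_merge` and the example interests are hypothetical.

```python
from itertools import combinations

class DisjointSet:
    """Minimal union-find structure, one instance per topic."""
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

def greedy_merge(interests):
    """Start from an empty edge set; always add an edge of maximum contribution."""
    nodes = sorted(interests)
    topics = set().union(*interests.values())
    comps = {t: DisjointSet(v for v in nodes if t in interests[v]) for t in topics}

    def contribution(u, v):
        # Topics shared by u and v whose components the edge (u, v) would merge.
        return sum(1 for t in interests[u] & interests[v]
                   if comps[t].find(u) != comps[t].find(v))

    edges = []
    while True:
        best = max(combinations(nodes, 2),
                   key=lambda e: contribution(*e), default=None)
        if best is None or contribution(*best) == 0:
            return edges  # no edge helps any more: every topic is connected
        u, v = best
        for t in interests[u] & interests[v]:
            comps[t].union(u, v)
        edges.append(best)

# Hypothetical five-node example (assumed interest assignment).
interests = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
tco_edges = greedy_merge(interests)
print(len(tco_edges), tco_edges)
```

On this assumed input the first edge chosen is (v1, v3), the only pair sharing two topics, and the resulting overlay has 5 edges.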

11
Our contributions
Previous algorithm: GM
  • High running time cost
  • Full knowledge requirement
  • Centralized operation (difficult to decentralize)
  • No support for dynamic changes
  • Construction from scratch only (no support for
    incremental addition)
Our algorithms
  • Low running time cost
  • Partial knowledge requirement
  • Centralized operation (easy to decentralize)
  • No direct support for dynamic changes
  • Construction both from scratch and incrementally
12
TCO join problem
  • Given p TCOs: TCO_d = (V_d,T_d,Int_d,E_d), d = 1, ..., p
  • MinAvg-TCO-Join(V,T,Int,p) (optimization version)
  • Construct a TCO(V,T,Int,E) such that |E| is
    minimum
  • Avg-TCO-Join(V,T,Int,p,k) (decision version)
  • Is there a TCO(V,T,Int,E) such that |E| ≤ k?
  • MinAvg-TCO is a special case of MinAvg-TCO-Join
  • Theorem: MinAvg-TCO-Join is NP-complete

13
Solving MinAvg-TCO-Join
  • MinAvg-TCO-Join could be solved by GM,
  • but NOT practical
  • Tear down all existing links
  • Rebuild the overlay from scratch using GM
  • It is better to preserve all existing edges and
    only add edges incrementally.

14
Bad case for incremental addition of edges
Vall: interested in all topics in T
[Figure: three overlays, TCO0, TCO1 and TCO2, over nodes V1, V2, ..., Vi, ..., Vn-1, Vn and Vall, comparing "Constructing incrementally" with "Constructing from scratch".]
15
Naive Merge (NM) algorithm
NM is based on the same greedy heuristic as GM.
GM algorithm
  • Input: (V,T,Int)
  • Output: one TCO
  • Algorithm
  • - Start with an empty edge set
  • - Always add an edge with maximum link
    contribution.
  • Running time
NM algorithm
  • Input: (V_d,T_d,Int_d,E_d), d = 1, ..., p
  • Output: one TCO
  • Algorithm
  • - Start with existing internal-TCO links
  • - Always add a cross-TCO edge with maximum link
    contribution.
  • Running time

16
Example of NM
[Figure: NM example over nodes V0 to V14, each labeled with its interests among topics a, b, c, d.]
Still a prohibitively high running time!!!
17
Star set
Given a TCO (V,T,Int,E), a star set S is a subset
of V that covers all of V's topics.
[Figure: a topic-connected overlay over nodes V1 to V5 with topics a, b, c, d, shown three times with different node subsets highlighted.]
{v3, v5} is a star set: it covers all topics
a,b,c,d
{v2, v3, v4} is not a star set: it only covers
a,b,d
18
Star set
  • Star set nodes
  • Represent the interests of all the nodes
  • Can function as bridges to determine cross-TCO
    links
  • Observation: minimal star sets tend to be
    substantially smaller than the total number of
    nodes.
  • How to find a minimum star set S for (V,T,Int)?
  • Equivalent to the classic set cover problem,
    which is NP-complete
  • Can be approximated within a logarithmic
    approximation ratio
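
One standard way to get the logarithmic ratio mentioned above is the greedy set cover heuristic: repeatedly pick the node whose interests cover the most still-uncovered topics. The sketch below applies it to star sets. The slides do not say which approximation the authors use, so this choice is an assumption, as are the function name and the example interests.

```python
def greedy_star_set(interests):
    """Greedy set cover over node interests; returns a list of star nodes."""
    uncovered = set().union(*interests.values())
    star = []
    while uncovered:
        # Node covering the most uncovered topics (ties broken by name order).
        best = max(sorted(interests), key=lambda v: len(interests[v] & uncovered))
        star.append(best)
        uncovered -= interests[best]
    return star

# Hypothetical five-node example (assumed interest assignment).
interests = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
star = greedy_star_set(interests)
print(star)
```

On this assumed input the heuristic first takes v1 (three topics) and then one node interested in a, giving a star set of size 2 out of 5 nodes.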

19
Star Merge (SM) algorithm
NM algorithm
  • Input: (V_d,T_d,Int_d,E_d), d = 1, ..., p
  • Output: one TCO
  • Algorithm
  • - Start with existing internal-TCO links
  • - // Do nothing
  • - Always add a cross-TCO edge with maximum link
    contribution.
SM algorithm
  • Input: (V_d,T_d,Int_d,E_d), d = 1, ..., p
  • Output: one TCO
  • Algorithm
  • - Start with existing internal-TCO links
  • - Find a star set for each sub-TCO
  • - Always add a cross-Star edge with maximum link
    contribution.
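
The difference between NM and SM comes down to which cross-TCO edges are considered. The sketch below illustrates this under stated assumptions: node names are unique across sub-TCOs, each input sub-TCO is already topic-connected, link contribution counts the topics whose components an edge merges, and star sets come from greedy set cover. All function names and the example data are hypothetical.

```python
from itertools import combinations

def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def star_set(interests):
    """Greedy set cover: nodes that together cover all topics."""
    uncovered = set().union(*interests.values())
    star = []
    while uncovered:
        best = max(sorted(interests), key=lambda v: len(interests[v] & uncovered))
        star.append(best)
        uncovered -= interests[best]
    return star

def merge(sub_tcos, use_stars=True):
    """Join sub-TCOs given as (interests, edges) pairs.

    use_stars=True restricts candidates to cross-star edges (SM);
    use_stars=False considers every cross-TCO edge (NM).
    Returns (edge list, number of candidate edges considered).
    """
    interests = {v: ts for ints, _ in sub_tcos for v, ts in ints.items()}
    edges = [e for _, es in sub_tcos for e in es]
    parent = {}  # one union-find forest per topic
    for v, ts in interests.items():
        for t in ts:
            parent.setdefault(t, {})[v] = v

    def link(u, v):
        for t in interests[u] & interests[v]:
            parent[t][find(parent[t], u)] = find(parent[t], v)

    def gain(u, v):
        return sum(find(parent[t], u) != find(parent[t], v)
                   for t in interests[u] & interests[v])

    for u, v in edges:  # start with the existing internal-TCO links
        link(u, v)
    bridges = [star_set(ints) if use_stars else sorted(ints) for ints, _ in sub_tcos]
    candidates = [(u, v) for a, b in combinations(bridges, 2) for u in a for v in b]
    added = []
    while candidates:
        best = max(candidates, key=lambda e: gain(*e))
        if gain(*best) == 0:
            break
        link(*best)
        added.append(best)
    return edges + added, len(candidates)

# Two hypothetical sub-TCOs over topics a and b.
left = ({"x1": {"a", "b"}, "x2": {"a"}, "x3": {"b"}}, [("x1", "x2"), ("x1", "x3")])
right = ({"y1": {"a", "b"}, "y2": {"a"}, "y3": {"b"}}, [("y1", "y2"), ("y1", "y3")])
sm_edges, sm_candidates = merge([left, right])
nm_edges, nm_candidates = merge([left, right], use_stars=False)
print(sm_candidates, nm_candidates)
```

In this small case both variants add the single edge (x1, y1), but SM scores 1 candidate edge where NM scores 9.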

20
Example of SM
[Figure: SM example over nodes V0 to V14, each labeled with its interests among topics a, b, c, d.]
Running time largely improved because #stars << #nodes
in most cases.
21
Divide and Conquer (DC) for MinAvg-TCO
  • The number of nodes is a dominant factor for the
    running time of the GM algorithm.
  • Divide-and-conquer
  • Divide the MinAvg-TCO problem into several
    sub-overlay construction problems
  • Conquer the sub-MinAvg-TCO problems independently
    and build sub-overlays into sub-TCOs
  • Combine these sub-TCOs to one TCO

22
Design of DC algorithm
  • How to divide the node set V
  • Node clustering vs. random partitioning
  • The number of partitions p
  • The balance between conquer and combine
  • p = 1 (single partition): conquer only, which
    reduces to GM
  • p = |V| (each node is a partition): combine only,
    which also reduces to GM
  • How to decentralize DC
  • Note: the DC algorithm as presented is fully
    centralized.
  • However, it is possible to decentralize it.
  • Theoretical analysis is not straightforward.
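
The divide, conquer, and combine steps can be put together in a short centralized sketch. It assumes random partitioning into p parts, a greedy step per partition in the spirit of GM, and a combine step over cross-star edges in the spirit of SM. The helper names, the `seed` parameter, and the example interests are hypothetical, and the greedy routine is a naive one, not the authors' implementation.

```python
import random
from itertools import combinations

def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def greedy_add(interests, edges, candidates):
    """Extend `edges` greedily with candidates of maximum link contribution."""
    parent = {}  # one union-find forest per topic
    for v, ts in interests.items():
        for t in ts:
            parent.setdefault(t, {})[v] = v

    def link(u, v):
        for t in interests[u] & interests[v]:
            parent[t][find(parent[t], u)] = find(parent[t], v)

    def gain(u, v):
        return sum(find(parent[t], u) != find(parent[t], v)
                   for t in interests[u] & interests[v])

    for u, v in edges:
        link(u, v)
    result = list(edges)
    while candidates:
        best = max(candidates, key=lambda e: gain(*e))
        if gain(*best) == 0:
            break
        link(*best)
        result.append(best)
    return result

def star_set(interests):
    """Greedy set cover: nodes that together cover all topics."""
    uncovered = set().union(*interests.values())
    star = []
    while uncovered:
        best = max(sorted(interests), key=lambda v: len(interests[v] & uncovered))
        star.append(best)
        uncovered -= interests[best]
    return star

def divide_and_conquer(interests, p, seed=0):
    nodes = sorted(interests)
    random.Random(seed).shuffle(nodes)        # divide: random partitioning
    parts = [nodes[i::p] for i in range(p)]
    edges, stars = [], []
    for part in parts:
        sub = {v: interests[v] for v in part}
        edges += greedy_add(sub, [], list(combinations(part, 2)))  # conquer
        stars.append(star_set(sub))
    cross = [(u, v) for a, b in combinations(stars, 2) for u in a for v in b]
    return greedy_add(interests, edges, cross)  # combine over cross-star edges

# Hypothetical five-node example (assumed interest assignment).
interests = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
for p in (1, 2, 5):
    print(p, len(divide_and_conquer(interests, p)))
```

With p = 1 the combine step has no cross-star edges to consider, and with p = |V| every node is its own star, so both extremes behave like the plain greedy algorithm, matching the remark above about the balance between conquer and combine.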

23
Example of DC
[Figure: DC example over nodes V0 to V14, each labeled with its interests among topics a, b, c, d.]
  • Divide overlay based on V
  • Conquer each sub-TCO by GM
  • Combine TCO into one by SM
24
Experiment setting
  • The number of nodes
  • |V| = 1000, ranging from 1000 to 8000
  • The number of topics
  • |T| = 100, ranging from 100 to 1000
  • The number of topics subscribed to by a node
  • NodeIntSize = 20, ranging from 10 to 100
  • Topic distribution: uniform, Zipf, exponential
25
Experiment design
  • Evaluation: average node degree, running time
  • Star Merge for MinAvg-TCO-Join
  • DC for MinAvg-TCO
  • Random node partitioning
  • The effects of the number of nodes
  • The effects of the number of topics
  • The effects of average subscription size of a
    node
  • Comparison with RingPT
  • RingPT is an algorithm that mimics the common
    practice of building a separate overlay for each
    topic.
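
A per-topic baseline of this kind can be sketched as follows. The slides only say that RingPT builds a separate overlay for each topic. That each per-topic overlay is a ring is an assumption inferred from the name, and the function name and example interests are hypothetical.

```python
def ring_per_topic(interests):
    """Connect the nodes interested in each topic in a ring; return all edges."""
    topics = sorted(set().union(*interests.values()))
    edges = set()
    for t in topics:
        members = sorted(v for v in interests if t in interests[v])
        if len(members) == 2:
            edges.add((members[0], members[1]))
        elif len(members) > 2:
            for i, u in enumerate(members):
                w = members[(i + 1) % len(members)]  # next node around the ring
                edges.add((min(u, w), max(u, w)))
    return sorted(edges)

# Hypothetical five-node example (assumed interest assignment).
interests = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
ring_edges = ring_per_topic(interests)
avg_degree = 2 * len(ring_edges) / len(interests)
print(len(ring_edges), avg_degree)
```

On this assumed input the per-topic rings use 7 distinct edges, whereas 5 edges are enough for a topic-connected overlay on the same nodes, which illustrates why per-topic construction tends to give a higher average node degree.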

26
Star Merge: SM vs NM vs GM
27
Divide-and-conquer: the effect of the number of
nodes
28
Divide-and-conquer: DC vs GM vs RingPT
Algorithm summary
Algorithm   Running time           Quality of overlay edges (avg node degree)   Required information   Potential to decentralize
RingPT      good                   poor                                         full knowledge         good
GM          poor, O(|V|^2·|T|)     good, O(log(|V|·|T|))                        full knowledge         poor
NM          poor, 75% of GM        good                                         full knowledge         good
SM          good, 1.0% of GM       good, 0.15 compared to GM                    partial knowledge      good
DC          good, 1.7% of GM       good, 2.12 compared to GM                    partial knowledge      good
31
Minimal Number of Links
  • A typical pub/sub system combines a number of
    protocols, many of which maintain per-link
    state
  • A node must constantly monitor the availability
    of each of its neighbors (heartbeats and
    keep-alive state)
  • If the links are maintained using TCP, there is
    the cost of connection state for each link
  • The more links there are, the fewer topics can be
    routed over each individual link, thereby
    diminishing cross-topic aggregation benefits
  • If a sequential-diff-based compression scheme is
    used, there is an extra cost associated with a
    history table