On Efficient Content Matching in Distributed Pub/Sub Systems

1
On Efficient Content Matching in Distributed
Pub/Sub Systems
  • Weixiong Rao (The Chinese University of HK)
  • Lei Chen (Hong Kong University of Sci. and Tech.)
  • Ada Wai-Chee Fu (The Chinese University of HK)

2
Outline
  • Motivation
  • Background
  • Interval Tree: a geometric data structure
  • Mercury: a structured P2P network supporting range queries
  • Cobas Framework
  • CobasTree
  • 3 techniques
  • Selective multicast
  • Interval division
  • Merging CobasTrees
  • Performance Study
  • Conclusion and Future Work

3
Overview of Content-based Pub/Sub Systems in a
Distributed Environment
Data schema: Attribute a: [0,1); Attribute b: [1,10]
Content: <a=0.7>, <b=1>
[Figure: publishers, brokers and subscribers (nodes A to E). Publishers
publish contents, brokers disseminate them, and subscribers register
filters such as F1: a > 0.6.]
Unique properties: unlike a traditional network, a content carries no
specific destination address. Contents reach their destinations (the
subscribers) based on (1) their data contents and (2) the subscribers'
filtering conditions.
  • Two key metrics
  • Low Communication Cost
  • Timely Forwarding

4
Our Observations of Content-based Pub/Sub
  • Data schema
  • The dimensionality can be high (more than 10 attributes)
  • Publication content
  • A set of <attribute, value> pairs over ALL
    attributes
  • Subscription filters
  • Predicates over SEVERAL attributes, NOT
    necessarily ALL attributes
  • Dimensionality mismatch
  • The number of attributes in a filter is NOT
    necessarily equal to that in a content
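
The dimensionality mismatch above can be illustrated with a minimal Python sketch. The `Interval` class, the `matches` function and the filter `f2` are hypothetical names introduced only for illustration; the slides do not prescribe this representation. The schema (a in [0,1), b in [1,10]), the content <a=0.7>, <b=1> and the filter F1: a > 0.6 come from the earlier overview slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One predicate on one attribute, e.g. a > 0.6 becomes (0.6, 1.0)."""
    low: float
    high: float
    low_closed: bool = False
    high_closed: bool = False

    def contains(self, value: float) -> bool:
        above = value >= self.low if self.low_closed else value > self.low
        below = value <= self.high if self.high_closed else value < self.high
        return above and below


def matches(content: dict, subscription: dict) -> bool:
    """A content matches when every predicate of the filter holds.

    Attributes that the filter does not mention are unconstrained.
    """
    return all(
        attr in content and interval.contains(content[attr])
        for attr, interval in subscription.items()
    )


# A content gives a value for ALL attributes of the schema.
content = {"a": 0.7, "b": 1.0}

# F1 constrains only attribute a (a > 0.6); the upper bound is a's domain limit.
f1 = {"a": Interval(0.6, 1.0)}
# Hypothetical second filter that constrains both attributes.
f2 = {
    "a": Interval(0.0, 0.5, low_closed=True),
    "b": Interval(1.0, 10.0, low_closed=True, high_closed=True),
}

print(matches(content, f1))  # True: 0.7 > 0.6, b is unconstrained
print(matches(content, f2))  # False: 0.7 is outside [0, 0.5)
```

Here the content has two attributes while F1 has one predicate, which is exactly the mismatch a single multi-dimensional index has to cope with.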

5
Cobas: a pub/sub framework for structured contents
  • Motivation
  • The indexing structure is important for
    structured content-based pub/sub.
  • Contents and filters follow the predefined data
    schema.
  • Existing approaches
  • A multi-dimensional index.
  • Suffers from the curse of dimensionality when the
    dimensionality is high → high traffic and latency.
  • Multiple one-dimensional indexes.
  • One copy for each one-dimensional index → high
    traffic and latency.

6
Cobas: basic idea
Data schema: Attribute a: [0,1); Attribute b: [1,10]
  • Predefined data schema
  • Publication content
  • Each content value is treated as a data point.
  • Subscription filter
  • Each predicate in a subscription filter is treated as an
    interval.
  • All filters (intervals) are organized in a
    geometric data structure (interval tree or
    segment tree → CobasTree).
  • Matching
  • Matching contents against filters can be treated
    as stabbing queries over the geometric data
    structure.

[Figure: stabbing query. The content <a=0.7> is the point (0.7, 0.7);
the filter F1: a > 0.6 is the interval (0.6, 1). Labels: 2 intervals,
stabbing query.]
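
The mapping from predicates to intervals, and matching as a stabbing query, can be sketched as follows. This is a deliberately naive linear scan, not the CobasTree itself; `predicate_to_interval`, `stabbed_by` and the filter `Fx` are hypothetical names used only for this illustration.

```python
def predicate_to_interval(op, value, domain):
    """Turn `attribute op value` into (low, high, low_closed, high_closed).

    One-sided predicates are closed off with the attribute's domain bounds.
    """
    d_low, d_high, d_low_closed, d_high_closed = domain
    if op == ">":
        return (value, d_high, False, d_high_closed)
    if op == ">=":
        return (value, d_high, True, d_high_closed)
    if op == "<":
        return (d_low, value, d_low_closed, False)
    if op == "<=":
        return (d_low, value, d_low_closed, True)
    if op == "=":
        return (value, value, True, True)
    raise ValueError(f"unsupported operator: {op}")


def stabbed_by(point, intervals):
    """Return the names of all intervals that contain `point`."""
    hits = []
    for name, (low, high, low_closed, high_closed) in intervals.items():
        left_ok = point >= low if low_closed else point > low
        right_ok = point <= high if high_closed else point < high
        if left_ok and right_ok:
            hits.append(name)
    return hits


DOMAIN_A = (0.0, 1.0, True, False)  # attribute a ranges over [0, 1)

intervals = {
    "F1": predicate_to_interval(">", 0.6, DOMAIN_A),  # the interval (0.6, 1)
    "Fx": predicate_to_interval("<", 0.5, DOMAIN_A),  # hypothetical: [0, 0.5)
}

print(intervals["F1"])             # (0.6, 1.0, False, False)
print(stabbed_by(0.7, intervals))  # ['F1']
```

The scan costs time linear in the number of filters; the tree structures discussed in the background slides exist to avoid that.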
7
Cobas: overview
  • In a P2P-like distributed environment
  • A new matching tree structure that borrows ideas
    from the interval tree and the segment tree
  • Bottom-up operations → no overloading
  • 3 techniques
  • Selective multicast → fast matching
  • Interval division → less message cost
  • Merging → less message cost

9
Background
[Figure: tree with a node w annotated with U(w) and L(w).]
Primary structure: a balanced binary search tree
  • A segment is covered by at most 2 log n node intervals
  • The union of all node intervals in each level is
    identical
  • No redundancy
  • An interval [l, u] is registered at the highest
    node(s) it covers
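
The registration rule above can be sketched with a small segment tree over integer unit ranges. This is a textbook-style illustration, not the authors' CobasTree; the `Node` class, the domain [0, 8) and the filters F1 and F2 are assumptions made for the example.

```python
class Node:
    """Segment-tree node covering the half-open range [lo, hi)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.filters = []              # intervals registered at this node
        self.left = self.right = None
        if hi - lo > 1:                # internal node: split at the midpoint
            mid = (lo + hi) // 2
            self.left = Node(lo, mid)
            self.right = Node(mid, hi)

    def insert(self, name, lo, hi):
        """Register [lo, hi) at the highest nodes whose range it covers.

        Endpoints are integers. Returns the number of nodes used.
        """
        if hi <= self.lo or self.hi <= lo:      # no overlap
            return 0
        if lo <= self.lo and self.hi <= hi:     # node range fully covered
            self.filters.append(name)
            return 1
        return self.left.insert(name, lo, hi) + self.right.insert(name, lo, hi)

    def stab(self, x):
        """Collect every interval containing x along one root-to-leaf path."""
        if not (self.lo <= x < self.hi):
            return []
        found = list(self.filters)
        if self.left is not None:
            child = self.left if x < self.left.hi else self.right
            found += child.stab(x)
        return found


root = Node(0, 8)
print(root.insert("F1", 1, 7))  # 4 nodes: [1,2), [2,4), [4,6), [6,7)
print(root.insert("F2", 0, 4))  # 1 node: [0,4)
print(root.stab(3))             # ['F2', 'F1']
print(root.stab(7))             # []
```

With n = 8 leaves, F1 uses 4 nodes, which is within the 2 log n = 6 bound, and no interval is stored at both a node and one of its ancestors.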

10
Background: P2P network and Mercury
  • P2P network
  • Supports both exact-match queries and range queries
  • Semantic maintenance and load balancing
  • Mercury
  • Creates a routing hub for each attribute
  • O(log² N / k) hops per hub

[Figure labels: Rx, Ry. Copied from the Mercury SIGCOMM '04 slides.]
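
The idea of one routing hub per attribute can be sketched as follows, assuming each node of a hub is responsible for one contiguous slice of the attribute's value range. The `Hub` class, the equal-width slices and the node counts are simplifications invented for this example; the sketch ignores Mercury's actual routing links and load balancing.

```python
import bisect


class Hub:
    """One hub per attribute; node i owns one contiguous slice of the range."""

    def __init__(self, attribute, low, high, num_nodes):
        self.attribute = attribute
        step = (high - low) / num_nodes
        # starts[i] is the lower bound of the slice owned by node i.
        self.starts = [low + i * step for i in range(num_nodes)]

    def node_for(self, value):
        """Exact query: index of the node whose slice contains `value`."""
        return bisect.bisect_right(self.starts, value) - 1

    def nodes_for_range(self, low, high):
        """Range query: indices of all nodes whose slices overlap [low, high]."""
        return list(range(self.node_for(low), self.node_for(high) + 1))


# One hub per attribute of the example schema (a in [0,1), b in [1,10]).
hub_a = Hub("a", 0.0, 1.0, num_nodes=4)
hub_b = Hub("b", 1.0, 10.0, num_nodes=3)

print(hub_a.node_for(0.7))              # 2
print(hub_a.nodes_for_range(0.6, 0.9))  # [2, 3]
print(hub_b.node_for(1.0))              # 0
```

Because adjacent values land on the same or neighbouring nodes, a range query touches a contiguous run of nodes instead of being scattered by hashing.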
12
Cobas: system overview
13
Cobas: basic operations
[Figure: tree operations; one case is labelled "overloading".]
No overloading, because each leaf node can be the
starting point.
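
A rough way to see why the starting point matters is to count where operations enter the tree. The sketch below assumes a complete binary tree stored in heap order (root at index 1, leaves at indices `N_LEAVES` to `2 * N_LEAVES - 1`) and eight evenly spread contents; all names and values are hypothetical, and only the entry point is counted, not the work done further along the path.

```python
from collections import Counter

N_LEAVES = 8   # hypothetical tree over the integer domain [0, 8)
ROOT = 1       # heap layout: the children of node i are 2i and 2i + 1


def leaf_of(value):
    """Index of the leaf whose unit range [k, k+1) contains `value`."""
    return N_LEAVES + int(value)


def path_to_root(node):
    """Nodes visited by a bottom-up operation that starts at `node`."""
    path = []
    while node >= 1:
        path.append(node)
        node //= 2
    return path


contents = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]

# Top-down: every operation enters the tree at the root.
top_down_entries = Counter(ROOT for _ in contents)
# Bottom-up: each operation enters at the leaf responsible for its value.
bottom_up_entries = Counter(leaf_of(x) for x in contents)

print(top_down_entries[ROOT])           # 8
print(max(bottom_up_entries.values()))  # 1
print(path_to_root(leaf_of(2.5)))       # [10, 5, 2, 1]
```

In the top-down case a single node receives all eight operations first, while in the bottom-up case no entry node receives more than one.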
14
Cobas: Selective Multicast
[Figure: two forwarding schemes. One needs 2 copies and 2 units of
latency; the other needs 2 copies and 1 unit of latency.]
15
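Cobas: Interval Division
[Figure: with filter F2 stored as the pieces F2: [2,4) and F2: [0,2),
matching needs 1 copy and 1 unit of latency. With F2 stored as
F2: [2,4), F2: [0,1) and F2: [1,2), matching is local in node 0, with
no copies.]
Trade-off: network traffic vs. maintenance cost.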
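
The trade-off on this slide can be reproduced with a small sketch. It assumes that F2 originally covers [0, 4), that node 0 is the leaf responsible for [0, 1), and that a piece stored at any other node costs one content copy to reach; `divide` and `copies_needed` are hypothetical helpers, not part of Cobas.

```python
def divide(pieces, target):
    """Split the piece `target` at its midpoint; keep the other pieces."""
    low, high = target
    mid = (low + high) // 2
    result = []
    for piece in pieces:
        if piece == target:
            result.extend([(low, mid), (mid, high)])
        else:
            result.append(piece)
    return result


def copies_needed(leaf, pieces):
    """Count pieces that cover `leaf` but are stored at another node.

    Each such piece costs one content copy; a piece equal to the leaf's
    own range is matched locally.
    """
    leaf_low, leaf_high = leaf
    return sum(
        1
        for low, high in pieces
        if low <= leaf_low and leaf_high <= high and (low, high) != leaf
    )


f2 = [(0, 4)]                  # assumed original extent of filter F2
coarse = divide(f2, (0, 4))    # [(0, 2), (2, 4)]
fine = divide(coarse, (0, 2))  # [(0, 1), (1, 2), (2, 4)]
node0 = (0, 1)                 # leaf range assumed for node 0

print(len(coarse), copies_needed(node0, coarse))  # 2 1
print(len(fine), copies_needed(node0, fine))      # 3 0
```

Dividing further removes the copy (less traffic) but raises the number of stored pieces from 2 to 3 (more maintenance).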
16
Cobas: Merging
Data schema: Attribute I: [0,1); Attribute J: [1,100]; Attribute K: [0,50]
  • Before merging, there are d one-dimensional
    CobasTrees.
  • A content is forwarded to these one-dimensional
    CobasTrees with d copies.
  • To reduce the number of copies, the filters of some
    CobasTrees are reinserted into other CobasTrees.

[Figure: CobasTrees for attributes I, J and K.
Filters having attributes I and J, e.g. Fij: I > 0.1, J < 5.
Filters having attributes K and J, e.g. Fjk: 30 > K > 20, J > 10.
Filters having only attribute J, e.g. Fj: J = 10, which is reinserted as
Fj: 1 > I > 0, J = 10, using the domain range of attribute I.]
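
The reinsertion step can be sketched as follows, using the three filters from the figure. The function `merge_away`, the choice of tree I as the default target and the tuple representation of intervals are assumptions made for illustration; open and closed endpoints are ignored for brevity.

```python
# Attribute domains from the slide, as (low, high) pairs.
SCHEMA = {"I": (0.0, 1.0), "J": (1.0, 100.0), "K": (0.0, 50.0)}

# Filters from the figure, as attribute -> (low, high).
FILTERS = {
    "Fij": {"I": (0.1, 1.0), "J": (1.0, 5.0)},       # I > 0.1, J < 5
    "Fjk": {"K": (20.0, 30.0), "J": (10.0, 100.0)},  # 30 > K > 20, J > 10
    "Fj": {"J": (10.0, 10.0)},                       # J = 10
}


def merge_away(attr, filters, schema, default_target):
    """Place every filter in a tree other than the one for `attr`.

    A filter with another attribute is indexed under that attribute.
    A filter with only `attr` is indexed in `default_target` under that
    attribute's whole domain range. The remaining predicates must still be
    checked once a candidate filter is found.
    """
    placement = {}
    for name, predicates in filters.items():
        others = [a for a in predicates if a != attr]
        if others:
            placement[name] = (others[0], predicates[others[0]])
        else:
            placement[name] = (default_target, schema[default_target])
    return placement


placement = merge_away("J", FILTERS, SCHEMA, default_target="I")
trees_in_use = {tree for tree, _ in placement.values()}

print(placement["Fj"])       # ('I', (0.0, 1.0))
print(sorted(trees_in_use))  # ['I', 'K']
print(len(SCHEMA), "->", len(trees_in_use))  # 3 -> 2
```

After the J tree is merged away, a content needs 2 copies instead of 3, at the price of wider intervals such as the full-domain entry for Fj.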
18
Simulation Results
[Figure: workload settings on the publisher (content) side and the
subscriber (filter) side. Labels: high popular, high popular dense,
high dense, co-occurrence, no co-occurrence.]
19
Cobas Experiment
  • Cobas without merging: too many content copies
    (D)
  • Cobas without division: too many long intervals →
    high load at the root → many nodes
    responsible for such intervals.
  • DRTree: suffers from the curse of
    dimensionality

20
Simulation Results
  • Count: uses D P2P instances, so caching in Count
    may achieve the largest lookup reduction, but it
    still has the highest traffic
  • RTree: uses only 1 P2P instance and has the fewest
    lookup messages
  • Cobas with caching: in between RTree and Count

Caching helps reduce the lookup hops in the
Mercury P2P network.
21
Simulation Results
  • The distribution of storage load is relatively
    even: all nodes fall inside the range [lower, upper]
  • The matching load of a very few pnodes is high;
    most fall inside the balancing range [lower, upper]
  • Without caching, the load is unbalanced.
  • Maintenance cost increases when
  • more nodes fail
  • more filters are inserted or deleted

22
PlanetLab Results
  • The latency of Cobas may decrease over time
  • The latency of RTree and Count is relatively high.
  • The content forwarding cost, Cost(C), may decrease due
    to merging CobasTrees and interval division
  • The filter maintenance cost, Cost(F), may increase due
    to dividing more intervals

23
Conclusion and Future Work
  • Cobas Framework
  • A new data structure: CobasTree
  • 3 techniques
  • Selective multicast
  • Interval division
  • CobasTree merging
  • Future work
  • Selectivity filtering → stateful filtering
  • A single publication source → multiple sources