Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

1
Subscription Partitioning and Routing in
Content-based Publish/Subscribe Networks
  • Yi-Min Wang, Lili Qiu, Dimitris Achlioptas,
    Gautam Das, Paul Larson, and Helen J. Wang
  • Microsoft Research
  • DISC 2002
  • Toulouse, France

2
Motivation
  • Phenomenal growth in Web usage
  • Future trends
  • Switch from polling to notifications
  • Examples: stock quotes, sports scores, weather,
    news, ...
  • Yahoo! Alerts, MSN Mobile, AOL Anywhere,
    InfoSpace, ...
  • Complements the traditional polling model on the
    Web
  • Event Distribution Network (EDN)
  • Distributed and scalable event distribution
  • Parallels the idea of a Content Distribution
    Network (CDN), applied to event distribution
  • Built on top of a self-configuring overlay
    network of servers
  • Content-based publish/subscribe systems through
    in-network processing of aggregated subscription
    filters

3
Dispatcher-based model

4
Model of Content-based Pub/Sub
  • Content-based filtering/routing
  • Event schema with d attributes, supporting
    equality and range predicates
  • Event: a point in the d-dimensional space
  • Subscription: a rectangle in that space
  • Match: a rectangle contains the point
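
The point-in-rectangle model above can be illustrated with a minimal sketch. The `Subscription` class, its `ranges` field, and the two-attribute (price, volume) schema are assumptions made for illustration, not names from the presentation.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Subscription:
    # One closed interval (low, high) per attribute.
    # An equality predicate is the degenerate interval low == high.
    ranges: Tuple[Tuple[float, float], ...]

    def matches(self, event: Sequence[float]) -> bool:
        """Return True if the event point lies inside this rectangle."""
        if len(event) != len(self.ranges):
            raise ValueError("event and subscription must have the same dimension")
        return all(low <= x <= high for x, (low, high) in zip(event, self.ranges))


# Hypothetical 2-attribute schema: (price, volume).
sub = Subscription(ranges=((10.0, 20.0), (0.0, 1000.0)))
inside = sub.matches((15.0, 500.0))   # point inside the rectangle
outside = sub.matches((25.0, 500.0))  # price falls outside [10, 20]
```

Treating equality as a zero-width interval keeps a single matching rule for both predicate types.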

5
Subscription Partitioning
  • Basic idea: similarity-based clustering for
    reducing total event traffic
  • Event Space Partitioning (ESP)
  • Filter Set Partitioning (FSP)

6
Equality Predicates
  • Hash predicates to get uniform distribution
  • Treat the hashed domain as the event space
  • Use Event Space Partitioning
  • A subscription is a point, so it does not
    intersect multiple sub-spaces
  • Use over-partitioning for better load balancing
  • Use offline greedy algorithm to assign buckets to
    servers for load balancing
  • Use indirection table to dynamically map buckets
    to servers for load re-balancing
  • Use Bloom filters to further reduce traffic
  • Fast detection of true negatives at the expense
    of a (very low) false-positive rate
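
A rough sketch of the hashing, over-partitioning, and indirection-table steps listed above. The slides do not specify the offline greedy algorithm; this sketch assumes a heaviest-bucket-first rule that always picks the currently least-loaded server. The function names, stock symbols, and sizes are hypothetical.

```python
import hashlib
from typing import List


def bucket_of(value: str, num_buckets: int) -> int:
    """Hash an equality-predicate value into one of num_buckets buckets."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def greedy_assign(bucket_loads: List[int], num_servers: int) -> List[int]:
    """Assumed greedy rule: heaviest bucket first, onto the least-loaded server.

    Returns the indirection table, where table[bucket] is a server index.
    """
    table = [0] * len(bucket_loads)
    server_loads = [0] * num_servers
    for b in sorted(range(len(bucket_loads)), key=lambda i: -bucket_loads[i]):
        s = min(range(num_servers), key=lambda j: server_loads[j])
        table[b] = s
        server_loads[s] += bucket_loads[b]
    return table


def route(value: str, table: List[int]) -> int:
    """Look up the server responsible for a subscription or event value."""
    return table[bucket_of(value, len(table))]


# Over-partitioning: more buckets than servers (hypothetical sizes).
NUM_SERVERS = 5
OVER_PARTITION_RATIO = 10
NUM_BUCKETS = NUM_SERVERS * OVER_PARTITION_RATIO

subscriptions = ["MSFT", "MSFT", "MSFT", "IBM", "IBM", "ORCL", "SUNW"]
bucket_loads = [0] * NUM_BUCKETS
for symbol in subscriptions:
    bucket_loads[bucket_of(symbol, NUM_BUCKETS)] += 1

table = greedy_assign(bucket_loads, NUM_SERVERS)
```

Because an event and a subscription on the same value hash to the same bucket, re-mapping a bucket only requires changing one entry in `table`.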

7
Simulation Results
  • Actual Notification Money log
  • 1.48M subscriptions with 0.29M unique filters
    over 21,741 stock symbols
  • Zipf-like distribution

8
Simulation Results (Cont.)
  • Simulate 100M new subscriptions from 43,734
    symbols
  • Scaled-up Zipf-like distribution
  • Perturbation and permutation
  • Uniform distribution
  • 50 servers with over-partitioning ratio 10
  • Without load re-balancing
  • Load imbalance (max/min) ranged from 1.41 to 6.66
    (Uniform case)
  • With imbalance threshold of 2.0
  • Re-balancing was triggered only 5 times, each
    time involving re-assignment of up to 3 buckets
    and migration of up to 0.7% of subscriptions.
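
The threshold-triggered re-balancing described above might look roughly like the following. The bucket-selection rule (move the bucket closest to half the load gap) is an assumption for illustration; the slides only state the max/min threshold and that buckets are re-assigned through the indirection table.

```python
from typing import List


def imbalance(server_loads: List[int]) -> float:
    """Load imbalance as max/min; the minimum is clamped to 1 to avoid dividing by zero."""
    return max(server_loads) / max(min(server_loads), 1)


def rebalance(table: List[int], bucket_loads: List[int],
              num_servers: int, threshold: float = 2.0) -> List[int]:
    """Re-assign buckets in place while max/min load exceeds the threshold.

    Returns the list of buckets that were moved.
    """
    loads = [0] * num_servers
    for b, s in enumerate(table):
        loads[s] += bucket_loads[b]

    moved: List[int] = []
    while imbalance(loads) > threshold:
        src = max(range(num_servers), key=lambda s: loads[s])
        dst = min(range(num_servers), key=lambda s: loads[s])
        gap = loads[src] - loads[dst]
        # Only buckets lighter than the gap strictly improve the balance.
        candidates = [b for b, s in enumerate(table)
                      if s == src and 0 < bucket_loads[b] < gap]
        if not candidates:
            break  # nothing movable; give up rather than loop forever
        # Assumed heuristic: pick the bucket closest to half the gap.
        b = min(candidates, key=lambda i: abs(bucket_loads[i] - gap / 2))
        table[b] = dst
        loads[src] -= bucket_loads[b]
        loads[dst] += bucket_loads[b]
        moved.append(b)
    return moved


# Hypothetical example: server 0 holds three buckets, server 1 holds one.
example_table = [0, 0, 0, 1]
example_moved = rebalance(example_table, [4, 3, 2, 1], num_servers=2)
```

In this example a single bucket move brings the two servers from loads 9 and 1 to 5 and 5.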

9
Range Predicates
  • Use Filter Set Partitioning
  • K-means clustering
  • Use center point to represent a rectangle
  • R-tree-based clustering
  • R-tree: a dynamic index structure for
    multi-dimensional data rectangles
  • Offline R-tree algorithm
  • Exhaustively and recursively search for
    partitions that minimize sum of bounding
    rectangle volumes
  • Online R-tree algorithm
  • Insert from root down the path that greedily
    minimizes the increase in bounding rectangle
    volume
  • Simulation results
  • Offline R-tree > Online R-tree > K-means >
    Random
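
The online greedy rule above (follow the path that least increases bounding-rectangle volume) can be sketched in a simplified, single-level form. This is not a full R-tree: it assumes one bounding rectangle per server and a hypothetical per-server `capacity` as the load-balancing constraint.

```python
import math
from typing import List, Optional, Tuple

Rect = Tuple[Tuple[float, float], ...]  # one (low, high) interval per attribute


def volume(r: Rect) -> float:
    """Volume of a rectangle: product of its side lengths."""
    return math.prod(high - low for low, high in r)


def enlarge(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle that contains both a and b."""
    return tuple((min(al, bl), max(ah, bh)) for (al, ah), (bl, bh) in zip(a, b))


def assign_online(subs: List[Rect], num_servers: int, capacity: int):
    """Place each subscription on the server whose bounding box grows least."""
    boxes: List[Optional[Rect]] = [None] * num_servers
    counts = [0] * num_servers
    assignment: List[int] = []
    for r in subs:
        best, best_cost = None, None
        for s in range(num_servers):
            if counts[s] >= capacity:
                continue  # server is full
            box = boxes[s]
            # An empty server grows from zero volume to the volume of r.
            cost = volume(r) if box is None else volume(enlarge(box, r)) - volume(box)
            if best_cost is None or cost < best_cost:
                best, best_cost = s, cost
        if best is None:
            raise ValueError("all servers are full")
        boxes[best] = r if boxes[best] is None else enlarge(boxes[best], r)
        counts[best] += 1
        assignment.append(best)
    return assignment, boxes


# Hypothetical 2-D subscriptions forming two natural clusters.
subs = [
    ((0.0, 1.0), (0.0, 1.0)),
    ((10.0, 11.0), (10.0, 11.0)),
    ((0.5, 1.5), (0.0, 1.0)),
    ((10.0, 12.0), (10.0, 11.0)),
]
assignment, boxes = assign_online(subs, num_servers=2, capacity=2)
```

Keeping each server's bounding rectangle small means fewer events fall inside it, which is the sense in which similar subscriptions are clustered to reduce event traffic.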

10
Related Work
  • Pub/Sub systems
  • Echo, Elvin, Gryphon, Herald, Hierarchical Proxy
    Architecture, Information Bus, JEDI, Keryx,
    Ready, Scribe, Siena, ...
  • Clustering in pub/sub
  • All previous work focuses on reducing
    multicast groups [OAA00, RLW02, WKM00]

11
Summary
  • Proposed two subscription partitioning and
    routing approaches
  • Event Space Partitioning
  • Filter Set Partitioning
  • Evaluated performance via simulations
  • Subscription partitioning reduces network traffic
  • Over-partitioning helps to achieve good load
    balancing dynamically
  • Bloom filter further reduces event traffic
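
The Bloom-filter check mentioned above can be sketched as follows. Where exactly the filter sits is an assumption here: the sketch imagines a sender testing an event's value against a filter of subscribed values before forwarding. The class name, sizes, and hashing scheme are illustrative only.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Bit-array set summary: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


# Hypothetical use: summarize the symbols that have at least one subscriber.
subscribed = BloomFilter(num_bits=1024, num_hashes=3)
for symbol in ["MSFT", "IBM", "ORCL"]:
    subscribed.add(symbol)

forward_msft = subscribed.might_contain("MSFT")  # subscribed, so always True
```

An event whose value fails the check is a true negative and can be dropped without any further matching work.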

12
Simulation Results
  • 10,000 random subscriptions per server on average
  • Offline R-tree performs the best: reduces event
    traffic by 20% to 60%

13
EDN Network Architecture
  1. Submit subscriptions
  2. Subscription routing
  3. Content-based route updates
  4. Peer exchange of route updates
  5. Content-based event routing
  6. Notification delivery

[Figure: an event source, EDN nodes, Notification Routing Services, and a subscriber, with arrows numbered by the steps above]

14
  • Backup Slides

15
  • Optimize various performance metrics, subject to
    load-balancing constraints
  • Minimize total event traffic
  • Volume of union of rectangles
  • Maximize overall system throughput
  • Minimize end-to-end latency

[Figure: subscription rectangles]

16
The EDN Optimization Problem
[Figure: centralized architecture vs. distributed architecture; labels: Event Sources, Notification Routing Service, Server, Subscribers]

17
Three Research Directions
  • Theoretical Study
  • Optimal or approximation algorithms for
    simplified versions
  • System Design and Simulation
  • Subscription partitioning for reducing event
    traffic
  • Summary-based routing for enhancing system
    throughput
  • Indigo-based Implementation
  • Extensible routing pub/sub architecture

18
An R-tree-based EDN pub/sub system
19
System Design and Simulation: Summary-based
Routing
  • Basic idea: summary-precision-based load
    balancing for enhancing system throughput

20
  • If dispatcher is not the bottleneck, use precise
    summary.
  • Otherwise, reduce summary precision until either
    the outgoing link or the servers are about to
    become the bottleneck.
  • Throughput increasing
  • Further reduction of summary precision would
    generate excessive false-positive traffic, which
    throttles back the dispatcher
  • Throughput decreasing

21
Simulation results
  • Imprecise summaries enhance throughput

22
  • Imprecise summaries combined with R-tree-based
    partitioning further enhance throughput

23
  • Dispatcher-to-link and dispatcher-to-server
    bottleneck ratios

24
EDN on Herald
  • Piggyback subscription routing summary
    reporting on the multicast tree forming process
  • Need to additionally consider notification
    traffic (because subscribers are now part of
    the multicast tree)

[Figure labels: Subscription Routing, Subscriber]

25
Indigo-based Implementation
  • Indigo M2 routing pub/sub architecture was not
    extensible
  • EDN used M2 messaging and built a WS-compliant,
    extensible routing pub/sub architecture on top
    of it
  • Close collaboration with Indigo
  • Extensibility proposals to Indigo
  • Some appeared in M3
  • But most sealed for security for now
  • Some being considered for M4

26
EDN Extensible Routing and Pub/Sub
[Figure: component diagram; labels: Namespace Binding Layer, EDN Route Manager, EDN Subscription Manager, MS Route Manager, WS-Eventing Subscription Manager, WS-Routing Route Manager, EDN R-tree Matcher, XPath Filter Matcher, Indigo Messaging]

27
Other XML-Messaging/Indigo interactions
  • State dependency management
  • Design tool for new features involving state
    transplant
  • E.g., System Restore (across time), IntelliMirror
    (across space)
  • Repair tool providing consistent undo
  • System Restore: rollback of atomic units
  • GoBack3: roll-forward of atomic units
  • Troubleshooting tool
  • Trace-diff and state-diff approaches
  • Our automatic, bottom-up, black-box discovery
    approach complements their manual, top-down,
    logical declaration approach (TravisM)
  • Install-time and run-time information augments
    the authoring-time information
  • Targeted problem spaces help identify things to
    declare for manageability