Title: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks
1 Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks
- Yi-Min Wang, Lili Qiu, Dimitris Achlioptas, Gautam Das, Paul Larson, and Helen J. Wang
- Microsoft Research
- DISC 2002
- Toulouse, France
2 Motivation
- Phenomenal growth in Web usage
- Future trends
- Switch from polling to notifications
- Examples: stock quotes, sports scores, weather, news, ...
- Yahoo! Alerts, MSN Mobile, AOL anywhere, InfoSpace, ...
- Complements the traditional polling model of the Web
- Event Distribution Network (EDN)
- Distributed and scalable event distribution
- Parallels the idea of a Content Distribution Network (CDN), applied to event distribution
- Built on top of a self-configuring overlay network of servers
- Content-based publish/subscribe through in-network processing of aggregated subscription filters
3 Dispatcher-based model
4 Model of Content-based Pub/Sub
- Content-based filtering/routing
- Event schema with d attributes, supporting equality and range predicates
- Event: a point in the d-dimensional space
- Subscription: a rectangle in that space
- Match: a rectangle contains the point
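The geometric model above can be sketched in a few lines. This is a minimal illustration, assuming numeric attributes and closed intervals; the names `Subscription`, `matches`, and `equality` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Subscription:
    """Axis-aligned rectangle: one closed interval (low, high) per attribute."""
    bounds: Tuple[Tuple[float, float], ...]

    def matches(self, event: Sequence[float]) -> bool:
        """An event (a point) matches when every coordinate lies in its interval."""
        if len(event) != len(self.bounds):
            raise ValueError("event and subscription must have the same dimension d")
        return all(low <= x <= high for x, (low, high) in zip(event, self.bounds))


def equality(value: float) -> Tuple[float, float]:
    """Assumption: an equality predicate is modeled as the degenerate range [value, value]."""
    return (value, value)


# d = 2: attribute 0 is a numeric symbol id, attribute 1 is a price (both hypothetical).
sub = Subscription(bounds=(equality(7.0), (100.0, 120.0)))
hit = sub.matches((7.0, 105.5))   # price inside the range
miss = sub.matches((7.0, 99.0))   # price below the range
```

Treating equality as a zero-width range keeps one code path for both predicate types; a real system would likely handle the two separately, as the later slides do.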
5 Subscription Partitioning
- Basic idea: similarity-based clustering for reducing total event traffic
- Event Space Partitioning (ESP)
- Filter Set Partitioning (FSP)
6 Equality Predicates
- Hash predicates to get a uniform distribution
- Treat the hashed domain as the event space
- Use Event Space Partitioning
- A subscription is a point, so it does not intersect multiple sub-spaces
- Use over-partitioning for better load balancing
- Use an offline greedy algorithm to assign buckets to servers for load balancing
- Use an indirection table to dynamically map buckets to servers for load re-balancing
- Use Bloom filters to further reduce traffic
- Fast detection of true negatives at the expense of a (very low) false-positive rate
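The Bloom-filter step can be illustrated with a small, self-contained sketch. It assumes a server summarizes the equality values it holds subscriptions for (stock symbols here) and that an event is forwarded only when the filter reports a possible match. The class name, the default sizes, and the salted SHA-256 hashing are illustrative assumptions, not the authors' implementation.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Set summary with no false negatives and a tunable false-positive rate."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Assumption: derive the k bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


# Hypothetical usage: summarize the symbols one server has subscriptions for.
summary = BloomFilter()
for symbol in ("MSFT", "IBM", "SUNW"):
    summary.add(symbol)
```

An event whose symbol yields `might_contain(...) == False` can be dropped without consulting the server; a `True` answer still requires exact matching at the server.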
7 Simulation Results
- Actual Notification Money log
- 1.48M subscriptions with 0.29M unique filters over 21,741 stock symbols
- Zipf-like distribution
8 Simulation Results (Cont.)
- Simulate 100M new subscriptions from 43,734 symbols
- Scaled-up Zipf-like distribution
- Perturbation and permutation
- Uniform distribution
- 50 servers with over-partitioning ratio 10
- Without load re-balancing
- Load imbalance (max/min) ranged from 1.41 to 6.66 (Uniform case)
- With imbalance threshold of 2.0
- Re-balancing was triggered only 5 times, each time involving re-assignment of up to 3 buckets and migration of up to 0.7% of subscriptions.
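The over-partitioning scheme and the max/min imbalance metric used in these results can be sketched as follows. This is an illustration under stated assumptions: the slides name an offline greedy algorithm, an indirection table, and an imbalance threshold, but not their details. The heaviest-first greedy rule and the bucket-migration rule in `rebalance` are therefore guesses, and all function names are hypothetical.

```python
import hashlib
from typing import List


def bucket_of(key: str, num_buckets: int) -> int:
    """Hash an equality-predicate value (e.g. a stock symbol) to a bucket."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def greedy_assign(bucket_load: List[int], num_servers: int) -> List[int]:
    """Assumed greedy rule: heaviest bucket first, onto the least-loaded server.

    Returns the indirection table, where table[bucket] is a server index.
    """
    table = [0] * len(bucket_load)
    loads = [0] * num_servers
    for b in sorted(range(len(bucket_load)), key=lambda i: -bucket_load[i]):
        s = min(range(num_servers), key=lambda j: loads[j])
        table[b] = s
        loads[s] += bucket_load[b]
    return table


def server_loads(bucket_load: List[int], table: List[int], num_servers: int) -> List[int]:
    loads = [0] * num_servers
    for b, s in enumerate(table):
        loads[s] += bucket_load[b]
    return loads


def imbalance(loads: List[int]) -> float:
    """Load imbalance as the max/min ratio."""
    return max(loads) / min(loads) if min(loads) > 0 else float("inf")


def rebalance(bucket_load: List[int], table: List[int], num_servers: int,
              threshold: float = 2.0) -> int:
    """Move buckets from the most to the least loaded server until the
    max/min ratio is within the threshold. Returns the number of buckets moved."""
    moved = 0
    while True:
        loads = server_loads(bucket_load, table, num_servers)
        if imbalance(loads) <= threshold:
            return moved
        hi = max(range(num_servers), key=lambda j: loads[j])
        lo = min(range(num_servers), key=lambda j: loads[j])
        gap = loads[hi] - loads[lo]
        # Only consider moves that strictly narrow the gap, so the loop terminates.
        candidates = [b for b, s in enumerate(table)
                      if s == hi and 0 < bucket_load[b] < gap]
        if not candidates:
            return moved
        b = min(candidates, key=lambda i: abs(bucket_load[i] - gap / 2))
        table[b] = lo  # only the indirection-table entry changes
        moved += 1


# Configuration from the slide: 50 servers, over-partitioning ratio 10.
NUM_SERVERS = 50
NUM_BUCKETS = NUM_SERVERS * 10
```

Because subscriptions are addressed by bucket rather than by server, re-balancing touches only the table entries of the moved buckets and the subscriptions stored in them.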
9 Range Predicates
- Use Filter Set Partitioning
- K-means clustering
- Use center point to represent a rectangle
- R-tree-based clustering
- R-tree: dynamic index structure for multi-dimensional data rectangles
- Offline R-tree algorithm
- Exhaustively and recursively search for partitions that minimize the sum of bounding rectangle volumes
- Online R-tree algorithm
- Insert from the root down the path that greedily minimizes the increase in bounding rectangle volume
- Simulation results
- Offline R-tree > Online R-tree > K-means > Random
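The greedy rule behind the online R-tree algorithm can be sketched at a single level: each server keeps one bounding rectangle, and a new subscription goes to the server whose bounding rectangle grows least in volume. This is a deliberately simplified illustration; a real R-tree descends several levels and splits overflowing nodes, and this sketch ignores load-balancing constraints. The names `volume`, `enlarge`, and `insert_online` are hypothetical.

```python
from typing import List, Optional, Tuple

Rect = Tuple[Tuple[float, float], ...]  # one (low, high) interval per dimension


def volume(r: Rect) -> float:
    v = 1.0
    for low, high in r:
        v *= high - low
    return v


def enlarge(box: Rect, r: Rect) -> Rect:
    """Smallest rectangle that contains both box and r."""
    return tuple((min(bl, rl), max(bh, rh)) for (bl, bh), (rl, rh) in zip(box, r))


def insert_online(boxes: List[Optional[Rect]], r: Rect) -> int:
    """Assign r to the partition whose bounding rectangle grows least in volume.

    boxes[i] is the current bounding rectangle of partition i (None if empty).
    Updates boxes in place and returns the chosen partition index.
    """
    def growth(box: Optional[Rect]) -> float:
        if box is None:
            return volume(r)
        return volume(enlarge(box, r)) - volume(box)

    best = min(range(len(boxes)), key=lambda i: growth(boxes[i]))
    boxes[best] = r if boxes[best] is None else enlarge(boxes[best], r)
    return best


# Two partitions, three 2-dimensional subscriptions (made-up coordinates).
boxes: List[Optional[Rect]] = [None, None]
first = insert_online(boxes, ((0.0, 1.0), (0.0, 1.0)))
second = insert_online(boxes, ((10.0, 11.0), (10.0, 11.0)))
third = insert_online(boxes, ((0.5, 1.5), (0.0, 1.0)))
```

The third rectangle overlaps the first, so adding it to partition 0 enlarges that bounding rectangle far less than adding it to partition 1 would.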
10 Related Work
- Pub/Sub systems
- Echo, Elvin, Gryphon, Herald, Hierarchical Proxy Architecture, Information Bus, JEDI, Keryx, Ready, Scribe, Siena, ...
- Clustering in pub/sub
- All previous work focuses on reducing the number of multicast groups [OAA00, RLW02, WKM00]
11 Summary
- Proposed two subscription partitioning and routing approaches
- Event Space Partitioning
- Filter Set Partitioning
- Evaluated performance via simulations
- Subscription partitioning reduces network traffic
- Over-partitioning helps to achieve good load balancing dynamically
- Bloom filter further reduces event traffic
12 Simulation Results
- 10,000 random subscriptions per server on average
- Offline R-tree performs the best: reduces event traffic by 20% to 60%
13 EDN Network Architecture
- Submit subscriptions
- Subscription routing
- Content-based route updates
- Peer exchange of route updates
- Content-based event routing
- Notification delivery
[Figure: event source, EDN nodes, Notification Routing Services, and subscriber, with links numbered 1 to 6]
14
15
- Optimize various performance metrics, subject to load-balancing constraints
- Minimize total event traffic
- Volume of union of rectangles
- Maximize overall system throughput
- Minimize end-to-end latency
[Figure: subscription rectangles]
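The "volume of union of rectangles" objective can be made concrete with a small two-dimensional sketch. It assumes events are uniformly distributed over the event space, so that the share of events a server must receive is proportional to the area covered by the union of its subscription rectangles. That uniformity assumption and the name `union_area` are illustrative only.

```python
from typing import List, Tuple

Rect2D = Tuple[float, float, float, float]  # (x_low, y_low, x_high, y_high)


def union_area(rects: List[Rect2D]) -> float:
    """Area of the union of axis-aligned rectangles via coordinate compression.

    The rectangle edges cut the plane into grid cells; each cell is either
    fully covered or fully uncovered, so it suffices to test one cell at a time.
    """
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    area = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            covered = any(r[0] <= x0 and x1 <= r[2] and r[1] <= y0 and y1 <= r[3]
                          for r in rects)
            if covered:
                area += (x1 - x0) * (y1 - y0)
    return area


# Two overlapping 2x2 squares: 4 + 4 - 1 (the overlap) = 7.
overlapping = union_area([(0, 0, 2, 2), (1, 1, 3, 3)])
```

Grouping similar (overlapping) subscriptions on the same server keeps each server's union small, which is the intuition behind similarity-based clustering for traffic reduction.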
16 The EDN Optimization Problem
[Figure: Centralized Architecture and Distributed Architecture, with labels Event Sources, Notification Routing Service, Server, and Subscribers]
17 Three Research Directions
- Theoretical Study
- Optimal or approximation algorithms for simplified versions
- System Design and Simulation
- Subscription partitioning for reducing event traffic
- Summary-based routing for enhancing system throughput
- Indigo-based Implementation
- Extensible routing pub/sub architecture
18 An R-tree-based EDN pub/sub system
19 System Design and Simulation: Summary-based Routing
- Basic idea: summary precision-based load balancing for enhancing system throughput
20
- If the dispatcher is not the bottleneck, use a precise summary.
- Otherwise, reduce summary precision until either the outgoing link or the servers are about to become the bottleneck.
- Throughput increasing
- Further reduction of summary precision would generate excessive false-positive traffic that throttles back the dispatcher
- Throughput decreasing
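The trade-off described above can be illustrated with a toy throughput model. Everything in it is an assumption made for illustration: matching cost at the dispatcher is taken to grow linearly with summary precision, false-positive traffic is taken to shrink inversely with precision, and only the outgoing link (not the servers) is modeled as the downstream bottleneck. The function names and all parameter values are hypothetical.

```python
from typing import Iterable


def throughput(precision: int, dispatcher_budget: float, link_capacity: float,
               match_fraction: float, fp_scale: float) -> float:
    """Events per second the system sustains at a given summary precision."""
    # Toy cost model (assumption): a more precise summary costs more per event.
    dispatcher_rate = dispatcher_budget / precision
    # Toy false-positive model (assumption): coarser summaries forward more
    # events that no subscription actually matches.
    forwarded_per_event = match_fraction + fp_scale / precision
    link_rate = link_capacity / forwarded_per_event
    # The slower stage is the bottleneck.
    return min(dispatcher_rate, link_rate)


def best_precision(candidates: Iterable[int], **model: float) -> int:
    """Pick the candidate precision with the highest modeled throughput."""
    return max(candidates, key=lambda p: throughput(p, **model))


model = dict(dispatcher_budget=64000.0, link_capacity=1000.0,
             match_fraction=0.1, fp_scale=1.0)
choice = best_precision([1, 2, 4, 8, 16, 32, 64], **model)
```

In this model, throughput rises as precision drops from 64 to 8 because the dispatcher is the bottleneck, then falls below 8 because false-positive traffic saturates the link, mirroring the "throughput increasing" and "throughput decreasing" phases on the slide.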
21 Simulation results
- Imprecise summaries enhance throughput
22
- Imprecise summaries combined with R-tree-based partitioning further enhance throughput
23
- Dispatcher-to-link and dispatcher-to-server bottleneck ratios
24 EDN on Herald
- Piggyback subscription routing summary reporting on the multicast tree forming process
- Need to additionally consider notification traffic (because subscribers are now part of the multicast tree)
[Figure: Subscription Routing; Subscriber]
25 Indigo-based Implementation
- Indigo M2 routing pub/sub architecture was not extensible
- EDN used M2 messaging and built a WS-compliant, extensible routing pub/sub architecture on top of it
- Close collaboration with Indigo
- Extensibility proposals to Indigo
- Some appeared in M3
- But most sealed for security for now
- Some being considered for M4
26 EDN Extensible Routing and Pub/Sub
[Figure: architecture diagram with components Namespace Binding Layer, EDN Route Manager, EDN Subscription Manager, MS Route Manager, WS-Eventing Subscription Manager, WS-Routing Route Manager, EDN R-tree Matcher, XPath Filter Matcher, and Indigo Messaging]
27 Other XML-Messaging/Indigo interactions
- State dependency management
- Design tool for new features involving state transplant
- E.g., System Restore (across time), Intellimirror (across space)
- Repair tool providing consistent undo
- System Restore: rollback of atomic units
- GoBack3: roll-forward of atomic units
- Troubleshooting tool
- Trace-diff and state-diff approaches
- Our automatic, bottom-up, black-box discovery approach complements their manual, top-down, logical declaration approach (TravisM)
- Install-time and run-time information augments the authoring-time information
- Targeted problem spaces help identify things to declare for manageability