RAINDROP: XML Stream Processing Engine


1
RAINDROP: XML Stream Processing Engine
  • Murali Mani, WPI
  • @UPenn, DB seminar
  • June 08, 2006

Partially Supported by NSF grant IIS 0414567
2
Acknowledgements
  • NSF for the financial support
  • Joint work with several others
  • Prof. Elke A. Rundensteiner
  • Graduate students: Hong Su, Ming Li, Mingzhu
    Wei, Shoushen Wang, Jinhui Jian
  • Undergraduate students: Drew Ditto, Bogomil
    Tselkov

3
Applications
  • Need for efficient stream data processing
  • Monitor patient data in real time
  • Sensor networks: fire detection, battlefield
    deployment, traffic congestion
  • Others: news delivery, network traffic monitoring, ...

4
XML Stream Processing
<open_auctions>
  <auction>
    <privacy>No</privacy>
    <description>Calendar of <emph>French Impressionism</emph> by <emph>Monet</emph></description>
    <initial>20</initial>
  </auction>
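A stream like the fragment above is consumed as a sequence of tokens (start tags, text, end tags) rather than as a complete document. The following is a minimal sketch of that idea using Python's standard `xml.sax` incremental parser; it is not Raindrop's tokenizer. The `TokenCollector` class, the `tokenize` function, the token format, and the way the fragment is cut into chunks are all assumptions made for illustration.

```python
import xml.sax


class TokenCollector(xml.sax.ContentHandler):
    """Hypothetical helper: turns SAX callbacks into (kind, value) tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []
        self._text = []  # character data may arrive in several pieces

    def _flush_text(self):
        # Join the pieces and normalize whitespace; drop blank text.
        text = " ".join("".join(self._text).split())
        self._text = []
        if text:
            self.tokens.append(("text", text))

    def startElement(self, name, attrs):
        self._flush_text()
        self.tokens.append(("start", name))

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.tokens.append(("end", name))


def tokenize(chunks):
    """Feed chunks as they arrive; the stream is never closed."""
    handler = TokenCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # no close(): <open_auctions> is still open
    return handler.tokens


# The auction fragment from the slide, cut into arbitrary chunks.
CHUNKS = [
    "<open_auctions><auction>",
    "<privacy>No</privacy><description>",
    "Calendar of <emph>French",
    " Impressionism</emph> by <emph>Monet </emph>",
    "</description><initial> 20",
    "</initial></auction>",
]

TOKENS = tokenize(CHUNKS)
```

Here `TOKENS` begins with `("start", "open_auctions")`, and the text of the first `emph` element is reassembled as one token even though it arrived split across two chunks.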
5
Option 1: Automata-Based Pattern Retrieval
  • Additional data structures for buffering,
    filtering, and restructuring

When patterns are retrieved depends on the data
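As a rough illustration of automaton-based retrieval, the sketch below compiles path patterns into a small trie-shaped automaton and runs it over a token stream with a stack of states. It is not Raindrop's automaton: `build_automaton`, `run_automaton`, the `(kind, value)` token format, and the pattern names `b` and `e` are hypothetical, and the paths use the element names of the sample auction data.

```python
def build_automaton(patterns):
    """Compile {name: "/x/y/z"} into a trie of states.

    trans[state] maps a tag to the next state; accepting maps a state
    to the name of the pattern that ends there.
    """
    trans = [{}]
    accepting = {}
    for name, path in patterns.items():
        state = 0
        for tag in path.strip("/").split("/"):
            if tag not in trans[state]:
                trans.append({})
                trans[state][tag] = len(trans) - 1
            state = trans[state][tag]
        accepting[state] = name
    return trans, accepting


def run_automaton(tokens, trans, accepting):
    """Yield (pattern_name, text) as soon as matching text arrives."""
    stack = [0]  # stack of automaton states; None marks a dead state
    for kind, value in tokens:
        if kind == "start":
            top = stack[-1]
            stack.append(trans[top].get(value) if top is not None else None)
        elif kind == "end":
            stack.pop()
        elif kind == "text" and stack[-1] in accepting:
            yield accepting[stack[-1]], value


# Hand-written token stream for the sample auction (assumed format).
TOKENS = [
    ("start", "open_auctions"), ("start", "auction"),
    ("start", "privacy"), ("text", "No"), ("end", "privacy"),
    ("start", "description"), ("text", "Calendar of"),
    ("start", "emph"), ("text", "French Impressionism"), ("end", "emph"),
    ("text", "by"),
    ("start", "emph"), ("text", "Monet"), ("end", "emph"),
    ("end", "description"),
    ("start", "initial"), ("text", "20"), ("end", "initial"),
    ("end", "auction"),
]

PATTERNS = {
    "b": "/open_auctions/auction/privacy",
    "e": "/open_auctions/auction/description/emph",
}

MATCHES = list(run_automaton(TOKENS, *build_automaton(PATTERNS)))
```

The matches come out in arrival order, which is the sense in which the retrieval time depends on the data: nothing is retrieved until the corresponding tokens actually show up in the stream.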
6
Option 2: DOM-Based Pattern Retrieval
When patterns are retrieved depends on other
patterns
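For contrast, DOM-style retrieval can be sketched by waiting until a whole auction element has been built and only then navigating inside it. The example below uses `xml.etree.ElementTree.iterparse` from the Python standard library as a stand-in for a DOM; it is an illustration, not Raindrop's implementation. The function name `dom_retrieve` and the result format are assumptions, and the closing `</open_auctions>` tag is added only so that the sample parses as a complete document.

```python
import io
import xml.etree.ElementTree as ET

DOCUMENT = b"""<open_auctions>
  <auction>
    <privacy>No</privacy>
    <description>Calendar of <emph>French Impressionism</emph> by <emph>Monet</emph></description>
    <initial>20</initial>
  </auction>
</open_auctions>"""


def dom_retrieve(source):
    """Yield the privacy and emph values of each completed auction."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "auction":
            continue
        # The auction subtree is complete here, so the inner patterns
        # are retrieved by navigating the tree, not by the tokenizer.
        b = [node.text for node in elem.findall("privacy")]
        e = [node.text for node in elem.findall("description/emph")]
        yield {"b": b, "e": e}
        elem.clear()  # release the subtree once it has been processed


RESULTS = list(dom_retrieve(io.BytesIO(DOCUMENT)))
```

The inner patterns are only retrieved once the enclosing auction pattern is available, which matches the slide's point that retrieval time depends on other patterns.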
7
Which paradigm is better?
Minimal pushdown plans win over maximal pushdown
plans when selectivity < 50%
8
Problem
  • How to provide the framework to choose between
    these paradigms?
  • Model both paradigms uniformly as algebraic
    operators.
  • Use a cost model to choose optimal plan given
    data statistics.

9
Automaton as TokenNav
StructuralJoin $a
Select $e = "French"
Select non-empty($b)
Extract $a
Extract $b
Extract $e
TokenNav $a, /privacy -> $b
TokenNav $a, /desc/emph -> $e
TokenNav $s, /auctions/auction -> $a
10
DOM Navigation as NodeNav
Select $e = "French"
Select non-empty($b)
NodeNav $a, /privacy -> $b
NodeNav $a, /desc/emph -> $e
Extract $a
TokenNav $s, /auctions/auction -> $a
11
Exploring the Search Space
  • A pattern can be retrieved inside the automaton
    or outside the automaton
  • However, there are dependencies. For
    for $a in /a, $b in $a/, $c in $b/
  • NodeNav for $b => NodeNav for $c
  • TokenNav for $b => TokenNav or NodeNav for $c
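The dependency rules above prune the search space: once a pattern is retrieved by NodeNav, every pattern nested below it must also use NodeNav. The sketch below enumerates the assignments that respect this rule. The function `valid_plans` and the representation of a plan as a dictionary are assumptions made for illustration, not part of Raindrop.

```python
from itertools import product

OPERATORS = ("TokenNav", "NodeNav")


def valid_plans(chain):
    """Enumerate operator choices for variables ordered outer to inner.

    An assignment is rejected when an outer variable uses NodeNav but
    the variable nested directly below it uses TokenNav.
    """
    plans = []
    for choice in product(OPERATORS, repeat=len(chain)):
        violates = any(
            outer == "NodeNav" and inner == "TokenNav"
            for outer, inner in zip(choice, choice[1:])
        )
        if not violates:
            plans.append(dict(zip(chain, choice)))
    return plans


PLANS = valid_plans(["b", "c"])
```

For the two variables `b` and `c`, three of the four combinations survive; only NodeNav for `b` with TokenNav for `c` is excluded.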

12
Run-time Optimization
  • Statistics unknown before data arrives
  • Statistics could change over time
  • We need techniques for efficient statistics
    monitoring, search space exploration and plan
    migration (safe points for migration)

13
Run-time Optimization
  • Create an initial plan
  • Run the initial plan and collect statistics at the
    same time
  • Generate a new plan using the statistics collected
  • Pause receiving the stream
  • Migrate to the new plan
  • Resume receiving the stream

[Diagram components: Stream, Query plan executor, statistics, Query Optimizer, new query plan, Plan Migrator]
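The run-time optimization steps can be sketched as a small control loop. This is a toy model under stated assumptions, not Raindrop's optimizer: the statistic collected is the fraction of elements passing a predicate, the plan choice reuses the 50% selectivity rule of thumb from the "Which paradigm is better?" slide, and the boundary between two complete elements is treated as a safe point for migration. The names `choose_plan`, `run_adaptive`, and `window` are hypothetical.

```python
def choose_plan(selectivity):
    # Assumed rule: minimal pushdown wins when selectivity is below 50%.
    return "minimal_pushdown" if selectivity < 0.5 else "maximal_pushdown"


def run_adaptive(elements, predicate, window=4,
                 initial_plan="maximal_pushdown"):
    """Process elements, re-planning after every `window` elements.

    Returns a trace of (element, plan used for that element).
    """
    plan = initial_plan          # create an initial plan
    seen = passed = 0
    trace = []
    for element in elements:
        trace.append((element, plan))   # run the current plan ...
        seen += 1                       # ... and collect statistics
        passed += 1 if predicate(element) else 0
        if seen == window:
            new_plan = choose_plan(passed / seen)  # generate a new plan
            if new_plan != plan:
                # Safe point: no element is half-processed here, so the
                # pause / migrate / resume steps reduce to a plan swap.
                plan = new_plan
            seen = passed = 0
    return trace


TRACE = run_adaptive(range(8), lambda x: x % 4 == 0)
```

In this run one element in four passes the predicate, so the loop switches from the initial plan to the minimal pushdown plan after the first window and stays there.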
14
Executing a Raindrop Plan
15
Key Ideas
  • Minimum Memory requirements
  • Discard data early
  • Output data early

16
In-Time Structural Join
StructuralJoin $a
Select $e = "French"
Select non-empty($b)
Extract $a
Extract $b
Extract $e
TokenNav $a, /privacy -> $b
TokenNav $a, /desc/emph -> $e
TokenNav $s, /auctions/auction -> $a
17
Better than In-Time Structural Join
StructuralJoin $r
Extract $b
Extract $a
TokenNav $r, /a -> $a
TokenNav $r, /b -> $b
TokenNav $s, /root -> $r
Note: $a tokens need not be stored.
18
Evaluating Predicates
StructuralJoin $r
Extract $b
Select $a = value
Extract $a
TokenNav $r, /b -> $b
TokenNav $r, /a -> $a
TokenNav $s, /root -> $r
Note: once $a = value is satisfied, $b tokens need not be stored.
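The predicate case can be sketched as a streaming operator that buffers b tokens only until the predicate on a is known to hold, and passes them straight through afterwards. This is an illustrative model, not Raindrop's StructuralJoin: the simplified token kinds (`start_r`, `a`, `b`, `end_r`), the function `join_with_predicate`, and the optional `stats` dictionary are assumptions.

```python
def join_with_predicate(tokens, value, stats=None):
    """Yield the b values of every r that contains an a equal to value."""
    buffered, satisfied = [], False
    for kind, payload in tokens:
        if kind == "start_r":
            buffered, satisfied = [], False
        elif kind == "a":
            if payload == value and not satisfied:
                satisfied = True
                yield from buffered   # flush what was held back
                buffered = []
        elif kind == "b":
            if satisfied:
                yield payload         # output early, store nothing
            else:
                buffered.append(payload)
                if stats is not None:
                    stats["peak"] = max(stats.get("peak", 0), len(buffered))
        elif kind == "end_r":
            buffered = []             # predicate never held: discard


EXAMPLE = [
    ("start_r", None),
    ("b", "b1"), ("a", "x"), ("b", "b2"),
    ("a", "v"),                       # predicate satisfied here
    ("b", "b3"), ("b", "b4"),
    ("end_r", None),
    ("start_r", None),
    ("b", "b5"), ("a", "x"),
    ("end_r", None),                  # b5 is dropped
]

STATS = {}
OUTPUT = list(join_with_predicate(EXAMPLE, "v", STATS))
```

In the first r element only `b1` and `b2` are ever buffered; `b3` and `b4` arrive after the predicate is satisfied and are emitted immediately.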
19
Using schema knowledge
Schema: root -> (a, b)
StructuralJoin $a
Extract $b
Extract $a
TokenNav $r, /a -> $a
TokenNav $r, /b -> $b
TokenNav $s, /root -> $r
Note: $a and $b tokens need not be stored.
20
Using Schema Knowledge for Predicates
Schema: root -> (b, a, c)
StructuralJoin $r
Extract $b
Select $a = value
Extract $a
TokenNav $r, /b -> $b
TokenNav $r, /a -> $a
TokenNav $s, /root -> $r
Note: once c is seen and $a = value is not yet satisfied, $b tokens can be discarded.
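The schema-based early discard can be sketched as follows. Under a sequence content model such as root -> (b, a, c), seeing any element that comes after a in the sequence means no further a can arrive, so an unsatisfied predicate can never become true and the buffered b tokens can be dropped before the root element ends. The function `join_with_schema`, the `(tag, payload)` token format, and the returned buffer-size trace are assumptions for illustration; the sketch handles the children of a single root element.

```python
def join_with_schema(tokens, schema_order, value):
    """Return (output b values, buffer size after each token)."""
    position = {tag: i for i, tag in enumerate(schema_order)}
    buffered, satisfied = [], False
    output, trace = [], []
    for tag, payload in tokens:
        if tag == "b":
            if satisfied:
                output.append(payload)
            else:
                buffered.append(payload)
        elif tag == "a" and payload == value and not satisfied:
            satisfied = True
            output.extend(buffered)
            buffered.clear()
        elif position[tag] > position["a"] and not satisfied:
            # The schema rules out any further a, so the predicate
            # cannot be satisfied any more: discard immediately.
            buffered.clear()
        trace.append(len(buffered))
    return output, trace


SCHEMA = ["b", "a", "c"]
MISS = join_with_schema(
    [("b", "b1"), ("b", "b2"), ("a", "x"), ("c", "c1"), ("c", "c2")],
    SCHEMA, "v")
HIT = join_with_schema(
    [("b", "b1"), ("a", "v"), ("c", "c1")],
    SCHEMA, "v")
```

In the `MISS` run the buffer is emptied as soon as the first c token arrives, rather than at the end of the root element.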
21
Conclusions
  • Raindrop integrates automaton and DOM
    navigation into one algebraic framework.
  • Cost-based optimization possible.
  • Execution minimizes memory requirements.

22
Ongoing Work
  • Load shedding in XML stream processing.
  • Utilizing Dynamic schema changes for optimization.

23
Fragment of XQuery supported
  • FLWR expressions (no conditionals or user-defined
    functions)
  • Path expressions use only forward axes (child,
    descendant, descendant-or-self, attribute)
  • Predicates supported are of the form pathExpr
    relOp constant

24
Issues with correlated queries
for $r in /root
return
  <root>
    for $a in $r/a
    return
      <a>$r/b</a>
  </root>

25
  • Visit the website of RAINDROP, our project on an
    XQuery engine over XML streams:
  • http://davis.wpi.edu/dsrg/raindrop/