Mixed Mode XML Query Processing Halverson, Burger, Galanis University of Wisconsin, Madison - PowerPoint PPT Presentation

1
Mixed Mode XML Query Processing
Halverson, Burger, Galanis
University of Wisconsin, Madison
  • Emre Tapçi 2002701525
  • Can Tamyaman 2002700905

2
Overview
  • Introduction
  • Purpose of the article
  • System Architecture
  • Mixed Mode Query Processing
  • Experiments
  • Related Work
  • Conclusion and Future Work

3
Introduction
  • Mixed mode XML query processing employs
  • inverted list filtering
  • tree navigation

4
Purpose of the article
  • To show that systems which keep inverted list
    filtering and tree navigation separate are
    suboptimal. To build an optimal system, these two
    types of processing must be integrated.

5
Basic System Architecture
6
System Architecture
  • The Data Manager stores a tree representation of
    the XML document
  • The Index Manager stores a set of inverted lists,
    mapping objects in the XML document to lists of
    exact locations within the document

7
Numbering Scheme
  • The data manager and index manager must share a
    common scheme for numbering the elements in an
    XML document
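
The slides do not spell out the numbering scheme itself, but later slides say that a posting carries a start number, an end number and a level. A minimal sketch of one such scheme, assuming a single counter incremented on entering and leaving each element during a depth-first walk (the function names here are hypothetical, not from the paper):

```python
import xml.etree.ElementTree as ET

def number_elements(root):
    """Assign (start, end, level) to every element by a depth-first walk."""
    counter = 0
    numbered = []  # (tag, start, end, level), kept in document order

    def visit(elem, level):
        nonlocal counter
        counter += 1
        start = counter
        slot = len(numbered)
        numbered.append(None)  # reserve the slot so output stays in document order
        for child in elem:
            visit(child, level + 1)
        counter += 1  # the end number is assigned after all descendants
        numbered[slot] = (elem.tag, start, counter, level)

    visit(root, 1)
    return numbered

def contains(anc, desc):
    """True if anc's (start, end) interval strictly encloses desc's interval."""
    return anc[1] < desc[1] and desc[2] < anc[2]

doc = ET.fromstring("<A><B><C/></B><B/></A>")
nodes = number_elements(doc)
# nodes == [("A", 1, 8, 1), ("B", 2, 5, 2), ("C", 3, 4, 3), ("B", 6, 7, 2)]
```

With numbers assigned this way, an ancestor/descendant test reduces to an interval comparison, which is what lets both managers reason about the same elements.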

8
Numbering Scheme
9
Data Manager
  • Each XML document is stored in the Data Manager
    using a B tree structure.
  • The key of the B tree index is a
    (document_ID, element_ID) pair that we refer to as an XKey.

10
Data Manager
  • Each leaf entry contains
  • Term ID
  • Record ID (RID)

11
Data Manager
  • The B tree corresponding to the previously given
    numbered XML document example

12
Data Manager
  • Data Manager Tree Structure

13
Data Manager
  • Child Axis Cursor (CA)
  • Descendant Axis Cursor (DA)

14
Index Manager
  • Indexing information is stored in a two level
    index structure
  • B tree as top level
  • Second-level entries are referred to as postings;
    together, the postings make up a posting list
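
The two-level structure can be sketched roughly as follows. This is an illustration only: a plain dictionary stands in for the top-level B tree, a sorted Python list stands in for the second-level index, and the class and method names are hypothetical.

```python
from bisect import bisect_left

class InvertedIndex:
    """Toy two-level index: term -> posting list sorted by (doc_id, start)."""

    def __init__(self):
        # Top level: a dict stands in for the B tree keyed on the term.
        self.lists = {}

    def add(self, term, posting):
        # posting is (doc_id, start, end, level); keep each list sorted.
        plist = self.lists.setdefault(term, [])
        plist.append(posting)
        plist.sort()

    def postings(self, term):
        return self.lists.get(term, [])

    def seek(self, term, doc_id, start):
        """Position of the first posting whose (doc_id, start) >= the given key."""
        # A 2-tuple sorts before any 4-tuple that shares its prefix, so
        # bisect_left lands on the first posting at or after the key.
        return bisect_left(self.postings(term), (doc_id, start))

index = InvertedIndex()
index.add("B", (1, 6, 7, 2))
index.add("B", (1, 2, 5, 2))
index.add("A", (1, 1, 8, 1))
```

The `seek` method is the operation a skipping join relies on: it finds a position inside a posting list without scanning the postings before it.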

15
Index Manager
  • Index Manager Tree Structure

16
Mixed Mode Query Processing
  • Data Manager
  • Navigation-based algorithms
  • Ex: Unnest algorithm
  • Index Manager
  • Multi-predicate merge join algorithms
  • Ex: ZigZag algorithm

17
Unnest Algorithm
  • Takes as input a path expression and a stream of
    XKeys.
  • Evaluates the path expression for each XKey in
    the input, and outputs XKeys corresponding to the
    satisfying elements.
  • Ex: document()/A/B/C

18
Unnest Algorithm
  • Uses a Finite State Machine (FSM) to evaluate
    path expressions
  • Each state of the FSM represents having satisfied
    some prefix of the path expression, while an
    accepting state indicates a full match.

19
Unnest Algorithm
  • Each state is associated with a cursor that
    corresponds to the next step to be applied for
    the path expression.
  • For each XKey obtained from the cursor, make the
    appropriate transition in the FSM, and continue
    with the next XKey in the next state.
  • If the cursor terminates, return to the previous
    state and continue by enumerating its cursor.
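
The state-and-cursor behaviour described above can be sketched for a path that uses only child steps. This is a simplified illustration, not the system's implementation: the tree is an in-memory dictionary rather than a B tree, and the `tree` layout and the function name are assumptions.

```python
def unnest(tree, path, input_keys):
    """Evaluate a child-axis path for each input XKey; yield matching XKeys.

    tree: dict mapping XKey -> (tag, list of child XKeys)  (hypothetical layout)
    path: non-empty list of tag names, e.g. ["A", "B", "C"]
    """
    accepting = len(path)
    for key in input_keys:
        # Stack of (state, cursor); state i means path[:i] has been matched.
        stack = [(0, iter(tree[key][1]))]
        while stack:
            state, cursor = stack[-1]
            child = next(cursor, None)
            if child is None:
                stack.pop()  # cursor exhausted: return to the previous state
                continue
            tag, grandchildren = tree[child]
            if tag != path[state]:
                continue  # no transition for this XKey
            if state + 1 == accepting:
                yield child  # accepting state: the full path matched
            else:
                # Move to the next state and open a child cursor there.
                stack.append((state + 1, iter(grandchildren)))

# XKeys are (document_ID, element_ID) pairs; (1, 0) is the document root.
tree = {
    (1, 0): ("#doc", [(1, 1)]),
    (1, 1): ("A", [(1, 2), (1, 4)]),
    (1, 2): ("B", [(1, 3)]),
    (1, 3): ("C", []),
    (1, 4): ("B", [(1, 5)]),
    (1, 5): ("D", []),
}
matches = list(unnest(tree, ["A", "B", "C"], [(1, 0)]))  # [(1, 3)]
```

Popping the stack when a cursor runs dry corresponds to returning to the previous state and continuing to enumerate its cursor.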

20
Unnest FSM for A/B
21
Unnest DA-FSM for A//B
22
Unnest CA-FSM for A//B
23
Cost Model For Unnest Algorithm
  • There are two relevant cost formulas
  • Cost of a child axis unnest
  • Cost of a descendant axis unnest

24
The Cost of Unnest
25
ZigZag Join Algorithm
  • Uses the indices present on the posting lists.
  • These algorithms assume that the posting lists
    are sorted by (document ID, start number)

26
ZigZag Join Algorithm
  • The point of the algorithm is to skip forward
    over parts of a posting list that are guaranteed
    not to have any matching postings on the other
    list.

27
ZigZag Join Algorithm
  • Example

28
ZigZag Join Algorithm
  • Check the containment of the first B within the
    first A, and output the pair.
  • Increment the B posting list pointer.
  • Find that the second B is not contained by the
    first A.
  • Increment the A posting list pointer.
  • If the second A lies beyond the second B, then
    increment the B posting list pointer.

29
ZigZag Join Algorithm
  • Since the current B posting has no matching A
    posting, use the second level index to seek
    forward using the current A posting's start
    number.
  • The algorithm then skips ahead to the fifth B posting.
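
The walkthrough above can be approximated in a few lines. This sketch is a simplification of the idea, not the algorithm from the paper: it assumes A elements never nest inside other A elements (so no backtracking is needed), and it uses binary search on sorted lists in place of the second level index. The posting layout `(doc, start, end, level)` and the function name are assumptions.

```python
from bisect import bisect_left

def zigzag_join(a_list, b_list):
    """Ancestor/descendant join over postings (doc, start, end, level).

    Both lists must be sorted by (doc, start). Simplifying assumption:
    no A element is nested inside another A element.
    """
    out = []
    i = j = 0
    while i < len(a_list) and j < len(b_list):
        a, b = a_list[i], b_list[j]
        if (b[0], b[1]) < (a[0], a[1]):
            # b starts before a, so no remaining A can contain it:
            # seek forward in B using a's start number.
            j = bisect_left(b_list, (a[0], a[1]), lo=j)
        elif b[0] == a[0] and b[2] < a[2]:
            out.append((a, b))  # b lies inside a
            j += 1
        else:
            # b starts after a ends: jump to the last A that starts
            # before b, always advancing by at least one posting.
            pos = bisect_left(a_list, (b[0], b[1]), lo=i + 1)
            i = max(i + 1, pos - 1)
    return out

A = [(1, 1, 4, 1), (1, 20, 30, 1)]
B = [(1, 2, 3, 2), (1, 6, 7, 2), (1, 9, 10, 2), (1, 12, 13, 2), (1, 22, 23, 2)]
pairs = zigzag_join(A, B)
# pairs == [((1, 1, 4, 1), (1, 2, 3, 2)), ((1, 20, 30, 1), (1, 22, 23, 2))]
```

On this made-up input the join outputs the first A with the first B, moves to the second A, and then seeks directly from the second B to the fifth B without examining the third and fourth.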

30
Cost Model of ZigZag Join Algorithm
  • The CPU cost can be quite dependent on actual
    document structure, because the algorithm can
    skip over sections of either input posting list
    and can backtrack in a complex fashion

31
Cost Model of ZigZag Join Algorithm
32
Enabling Mixed Mode Execution
  • Unnest operator takes a list of XKeys as input
  • ZigZag Join operator takes posting lists as input
  • To enable query plans that use a mixture of these
    two operators, we must provide efficient
    mechanisms for switching between the two formats.

33
Enabling Mixed Mode Execution
  • To convert postings into XKeys
  • Remove the end number and level

34
Enabling Mixed Mode Execution
  • To convert the XKeys into postings
  • We need to look up an end number and level
  • To support this operation we store the end number
    and level in the information record for each
    element
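
Both conversions can be sketched in a few lines. The sketch assumes that a posting is a `(doc_id, start, end, level)` tuple and that the element ID in an XKey is the element's start number, which is what dropping the end number and level suggests; the `info_records` dictionary is a hypothetical stand-in for the per-element information records.

```python
def posting_to_xkey(posting):
    """Drop the end number and level; no lookup is needed."""
    doc_id, start, _end, _level = posting
    return (doc_id, start)

def xkey_to_posting(xkey, info_records):
    """Rebuild a posting by reading end and level from the element's record."""
    doc_id, element_id = xkey
    end, level = info_records[xkey]  # one lookup per XKey
    return (doc_id, element_id, end, level)

# Hypothetical information records: XKey -> (end number, level).
info_records = {(1, 2): (5, 2), (1, 6): (7, 2)}

xkey = posting_to_xkey((1, 2, 5, 2))               # (1, 2)
posting = xkey_to_posting((1, 6), info_records)    # (1, 6, 7, 2)
```

The asymmetry is the point: going from postings to XKeys is free, while going the other way costs a record lookup per element.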

35
Selecting A Plan
  • We heuristically limit our search space to
    include only left-deep evaluation plans for
    structural joins.
  • To choose the best plan, we use a dynamic
    programming approach

36
Selecting A Plan
37
Selecting A Plan
  • For a path expression query, the cost can be
    expressed as the sum of the cost of the last
    operation and the minimum cost of the rest of
    the plan
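
The recurrence can be illustrated with a small dynamic program over the steps of a path. Everything numeric here is invented for illustration: the per-step operator costs, the flat `switch_cost` for converting between XKeys and postings, and the function name are assumptions, not the cost model from the paper.

```python
def best_plan(step_costs, switch_cost):
    """Pick one operator per path step, minimising total cost.

    step_costs: one dict per step, mapping operator name -> cost (made up).
    switch_cost: assumed flat cost of converting between XKeys and postings
                 whenever two consecutive steps use different operators.
    Returns (total_cost, [operator chosen for each step]).
    """
    # best[op] = cheapest (cost, plan) for the steps so far that ends with op.
    best = {op: (cost, [op]) for op, cost in step_costs[0].items()}
    for costs in step_costs[1:]:
        nxt = {}
        for op, cost in costs.items():
            candidates = []
            for prev, (prev_cost, plan) in best.items():
                extra = 0 if prev == op else switch_cost
                # Cost of the last operation plus the minimum cost of the rest.
                candidates.append((prev_cost + extra + cost, plan + [op]))
            nxt[op] = min(candidates)
        best = nxt
    return min(best.values())

steps = [
    {"unnest": 5, "zigzag": 2},
    {"unnest": 1, "zigzag": 4},
    {"unnest": 1, "zigzag": 6},
]
total, plan = best_plan(steps, switch_cost=1)
# total == 5, plan == ["zigzag", "unnest", "unnest"]
```

With these invented numbers the cheapest plan starts with the index-based operator and then switches to navigation, which is the kind of mixed plan the approach is meant to find.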

38
Experiments
  • Experimental results of Mixed Mode Query
    Processing Approach
  • Carried out on a dual processor 550 MHz P3 PC
    running Redhat Linux 6.2, 1GB memory and SCSI
    disks.

39
Experiments
  • The XML Schema used in the experiments

40
Experiments
  • Test queries with predicted optimal plans

41
Experiment Results
  • Execution times in milliseconds for Query1
  • Execution times in milliseconds for Query2

42
Experiment Results
  • Execution times in milliseconds for Query3
  • Execution times in milliseconds for Query4

43
Related Work
  • There has been a lot of work on developing
    efficient algorithms for structural joins that
    identify occurrences of structural relationships
  • Using pre-order and post-order numbers
  • Stack-merge algorithms to make use of B-tree
    indices on the inverted lists

44
Related Work
  • There has also been some work on converting path
    expression queries into state machines
  • Several algorithms were proposed for optimizing
    branching path expressions in the navigational
    access methods only.

45
Related Work
  • Some recent research studies have considered the
    problem of maintaining summary structures of XML
    documents to provide statistics information
  • XML management systems have also been built on
    top of either relational or object-oriented
    systems.

46
Conclusion and Future Work
  • Mixed mode XML query processing performs better
    than either technique used separately.
  • Only single axis paths were considered in the ZigZag
    Join algorithm; more complex algorithms could be
    integrated.

47
Conclusion and Future Work
  • Parallel execution of operators is supported in
    the system, so we can benefit from branched
    execution plans
  • This work uses only simple models and examples;
    it could be extended to include more complex
    queries and cost models

48
Thank You
  • We thank you all for listening
  • can.tamyaman_at_arcelik.com
  • tapci_at_alumni.bilkent.edu.tr