Mixed Mode XML Query Processing Halverson, Burger, Galanis University of Wisconsin, Madison

1
Mixed Mode XML Query Processing
Halverson, Burger, Galanis
University of Wisconsin, Madison

Emre Tapçi 2002701525
Can Tamyaman 2002700905

2
Overview

Introduction
Purpose of the article
System Architecture
Mixed Mode Query Processing
Experiments
Related Work
Conclusion and Future Work

3
Introduction

Mixed mode XML query processing employs
inverted list filtering
tree navigation

4
Purpose of the article

To show that systems which keep inverted list
filtering and tree navigation separate are
suboptimal. To build an optimal system, these two
types of processing must be integrated.

5
Basic System Architecture
6
System Architecture

The Data Manager stores a tree representation of
the XML document
The Index Manager stores a set of inverted lists,
mapping objects in the XML document to lists of
exact locations within the document

7
Numbering Scheme

The data manager and index manager must share a
common scheme for numbering the elements in an
XML document
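A minimal sketch of one such shared numbering, assuming an interval-style scheme in which every element receives a start number, an end number and a level (the later slides mention all three). The function names and the sample document are hypothetical and are not taken from the system described here.

```python
import xml.etree.ElementTree as ET

def number_elements(root):
    """Assign (start, end, level) to every element in one depth-first walk."""
    numbered = []
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1                      # start number, taken on entry
        entry = {"tag": elem.tag, "start": counter, "end": None, "level": level}
        numbered.append(entry)
        for child in elem:
            visit(child, level + 1)
        counter += 1                      # end number, taken on exit
        entry["end"] = counter

    visit(root, 1)
    return numbered

def contains(ancestor, descendant):
    """An element contains another if its interval encloses the other's."""
    return (ancestor["start"] < descendant["start"]
            and descendant["end"] < ancestor["end"])

doc = ET.fromstring("<A><B><C/></B><B/></A>")
numbered = number_elements(doc)
```

With a numbering like this, containment between two elements reduces to comparing two intervals, which is what both managers can rely on.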

8
Numbering Scheme
9
Data Manager

Each XML document is stored in the Data Manager
using a B tree structure.
The key of the B tree index is a
(document_ID, element_ID) pair that we refer to
as an XKey.

10
Data Manager

Each leaf entry contains
Term ID
Record ID (RID)

11
Data Manager

The B tree corresponding to the previously given
numbered XML document example

12
Data Manager

Data Manager Tree Structure

13
Data Manager

Child Axis Cursor (CA)
Descendant Axis Cursor (DA)

14
Index Manager

Indexing information is stored in a two level
index structure
B tree as the top level
Second level entries are referred to as postings;
together the postings make up a posting list
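The two levels can be pictured with a small in-memory stand-in. This is only a sketch: a Python dict plays the role of the top-level B tree, each posting is assumed to be a (document ID, start, end, level) tuple, and the class and method names are hypothetical.

```python
from bisect import bisect_left, insort

class InvertedIndex:
    """Top level: term -> posting list. Second level: the sorted postings."""

    def __init__(self):
        self._lists = {}  # dict used here in place of the top-level B tree

    def add(self, term, doc_id, start, end, level):
        # Keep every posting list sorted by (doc_id, start).
        insort(self._lists.setdefault(term, []), (doc_id, start, end, level))

    def postings(self, term):
        return self._lists.get(term, [])

    def seek(self, term, doc_id, start):
        """Position of the first posting at or after (doc_id, start)."""
        return bisect_left(self.postings(term), (doc_id, start))

index = InvertedIndex()
index.add("B", 7, 6, 7, 2)
index.add("B", 7, 2, 5, 2)
```

The `seek` method is the operation a skipping join needs: it jumps into the middle of a posting list without scanning the entries before it.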

15
Index Manager

Index Manager Tree Structure

16
Mixed Mode Query Processing

Data Manager
Navigation based algorithms
Ex: Unnest algorithm
Index Manager
Multi-predicate merge join algorithms
Ex: ZigZag algorithm

17
Unnest Algorithm

Takes as input a path expression and a stream of
XKeys.
Evaluates the path expression for each XKey in
the input, and outputs XKeys corresponding to the
satisfying elements.
Ex: document()/A/B/C

18
Unnest Algorithm

Uses a Finite State Machine (FSM) to evaluate
path expressions
Each state of the FSM represents having satisfied
some prefix of the path expression, while an
accepting state indicates a full match.

19
Unnest Algorithm

Each state is associated with a cursor that
corresponds to the next step to be applied for
the path expression.
For each XKey obtained from the cursor, make the
appropriate transition in the FSM, and continue
with the next XKey in the next state.
If the cursor terminates, return to the previous
state and continue by enumerating its cursor.
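The state and cursor mechanics above can be sketched as follows. This is an illustration under simplifying assumptions, not the system's implementation: the tree is a hypothetical in-memory dict rather than the Data Manager, plain element IDs stand in for XKeys, and duplicates that nested descendant steps could produce are not removed.

```python
# Hypothetical stand-in for the Data Manager: element id -> (tag, child ids).
TREE = {
    1: ("A", [2, 5]),
    2: ("B", [3, 4]),
    3: ("C", []),
    4: ("D", []),
    5: ("B", [6]),
    6: ("C", []),
}

def child_cursor(tree, node_id, tag):
    """Enumerate children of node_id that carry the given tag."""
    for child_id in tree[node_id][1]:
        if tree[child_id][0] == tag:
            yield child_id

def descendant_cursor(tree, node_id, tag):
    """Enumerate all descendants of node_id that carry the given tag."""
    for child_id in tree[node_id][1]:
        if tree[child_id][0] == tag:
            yield child_id
        yield from descendant_cursor(tree, child_id, tag)

CURSORS = {"child": child_cursor, "descendant": descendant_cursor}

def unnest(tree, steps, input_ids):
    """steps is a list of (axis, tag); returns ids matching the full path."""
    results = []
    for context in input_ids:
        axis, tag = steps[0]
        # stack[i] is the cursor of the state reached after i matched steps.
        stack = [CURSORS[axis](tree, context, tag)]
        while stack:
            node = next(stack[-1], None)
            if node is None:
                stack.pop()                  # cursor ended: previous state
            elif len(stack) == len(steps):
                results.append(node)         # accepting state: full match
            else:
                axis, tag = steps[len(stack)]
                stack.append(CURSORS[axis](tree, node, tag))
    return results

matches = unnest(TREE, [("child", "B"), ("child", "C")], [1])
```

Here the call evaluates the relative path B/C from element 1; the stack of cursors is what lets the evaluation resume the previous state when a cursor runs out.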

20
Unnest FSM for A/B
21
Unnest DA-FSM for A/B
22
Unnest CA-FSM for A//B
23
Cost Model For Unnest Algorithm

There are two relevant cost formulas
Cost of a child axis unnest
Cost of a descendant axis unnest

24
The Cost of Unnest
25
ZigZag Join Algorithm

Uses the indices present on the posting lists.
The algorithm assumes that the posting lists
are sorted by
(document ID, start number)

26
ZigZag Join Algorithm

The point of the algorithm is to skip forward
over parts of a posting list that are guaranteed
not to have any matching postings on the other
list.

27
ZigZag Join Algorithm

Example

28
ZigZag Join Algorithm

Check the containment of the first B within the
first A, and output the pair.
Increment the B posting list pointer.
Find that the second B is not contained by the
first A.
Increment the A posting list pointer.
If the second A is beyond the second B, then
increment the B posting list pointer.

29
ZigZag Join Algorithm

Since the current B posting has no matching A
posting, use the second level index to seek
forward using the current A posting's start
number.
The algorithm then skips over to the fifth B
posting.
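The skipping behaviour walked through above can be sketched in a few lines. This is a simplified illustration, not the operator itself: it assumes a single document, postings of the form (start, end, level), A postings that never nest inside one another, and a binary search (`bisect`) standing in for the second level index. The posting values are made up to mirror the five-B example.

```python
from bisect import bisect_left, bisect_right

def zigzag_join(a_list, b_list):
    """Return (a, b) pairs where posting b lies inside posting a."""
    a_ends = [a[1] for a in a_list]      # sorted because A postings do not nest
    b_starts = [b[0] for b in b_list]
    out = []
    i = j = 0
    while i < len(a_list) and j < len(b_list):
        a, b = a_list[i], b_list[j]
        if b[0] < a[0]:
            # b starts before a: seek B forward to a's start number.
            j = bisect_left(b_starts, a[0], j)
        elif b[0] < a[1]:
            # b starts inside a, so it is contained in a: output the pair.
            out.append((a, b))
            j += 1
        else:
            # b starts after a ends: seek A forward past b's start number.
            i = bisect_right(a_ends, b[0], i)
    return out

A = [(1, 10, 1), (20, 30, 1), (40, 50, 1)]
B = [(2, 3, 2), (12, 13, 2), (14, 15, 2), (16, 17, 2), (22, 23, 2)]
pairs = zigzag_join(A, B)
```

On this input the second, third and fourth B postings are never compared individually against the second A; the seek jumps straight to the fifth B.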

30
Cost Model of ZigZag Join Algorithm

The CPU cost can be quite dependent on actual
document structure, because the algorithm can
skip over sections of either input posting list
and can backtrack in a complex fashion

31
Cost Model of ZigZag Join Algorithm
32
Enabling Mixed Mode Execution

Unnest operator takes a list of XKeys as input
ZigZag Join operator takes posting lists as input
To enable query plans that use a mixture of these
two operators, we must provide efficient
mechanisms for switching between the two formats.

33
Enabling Mixed Mode Execution

To convert postings into XKeys
Remove the end number and level

34
Enabling Mixed Mode Execution

To convert the XKeys into postings
We need to look up an end number and level
To support this operation we store the end number
and level in the information record for each
element
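The two conversions can be sketched as below, assuming the element ID in an XKey is the element's start number and that the per-element information record is reachable by XKey. The type names, the `INFO` dictionary and its values are hypothetical.

```python
from typing import NamedTuple

class XKey(NamedTuple):
    doc_id: int
    element_id: int   # assumed to equal the element's start number

class Posting(NamedTuple):
    doc_id: int
    start: int
    end: int
    level: int

def posting_to_xkey(posting):
    """Drop the end number and level; nothing has to be looked up."""
    return XKey(posting.doc_id, posting.start)

def xkey_to_posting(xkey, info_records):
    """Fetch end number and level from the element's information record."""
    record = info_records[xkey]
    return Posting(xkey.doc_id, xkey.element_id, record["end"], record["level"])

# Hypothetical information records, standing in for the Data Manager.
INFO = {XKey(7, 2): {"end": 5, "level": 2}}

key = posting_to_xkey(Posting(7, 2, 5, 2))
posting = xkey_to_posting(key, INFO)
```

The asymmetry is visible in the sketch: going from postings to XKeys is free, while the reverse direction costs one record lookup per element.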

35
Selecting A Plan

We heuristically limit our search space to
include only left deep evaluation plans for
structural joins.
To choose the best plan, we use a dynamic
programming approach

36
Selecting A Plan
37
Selecting A Plan

For a path expression query, the cost can be
expressed as the sum of the cost of the last
operation and the minimum cost for the rest of
the query
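A toy version of this recurrence, written as a dynamic program over the steps of a path expression. Everything numeric here is made up for illustration, and the sketch rests on assumptions not stated in the slides: each operator both consumes and produces its own format (XKeys for Unnest, postings for ZigZag), switching formats has one fixed cost, and the initial input is available in either format for free.

```python
def cheapest_plan(step_costs, convert_cost):
    """step_costs holds one {"unnest": cost, "zigzag": cost} dict per step."""
    format_of = {"unnest": "xkeys", "zigzag": "postings"}
    # best[fmt] = (cost, plan) of the cheapest plan so far that outputs fmt.
    best = {"xkeys": (0.0, []), "postings": (0.0, [])}
    for costs in step_costs:
        new_best = {}
        for op, op_cost in costs.items():
            needed = format_of[op]
            candidates = []
            for fmt, (cost, plan) in best.items():
                switch = 0.0 if fmt == needed else convert_cost
                # Cost of this last operation plus the minimum cost before it.
                candidates.append((cost + switch + op_cost, plan + [op]))
            new_best[needed] = min(candidates, key=lambda c: c[0])
        best = new_best
    return min(best.values(), key=lambda c: c[0])

# Hypothetical per-step costs for a two-step path expression.
cost, plan = cheapest_plan(
    [{"unnest": 10, "zigzag": 2}, {"unnest": 1, "zigzag": 8}],
    convert_cost=1,
)
```

With these invented numbers the cheapest plan mixes the two operators, paying one format switch between them, which is the kind of plan a single-mode system cannot express.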

38
Experiments

Experimental results of the Mixed Mode Query
Processing approach
Carried out on a dual processor 550 MHz P3 PC
running Red Hat Linux 6.2, with 1 GB of memory
and SCSI disks.

39
Experiments

The XML Schema used in the experiments

40
Experiments

Test queries with predicted optimal plans

41
Experiment Results

Execution times in milliseconds for Query1

Execution times in milliseconds for Query2
42
Experiment Results
Execution times in milliseconds for Query3

Execution times in milliseconds for Query4

43
Related Work

There has been a lot of work on developing
efficient algorithms for structural joins that
identify occurrences of structural relationships
Using pre-order and post-order numbers
Stack-merge algorithms that make use of B-tree
indices on the inverted lists

44
Related Work

There has also been some work on converting path
expression queries into state machines
Several algorithms were proposed for optimizing
branching path expressions, but for navigational
access methods only.

45
Related Work

Some recent research studies have considered the
problem of maintaining summary structures of XML
documents to provide statistical information
XML management systems have also been built on
top of either relational or object-oriented
systems.

46
Conclusion and Future Work

Mixed mode XML query processing performs better
than inverted list filtering or tree navigation
used separately.
Only single axis paths were considered in the
ZigZag Join algorithm; more complex algorithms
could be integrated.

47
Conclusion and Future Work

Parallel execution of operators is supported in
the system, so we can benefit from branched
execution plans.
This work uses only simple models and examples;
it could be extended to include more complex
queries and cost models.

48
Thank You