CS 561 Presentation: Indexing and Querying XML Data for Regular Path Expressions

Slides: 29
Provided by: amnons8
Learn more at: http://web.cs.wpi.edu
Transcript and Presenter's Notes

1
CS 561 Presentation Indexing and
Querying XML Data for Regular Path Expressions
  • A Paper by Quanzhong Li and Bongki Moon
  • Presented by Ming Li

2
Our Objective
  • Developing a system that will enable us to
    perform XML data queries efficiently.

3
XML Query Languages
  • Used for retrieving data from XML files.
  • Use a regular path expression syntax.
  • e.g. XPath, XQuery.

4
Queries Today - Inefficient
  • Usually evaluated by XML tree traversals, which
    is inefficient.
  • Top-Down Approach
  • Bottom-Up Approach
  • An example
  • the query
  • /chapter/_*/figure
  • (finding all figures in all chapters.)
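
To make the cost of traversal-based evaluation concrete, here is a minimal sketch of a top-down evaluation of the query on this slide (all figures in all chapters), using Python's standard xml.etree.ElementTree. The sample document, the `book` wrapper element, and the function name `figures_in_chapters` are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """
<book>
  <chapter>
    <section><figure id="f1"/></section>
    <figure id="f2"/>
  </chapter>
  <chapter>
    <section><para/></section>
  </chapter>
</book>
"""

def figures_in_chapters(root):
    """Top-down evaluation: walk the whole subtree of every chapter."""
    visited = 0
    found = []
    for chapter in root.iter("chapter"):
        for node in chapter.iter():      # includes the chapter itself
            visited += 1
            if node.tag == "figure":
                found.append(node.get("id"))
    return found, visited

root = ET.fromstring(SAMPLE)
found, visited = figures_in_chapters(root)
print(found, visited)
```

The second chapter contains no figure, yet its whole subtree is still visited. That wasted work is what an index-based approach tries to avoid.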

5
Our Objective - Refined
  • Developing a system that will enable us to
    perform XML data queries efficiently
  • Developing such a system consists of
  • Developing a way to efficiently store XML data.
  • Developing efficient algorithms for processing
    regular path expressions (e.g. XQuery
    expressions).

6
Storing XML Documents - XISS
  • XISS - XML Indexing and Storage System.
  • Provides us with ways to
  • efficiently find all elements or attributes with
    the same name string, grouped by the document to
    which they belong.
  • quickly determine the ancestor-descendant
    relationship between elements and/or attributes
    in the hierarchy of XML data.

7
Determining Ancestor-Descendant Relationship
  • According to Dietz's numbering scheme, for two
    given nodes x and y of a tree T, x is an ancestor
    of y iff x occurs before y in the preorder
    traversal and after y in the postorder traversal.
  • Example
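
The preorder/postorder test can be sketched in a few lines of Python. The `Node` class, the helper `number`, and the small chapter tree are hypothetical and exist only for this illustration.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.pre = None    # rank in preorder traversal
        self.post = None   # rank in postorder traversal

def number(root):
    """Assign preorder and postorder ranks in one depth-first walk."""
    counters = {"pre": 0, "post": 0}

    def visit(node):
        counters["pre"] += 1
        node.pre = counters["pre"]
        for child in node.children:
            visit(child)
        counters["post"] += 1
        node.post = counters["post"]

    visit(root)

def is_ancestor(x, y):
    """x is an ancestor of y iff x comes before y in preorder
    and after y in postorder."""
    return x.pre < y.pre and x.post > y.post

# Hypothetical tree: chapter -> (title, section -> figure)
figure = Node("figure")
section = Node("section", [figure])
title = Node("title")
chapter = Node("chapter", [title, section])
number(chapter)

print(is_ancestor(chapter, figure))  # chapter encloses figure
print(is_ancestor(title, figure))    # title is only a preceding sibling branch
```

Once the ranks are stored, the test is two integer comparisons, with no tree traversal at query time.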

8
Determining Ancestor-Descendant Relationship cont.
  • Advantage: the ancestor-descendant relationship
    can be determined in constant time.
  • Disadvantage: a lack of flexibility.
  • e.g. inserting a new node requires recomputation
    of the preorder and postorder values of many
    tree nodes.

9
Determining Ancestor-Descendant Relationship cont.
  • A new numbering scheme
  • Each node is associated with an <order, size>
    pair.
  • For a tree node y and its parent x:
  • [order(y), order(y) + size(y)] ⊂ (order(x),
    order(x) + size(x)]
  • For two sibling nodes x and y, if x is the
    predecessor of y in preorder traversal, then
  • order(x) + size(x) < order(y).

10
Determining Ancestor-Descendant Relationship cont.
  • Fact: for two given nodes x and y of a tree T, x
    is an ancestor of y iff
  • order(x) < order(y) ≤ order(x) + size(x)
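
As a minimal sketch, the ancestor test on <order, size> pairs reduces to one chained comparison. The `Label` type and the sample labels below are assumptions chosen for illustration. The labels are deliberately sparse: each size is larger than the current subtree needs.

```python
from typing import NamedTuple

class Label(NamedTuple):
    order: int
    size: int

def is_ancestor(x: Label, y: Label) -> bool:
    """x is an ancestor of y iff order(x) < order(y) <= order(x) + size(x)."""
    return x.order < y.order <= x.order + x.size

# Hypothetical labels for chapter -> (title, section -> figure).
chapter = Label(order=1, size=100)
title = Label(order=10, size=5)
section = Label(order=20, size=30)
figure = Label(order=25, size=5)

print(is_ancestor(chapter, figure))
print(is_ancestor(title, figure))
```

Only the labels of the two nodes are needed, so the check stays constant-time regardless of document size.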

11
Determining Ancestor-Descendant Relationship cont.
  • Properties
  • The ancestor-descendant relationship can be
    determined in constant time.
  • Flexibility: node insertion usually doesn't
    require recomputation of tree nodes.
  • An element can be uniquely identified in a
    document by its order value.
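
The flexibility property can be illustrated with a small sketch, assuming the size values were chosen larger than the current subtrees so that unused order values remain. The function name `label_for_new_last_child` and the sample labels are hypothetical. This is one possible way to pick a label, not necessarily the procedure XISS uses.

```python
from typing import List, NamedTuple, Optional

class Label(NamedTuple):
    order: int
    size: int

def label_for_new_last_child(parent: Label, children: List[Label],
                             size: int = 0) -> Optional[Label]:
    """Return a label for a node appended after the existing children,
    or None when the parent's reserved range has no room left."""
    # Largest order value already covered by the parent's current children.
    last_used = max((c.order + c.size for c in children),
                    default=parent.order)
    order = last_used + 1
    # The new interval must stay inside (parent.order, parent.order + parent.size].
    if order + size <= parent.order + parent.size:
        return Label(order, size)
    return None  # no slack left: some nodes would have to be renumbered

# Sparse numbering: room is available, existing labels stay untouched.
roomy = label_for_new_last_child(Label(1, 100), [Label(10, 5), Label(20, 30)])
# Dense numbering: the parent's range is already full.
full = label_for_new_last_child(Label(1, 3), [Label(2, 0), Label(3, 1)])
print(roomy, full)
```

In the sparse case no existing label changes, which is why insertion "usually" avoids recomputation. The dense case shows when it cannot.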

12
XISS System Overview

13
Name Index and Value Table
  • Objective: minimizing the storage and computation
    overhead by eliminating replicated strings and
    string comparisons.
  • Name Index - mapping distinct name strings into
    unique name identifiers (nid).
  • Value Table - mapping distinct value strings
    (i.e. attribute value and text value) into unique
    value identifiers (vid).
  • Both are implemented as B+-trees.
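
The idea behind the name index and value table can be sketched as string interning. In this sketch a Python dict stands in for the tree-based index described on the slide, and the class name `StringTable` and its methods are assumptions for illustration.

```python
class StringTable:
    """Maps distinct strings to small integer identifiers and back."""

    def __init__(self):
        self._id_of = {}     # string -> identifier
        self._strings = []   # identifier -> string

    def intern(self, text: str) -> int:
        """Return the identifier of text, assigning a new one if needed."""
        ident = self._id_of.get(text)
        if ident is None:
            ident = len(self._strings)
            self._id_of[text] = ident
            self._strings.append(text)
        return ident

    def lookup(self, ident: int) -> str:
        return self._strings[ident]

name_index = StringTable()    # name string  -> nid
value_table = StringTable()   # value string -> vid

nid_ele = name_index.intern("Ele")
nid_att = name_index.intern("Att")
nid_ele_again = name_index.intern("Ele")   # replicated string, same nid
vid_a1 = value_table.intern("A1")

print(nid_ele, nid_att, nid_ele_again, vid_a1)
```

After interning, comparing two names is an integer comparison, and each distinct string is stored once.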

14
The Element Index
  • Objective: quickly finding all elements with the
    same name string.
  • Structure

15
The Attribute Index
  • Objective: quickly finding all attributes with the
    same name string.
  • Structure
  • Same structure as the Element Index, except that
    each record in the attribute index has a value
    identifier (vid), which is the key used to obtain
    the attribute value from the value table.

16
The Structure Index
  • Objectives
  • Finding the parent element and child elements (or
    attributes) for a given element.
  • Finding the parent element for a given attribute.
  • Structure

17
The Structure Index cont.
  • Structure
  • B+-tree using document identifier (did) as a key.
  • Leaf nodes: linear arrays with records for all
    elements and attributes from an XML document.
  • Each record: nid, <order, size>, Parent order,
    Child order, Sibling order, Attribute order.
  • Records are ordered by order value.
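
A minimal sketch of such a leaf array follows. A dict keyed by did stands in for the tree, and binary search over the order values exploits the sort order of the records. The field meanings (first child, next sibling, first attribute), the class names, and the sample document are assumptions for illustration. Fields not needed by the example are left as None.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Record:
    nid: int
    order: int
    size: int
    parent_order: Optional[int] = None
    child_order: Optional[int] = None      # assumed: first child element
    sibling_order: Optional[int] = None    # assumed: next sibling
    attribute_order: Optional[int] = None  # assumed: first attribute

class LeafArray:
    """All records of one document, kept sorted by order value."""

    def __init__(self, records: List[Record]):
        self.records = sorted(records, key=lambda r: r.order)
        self.orders = [r.order for r in self.records]

    def find(self, order: int) -> Optional[Record]:
        i = bisect_left(self.orders, order)   # binary search
        if i < len(self.orders) and self.orders[i] == order:
            return self.records[i]
        return None

    def parent_of(self, order: int) -> Optional[Record]:
        rec = self.find(order)
        if rec is None or rec.parent_order is None:
            return None
        return self.find(rec.parent_order)

ELE, ATT = 0, 1   # hypothetical nids
structure_index: Dict[int, LeafArray] = {
    7: LeafArray([                      # did 7 is an arbitrary example
        Record(ELE, 1, 3, child_order=3, attribute_order=2),
        Record(ATT, 2, 0, parent_order=1),
        Record(ELE, 3, 1, parent_order=1, attribute_order=4),
        Record(ATT, 4, 0, parent_order=3),
    ]),
}
doc = structure_index[7]
print(doc.parent_of(4))
```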

18
Querying Method
  • Decomposing path expressions into simple path
    expressions.
  • Applying algorithms on simple path expressions
    and their intermediate results.

19
Decomposition of Path Expressions
  • The main idea
  • A complex path expression is decomposed into
    several simple path expressions.
  • Each simple path expression produces an
    intermediate result that can be used in the
    subsequent stage of processing.
  • The results of the simple path expressions are
    then combined or joined together to obtain the
    final result of the given query.

20
Basic Subexpressions - Example
Decomposition of (E1/E2)*/E3/((E4[@a=V]) | (E5/_*/E6))

21
Example: EA-Join (Element and Attribute Join)

22
EA-Join: Element and Attribute Join
  • Input
  • {E1, ..., Em}: each Ei is a set of elements
    having a common document identifier (did).
  • {A1, ..., An}: each Aj is a set of attributes
    having a common document identifier (did).
  • Output
  • A set of (e, a) pairs such that the element e is
    the parent of the attribute a.

23
EA-Join: Element and Attribute Join
The Algorithm
    // Sort-merge {Ei} and {Aj} by did.
(1) foreach Ei and Aj with the same did do
        // Sort-merge Ei and Aj by
        // PARENT-CHILD relationship.
(2)     foreach e ∈ Ei and a ∈ Aj do
(3)         if (e is a parent of a) then output (e, a)
        end
    end

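
The two merge stages can be sketched in Python as follows. This is a sketch under stated assumptions, not the paper's implementation: each attribute record is assumed to carry its parent's order value, both inputs are assumed to be pre-sorted (sets by did, records by order), and the names `Element`, `Attribute`, `merge_by_parent`, and `ea_join` are hypothetical.

```python
from typing import List, NamedTuple, Tuple

class Element(NamedTuple):
    order: int
    size: int

class Attribute(NamedTuple):
    order: int
    parent_order: int   # assumption: stored with the attribute record
    vid: int            # value identifier

def merge_by_parent(elems: List[Element], attrs: List[Attribute]):
    """Second stage: single scan over records sorted by order value."""
    out = []
    p = q = 0
    while p < len(elems) and q < len(attrs):
        e, a = elems[p], attrs[q]
        if a.parent_order == e.order:
            out.append((e, a))
            q += 1          # e may own further attributes
        elif a.parent_order < e.order:
            q += 1          # a's parent is not in this element set
        else:
            p += 1          # e has no more attributes in this set
    return out

def ea_join(element_sets: List[Tuple[int, List[Element]]],
            attribute_sets: List[Tuple[int, List[Attribute]]]):
    """First stage: merge the two inputs by did, both sorted by did."""
    pairs = []
    i = j = 0
    while i < len(element_sets) and j < len(attribute_sets):
        e_did, elems = element_sets[i]
        a_did, attrs = attribute_sets[j]
        if e_did < a_did:
            i += 1
        elif e_did > a_did:
            j += 1
        else:
            pairs.extend(merge_by_parent(elems, attrs))
            i += 1
            j += 1
    return pairs

# Hypothetical input: document 1 holds two nested elements, each with one attribute.
element_sets = [(1, [Element(1, 3), Element(3, 1)]), (2, [Element(1, 0)])]
attribute_sets = [(1, [Attribute(2, 1, vid=0), Attribute(4, 3, vid=1)]),
                  (3, [Attribute(2, 1, vid=0)])]
pairs = ea_join(element_sets, attribute_sets)
matches = [e for e, a in pairs if a.vid == 0]   # keep one attribute value
print(pairs, matches)
```

The single scan in `merge_by_parent` relies on attributes being numbered directly after their parent element, so that parent order values never decrease as the attribute list is read.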
24
EA-Join Example
  • Consider the XML document
  • <Ele Att="A1">
  • <Ele Att="A2"> </Ele>
  • </Ele>
  • And the query /Ele[@Att="A1"]

25
EA-Join: Querying /Ele[@Att="A1"]
  • <Ele Att="A1">
  • <Ele Att="A2"> </Ele>
  • </Ele>
  • Sort-merging Eles and Atts by parent-child
    relationship will give us the list
  • <1,3>, <2,0>, <3,1>, <4,0>
  • Finding the Ele elements that have a child
    attribute Att with the value "A1" in the
    resulting list is easy using the information in
    the Element Record.

26
EA-Join Comments
  • Requires only a two-stage sort-merge operation,
    without the additional cost of sorting.
  • First merge by did.
  • Second merge by examining parent-child
    relationship.
  • This merge is based on the order values of the
    element and attribute as defined by the numbering
    scheme.
  • Attributes should be placed before their sibling
    elements in the order of the numbering scheme.
  • This guarantees that elements and attributes with
    the same did can be merged in a single scan.

27
Conclusions
  • XISS can efficiently process regular path
    expression queries.
  • Performance improvement over the conventional
    methods by up to an order of magnitude.
  • Future work: optimal page size or the break-even
    point between the two criteria.

28
Thank you so much!