Managing XML and Semistructured Data - PowerPoint PPT Presentation

About This Presentation
Slides: 18
Provided by: csWash
Transcript and Presenter's Notes

1
Managing XML and Semistructured Data
  • Lecture 15: Query Analysis

Prof. Dan Suciu
Spring 2001
2
In this lecture
  • Query rewriting
    • Examples
  • Query rewriting with schema
  • Resources
    • Optimizing Regular Path Expressions Using Graph
      Schemas, M. Fernandez and D. Suciu, Data
      Engineering, 98
    • Query Optimization for Structured Documents Based
      on Knowledge on the Document Type Definition, K.
      Bohm, K. Gayer, K. Aberer, T. Özsu

3
Query Analysis
  • Generic term to describe:
    • Query rewriting based on schema information
    • Query containment and minimization

4
Query Rewriting
  • Problem
    • Given a query Q
      • Regular path expression
      • Or a more complex XQuery expression
    • Given a schema S
      • Graph schema
      • DTD
      • XML-Schema
    • Rewrite Q to some QS such that:
      • Q is equivalent to QS over databases conforming
        to S
      • QS is more efficient than Q

5
Query Rewriting
  • Optimizing Regular Path Expressions Using Graph
    Schemas, M. Fernandez and D. Suciu, Data
    Engineering, 98
  • Simplest setting
    • Regular path expression
    • Graph schemas

6
Example of Query Rewriting
  • Naive evaluation needs to traverse the entire graph
    (or tree)

Q = //Department//Project
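
To make the cost of naive evaluation concrete, here is a minimal sketch that evaluates //Department//Project on a small tree. The `Node` class, the node-labeled tree (the lecture's graphs are edge-labeled), and the sample data are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node-labeled tree; stands in for an XML instance."""
    label: str
    children: list = field(default_factory=list)


def naive_eval(root):
    """Evaluate //Department//Project by visiting every node.

    Returns (matches, visited), where visited counts the nodes touched.
    """
    matches, visited = [], 0
    # Each stack entry is (node, has_department_ancestor).
    stack = [(root, False)]
    while stack:
        node, under_dept = stack.pop()
        visited += 1
        if under_dept and node.label == "Project":
            matches.append(node)
        below_dept = under_dept or node.label == "Department"
        for child in node.children:
            stack.append((child, below_dept))
    return matches, visited


# Made-up instance: only one branch can contain an answer,
# yet the naive traversal still walks the Library subtree.
tree = Node("Univ", [
    Node("Department", [Node("Staff"), Node("Project", [Node("Member")])]),
    Node("Library", [Node("Book"), Node("Book")]),
])
matches, visited = naive_eval(tree)
print(len(matches), "match,", visited, "nodes visited")
```

With no schema knowledge, nothing tells the evaluator that the Library subtree cannot contain a Department, so all 8 nodes are visited to find a single match.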
7
Example of Query Rewriting
  • Graph Schema

[Figure: graph schema S with states s1, s2, s3, s4 and edges labeled Org, Project, Member, and other]
Org = Department ∨ College ∨ School
other = ¬Org ∧ ¬Project ∧ ¬Member
8
Example of Query Rewriting
  • The schema says there can be at most one Department
    edge; below it, there can be at most one Project
    edge
  • QS can be evaluated more efficiently than Q
  • Why?

Q = //Department//Project
QS = (other)*/Department/(other)*/Project
other = ¬Department ∧ ¬College ∧ ¬School ∧ ¬Project ∧ ¬Member
9
Example of Query Rewriting
  • How to construct QS systematically from Q and S?
    • Step 1: build the automaton A for Q
    • Step 2: build the product automaton S x A
    • Step 3: QS = the regular expression of S x A
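
The first two steps can be sketched in a few lines of Python. The sketch rests on several assumptions: the schema is taken to be a chain s1 → s2 → s3 → s4 with an `other` self-loop on every state, the label predicates are modelled as sets over a small finite alphabet in which the symbol `other` stands for any label that is not named, and the names `S_EDGES`, `A_EDGES`, and `product` are hypothetical. Step 3 (turning the automaton into an expression) is not implemented; the result is read off the pruned automaton by hand.

```python
from collections import deque

# Finite stand-in for the label predicates (an assumption of this sketch).
ATOMS = {"Department", "College", "School", "Project", "Member", "other"}
ORG = {"Department", "College", "School"}
TRUE = set(ATOMS)

# Assumed graph schema S: a chain with an "other" self-loop on each state.
S_EDGES = [
    ("s1", {"other"}, "s1"), ("s1", ORG, "s2"),
    ("s2", {"other"}, "s2"), ("s2", {"Project"}, "s3"),
    ("s3", {"other"}, "s3"), ("s3", {"Member"}, "s4"),
    ("s4", {"other"}, "s4"),
]

# Step 1: automaton A for Q = //Department//Project.
A_EDGES = [
    ("a1", TRUE, "a1"), ("a1", {"Department"}, "a2"),
    ("a2", TRUE, "a2"), ("a2", {"Project"}, "a3"),
]
A_FINAL = {"a3"}


def product(s_edges, a_edges, start, a_final):
    """Step 2: build S x A, then drop states that cannot reach acceptance."""
    edges, seen, queue = set(), {start}, deque([start])
    while queue:
        s, a = queue.popleft()
        for s_from, s_pred, s_to in s_edges:
            if s_from != s:
                continue
            for a_from, a_pred, a_to in a_edges:
                if a_from != a:
                    continue
                # A transition survives only if both predicates can hold.
                for label in s_pred & a_pred:
                    target = (s_to, a_to)
                    edges.add(((s, a), label, target))
                    if target not in seen:
                        seen.add(target)
                        queue.append(target)
    # Backward closure from accepting states.
    useful = {q for q in seen if q[1] in a_final}
    changed = True
    while changed:
        changed = False
        for src, _, dst in edges:
            if dst in useful and src not in useful:
                useful.add(src)
                changed = True
    return {e for e in edges if e[0] in useful and e[2] in useful}


pruned = product(S_EDGES, A_EDGES, ("s1", "a1"), A_FINAL)
for src, label, dst in sorted(pruned):
    print(src, "--" + label + "-->", dst)
```

Under these assumptions the pruned product has only four transitions: an `other` loop on (s1, a1), a `Department` edge to (s2, a2), an `other` loop there, and a `Project` edge to (s3, a3). Reading that automaton as an expression gives (other)*/Department/(other)*/Project.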

10
Example of Query Rewriting
[Figure: the automaton A for Q (states a1, a2, a3; transitions labeled true, Dept, true, Project), the graph schema S (states s1 to s4), and the product automaton S x A, whose transitions are labeled Dept, Org, Project, Member, other, or false]
QS = (other)*/Department/(other)*/Project
11
Query Rewriting
  • Correctness
    • Proposition: if the instance I conforms to S,
      then Q(I) = QS(I)
    • That is, Q and QS are equivalent over databases
      conforming to S

12
Query Rewriting
  • Efficiency
    • Given query Q and instance I, define
      cost(Q, I) = | ⋃ { w(I) | w ∈ prefix(Lang(Q)) } |
    • Proposition: if Q' and Q are equivalent over all
      databases conforming to S, and if I conforms to
      S, then cost(QS, I) ≤ cost(Q', I)
    • Hence, QS is optimal (in a certain sense)

13
Query Rewriting
  • Query Optimization for Structured Documents Based
    on Knowledge on the Document Type Definition, K.
    Bohm, K. Gayer, K. Aberer, T. Özsu
  • More complex setting
    • Schema: DTD
    • Query: region algebra (think XPath)
  • The problem is more complex; this work proposes some
    solutions

14
Query Rewriting
  • Idea: analyze the DTD and extract 3 relations
  • Exclusivity: element E1 is exclusively contained
    in E2 if every path from the root to E1 goes
    through E2
    • XPath simplification: E1[ancestor-or-self::E2]
      → E1
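
One way to test exclusivity, sketched here under assumptions: the DTD is reduced to a graph mapping each element type to the element types that may appear as its children, and E1 is exclusively contained in E2 exactly when E1 becomes unreachable from the root once E2 is removed. The function names and the sample DTD are hypothetical and are not taken from the paper.

```python
def reachable(dtd, start, blocked=frozenset()):
    """Element types reachable from `start` without entering `blocked`."""
    if start in blocked:
        return set()
    seen, stack = {start}, [start]
    while stack:
        element = stack.pop()
        for child in dtd.get(element, ()):
            if child not in seen and child not in blocked:
                seen.add(child)
                stack.append(child)
    return seen


def exclusively_contained(dtd, root, e1, e2):
    """True if every path from `root` to `e1` passes through `e2`."""
    return e1 == e2 or e1 not in reachable(dtd, root, blocked={e2})


# Made-up DTD graph: element type -> allowed child element types.
dtd = {
    "book": {"title", "chapter"},
    "chapter": {"title", "section"},
    "section": {"title", "para"},
    "para": set(),
}
print(exclusively_contained(dtd, "book", "para", "section"))   # para only occurs inside section
print(exclusively_contained(dtd, "book", "title", "chapter"))  # book/title bypasses chapter
```

When the check succeeds, as for `para` and `section` here, the predicate in para[ancestor-or-self::section] is redundant and can be dropped.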

15
Query Rewriting
  • Obligation: E1 obligatorily contains E2 if every E1
    element has a child of type E2
    • E1[E2] → E1
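
Obligation can be read off a DTD content model. The sketch below assumes a small hand-built encoding of content models (a string for an element name, or a tuple whose first item is one of `seq`, `choice`, `opt`, `star`, `plus`); this encoding and the function `obligatory` are illustrative assumptions, not the representation used in the paper.

```python
def obligatory(model, child):
    """True if every child sequence allowed by `model` contains `child`."""
    if isinstance(model, str):
        return model == child
    op, *parts = model
    if op == "seq":
        # A sequence requires the child if any of its components does.
        return any(obligatory(p, child) for p in parts)
    if op == "choice":
        # A choice requires the child only if every alternative does.
        return all(obligatory(p, child) for p in parts)
    if op == "plus":
        # One or more repetitions: at least one occurrence of the body.
        return obligatory(parts[0], child)
    if op in ("opt", "star"):
        # The body may be absent altogether.
        return False
    raise ValueError("unknown operator: " + str(op))


# Hypothetical declaration:
# <!ELEMENT article (title, author+, (abstract | summary), section*)>
article = ("seq", "title", ("plus", "author"),
           ("choice", "abstract", "summary"), ("star", "section"))

for name in ("title", "author", "abstract", "section"):
    print(name, obligatory(article, name))
```

For this made-up declaration, `title` and `author` are obligatory, so article[title] can be simplified to article, while `abstract` and `section` are not.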

16
Query Rewriting
  • Entrance location: E is an entrance location for
    E1, E2 if every path from E1 to E2 goes through
    some E
    • E1[ancestor-or-self::E2] →
      E1[ancestor-or-self::E[ancestor-or-self::E2]]

17
Query Rewriting
  • Add these rules, plus variations, to a rule-based
    optimizer
  • HyperStorM: a structured document database
    • Built on top of VODAK, an object-oriented database
      system
  • Open question: does this approach exploit all the
    information in a DTD/XML-Schema? How can we
    exploit what is not used?