Approximate XML Query Answers - PowerPoint PPT Presentation


1
Approximate XML Query Answers
  • Alkis Polyzotis (UC Santa Cruz)
  • Minos Garofalakis (Bell Labs)
  • Yannis Ioannidis (U. of Athens, Hellas)

2
Motivation
  • XML is the de facto standard for data exchange
  • Development of the XML warehouse
  • Conflict between on-line (interactive) use and
    query execution cost
  • Increased query response times
  • Users might wait for uninteresting results

[Figure: query Q is posed to the warehouse, which returns result R]
3
Approximate Query Answers
  • Evaluate the query over a concise data synopsis and
    obtain an approximation R' of the true result R
  • Use the approximate result as timely feedback
  • User can assess the value of the query
  • Goal: reduce the number of evaluated queries

[Figure: query Q, the warehouse, the true result R, and the approximate result R']
4
Contributions
  • TreeSketch Synopses
  • Structural summaries for XML data
  • Approximate answers for complex twig queries
  • Summarization model: structural clustering of
    elements
  • Efficient processing and construction
  • Element Simulation Distance
  • Novel distance metric for XML data
  • Captures approximate similarity between two XML
    trees
  • Experimental Results
  • Accurate approximate answers for low space
    budgets
  • Low-error selectivity estimates
  • Efficient construction algorithm

5
Outline
  • Preliminaries
  • TreeSketches
  • Synopsis model
  • Computing approximate answers
  • Summary construction
  • Element Simulation Distance
  • Experimental Study
  • Conclusions

6
Data and Query Model
XML Document
7
Problem Definition
Approximate Nesting Tree
True Nesting Tree
  • Process twig query over a synopsis
  • Compute approximation of nesting tree

8
TreeSketch Model
9
Graph Synopsis
XML Document
Graph Synopsis
  • Synopsis node = set of elements with the same tag
  • Synopsis edge = one or more document edges

10
TreeSketch Synopsis
XML Document
TreeSketch
  • Augment graph-synopsis with edge counts
  • count(u,v): mean number of children in v per element in u
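
The edge-count definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document `DOC`, its element ids, and the function name `tree_sketch` are hypothetical and do not come from the slides. Elements are grouped by tag, as in the graph synopsis.

```python
from collections import defaultdict

# Hypothetical document: element id -> (tag, list of child ids).
DOC = {
    "r": ("R", ["p1"]),
    "p1": ("P", ["s2", "s3"]),
    "s2": ("S", ["f4", "f5"]),
    "s3": ("S", ["f6", "f7"]),
    "f4": ("F", ["c8", "e9"]),
    "f5": ("F", ["c10", "e11"]),
    "f6": ("F", ["c12"]),
    "f7": ("F", ["c13"]),
    "c8": ("C", []), "e9": ("E", []), "c10": ("C", []), "e11": ("E", []),
    "c12": ("C", []), "c13": ("C", []),
}


def tree_sketch(doc, node_of):
    """Return (node sizes, edge counts) for the grouping given by node_of."""
    size = defaultdict(int)
    edges = defaultdict(int)  # (u, v) -> number of document edges from u to v
    for elem, (_tag, children) in doc.items():
        u = node_of(elem)
        size[u] += 1
        for child in children:
            edges[(u, node_of(child))] += 1
    # count(u, v) = mean number of children in v per element in u
    counts = {(u, v): n / size[u] for (u, v), n in edges.items()}
    return dict(size), counts


# Group elements by tag, as in the graph synopsis.
sizes, counts = tree_sketch(DOC, lambda elem: DOC[elem][0])
```

For this document the four F elements have four C children and two E children in total, so the sketch stores count(F,C) = 1 and count(F,E) = 0.5.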

11
TreeSketch Synopsis
XML Document
TreeSketch
  • Is there a lossless synopsis?
  • What is the quality of a lossy synopsis?

12
Count Stability
[Figure: XML document and its TreeSketch, annotated with edge counts]
  • (u,v) is count-stable: all elements in u have the
    same child-count in v
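
A minimal sketch of this test, assuming a document stored as a map from element id to (tag, child ids) and a function that assigns each element to a synopsis node. The document `DOC` and the name `unstable_edges` are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical document: element id -> (tag, list of child ids).
DOC = {
    "r": ("R", ["p1"]),
    "p1": ("P", ["s2", "s3"]),
    "s2": ("S", ["f4", "f5"]),
    "s3": ("S", ["f6", "f7"]),
    "f4": ("F", ["c8", "e9"]),
    "f5": ("F", ["c10", "e11"]),
    "f6": ("F", ["c12"]),
    "f7": ("F", ["c13"]),
    "c8": ("C", []), "e9": ("E", []), "c10": ("C", []), "e11": ("E", []),
    "c12": ("C", []), "c13": ("C", []),
}


def unstable_edges(doc, node_of):
    """Return the synopsis edges (u, v) that are NOT count-stable."""
    members = defaultdict(list)
    for elem in doc:
        members[node_of(elem)].append(elem)
    # For each element: how many of its children fall in each synopsis node.
    child_counts = {e: Counter(node_of(c) for c in doc[e][1]) for e in doc}
    unstable = set()
    for u, elems in members.items():
        targets = set().union(*(child_counts[e] for e in elems))
        for v in targets:
            # Stable only if every element of u has the same count into v.
            if len({child_counts[e][v] for e in elems}) > 1:
                unstable.add((u, v))
    return unstable


by_tag = unstable_edges(DOC, lambda elem: DOC[elem][0])
```

With elements grouped by tag, only the edge (F, E) is unstable here: two F elements have one E child and two have none.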

13
Count-Stable TreeSketch
[Figure: XML document and its count-stable TreeSketch, with nodes R(1), P(1), S(1), S(1), F(2), F(2), C(4), E(2) and their edge counts]
  • A count-stable synopsis can recover the input
    tree
  • Efficient one-pass construction
  • Stable summary can be too large for practical use!
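
One way to obtain a count-stable grouping in a single bottom-up traversal is to give two elements the same class exactly when they share a tag and have the same number of children in every class. The sketch below illustrates that idea; it is an assumption about how such a construction could look, not the construction used by the authors, and `DOC` and `count_stable_partition` are hypothetical names.

```python
from collections import Counter

# Hypothetical document: element id -> (tag, list of child ids).
DOC = {
    "r": ("R", ["p1"]),
    "p1": ("P", ["s2", "s3"]),
    "s2": ("S", ["f4", "f5"]),
    "s3": ("S", ["f6", "f7"]),
    "f4": ("F", ["c8", "e9"]),
    "f5": ("F", ["c10", "e11"]),
    "f6": ("F", ["c12"]),
    "f7": ("F", ["c13"]),
    "c8": ("C", []), "e9": ("E", []), "c10": ("C", []), "e11": ("E", []),
    "c12": ("C", []), "c13": ("C", []),
}


def count_stable_partition(doc, root):
    """Map each element to a class id so that every edge is count-stable."""
    class_of = {}
    class_ids = {}  # signature -> class id

    def visit(elem):
        tag, children = doc[elem]
        # Children are classified first (post-order), then summarized.
        per_class = Counter(visit(child) for child in children)
        signature = (tag, tuple(sorted(per_class.items())))
        class_of[elem] = class_ids.setdefault(signature, len(class_ids))
        return class_of[elem]

    visit(root)
    return class_of


classes = count_stable_partition(DOC, "r")
```

On this document the F elements split into two classes (with and without an E child), which in turn splits the two S elements, giving eight classes in total.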

14
Lossy TreeSketch
XML Document
TreeSketch
15
TreeSketches and Clustering
  • TreeSketch = element clustering
  • All elements in a node are mapped to a centroid
  • Tight clusters give an accurate synopsis
  • Synopsis quality = clustering error
  • Options include Manhattan distance and squared error
  • Quality can be measured independently of a workload
  • Key for effective construction
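
To make the clustering view concrete, the sketch below computes a squared-error score for a grouping of elements. It assumes that each element is described by its vector of child counts per cluster and that the centroid is the per-cluster mean (the edge counts); the document `DOC` and the function name `squared_error` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical document: element id -> (tag, list of child ids).
DOC = {
    "r": ("R", ["p1"]),
    "p1": ("P", ["s2", "s3"]),
    "s2": ("S", ["f4", "f5"]),
    "s3": ("S", ["f6", "f7"]),
    "f4": ("F", ["c8", "e9"]),
    "f5": ("F", ["c10", "e11"]),
    "f6": ("F", ["c12"]),
    "f7": ("F", ["c13"]),
    "c8": ("C", []), "e9": ("E", []), "c10": ("C", []), "e11": ("E", []),
    "c12": ("C", []), "c13": ("C", []),
}


def squared_error(doc, node_of):
    """Sum of squared distances between elements and their cluster centroid."""
    members = defaultdict(list)
    for elem in doc:
        members[node_of(elem)].append(elem)
    error = 0.0
    for elems in members.values():
        # One child-count vector per element, indexed by target cluster.
        vectors = [Counter(node_of(c) for c in doc[e][1]) for e in elems]
        for v in set().union(*vectors):
            centroid = sum(vec[v] for vec in vectors) / len(vectors)
            error += sum((vec[v] - centroid) ** 2 for vec in vectors)
    return error


error_by_tag = squared_error(DOC, lambda elem: DOC[elem][0])
```

Grouping by tag gives an error of 1.0 here, all of it from the F cluster, whose elements have either one or zero E children while the centroid says 0.5. No query workload is needed to compute this score.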

16
Computing Approximate Answers
[Figure: a TreeSketch, a twig query with nodes q0 to q3 and steps //section, .//caption, .//equation, and the resulting approximate nesting tree]
  • Compute TreeSketch of approximate answer
  • Accuracy depends on quality of clustering
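
As a rough illustration of how a synopsis can answer a query without touching the data, the sketch below estimates the number of binding tuples of a twig by multiplying node sizes and edge counts. It is a simplification under several assumptions: only child-axis steps are handled (the slide's query uses descendant steps), synopsis nodes are identified by tag, every element is treated as if it matched its cluster centroid, and the names `SIZE`, `COUNT`, and `estimate` are hypothetical.

```python
# Hypothetical tag-level TreeSketch: node sizes and edge counts.
SIZE = {"R": 1, "P": 1, "S": 2, "F": 4, "C": 4, "E": 2}
COUNT = {
    ("R", "P"): 1.0,
    ("P", "S"): 2.0,
    ("S", "F"): 2.0,
    ("F", "C"): 1.0,
    ("F", "E"): 0.5,
}


def per_element(u, twig):
    """Estimated bindings of the twig's branches under one element of node u."""
    _tag, branches = twig
    estimate_here = 1.0
    for branch in branches:
        # Sum over synopsis edges leaving u whose target matches the branch tag.
        estimate_here *= sum(
            count * per_element(v, branch)
            for (src, v), count in COUNT.items()
            if src == u and v == branch[0]
        )
    return estimate_here


def estimate(twig):
    """Estimated number of binding tuples for a child-axis twig (tag, branches)."""
    return sum(n * per_element(u, twig) for u, n in SIZE.items() if u == twig[0])


# Sections with a figure that has both a caption and an equation: S[F[C][E]].
twig = ("S", [("F", [("C", []), ("E", [])])])
tuples = estimate(twig)
```

The estimate is only as good as the clustering: the 0.5 on edge (F, E) stands in for elements that actually have either one or zero E children.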

17
TreeSketch Construction
  • Given an XML tree T, build a TreeSketch of size B
  • Difficult clustering problem
  • Space dimensionality depends on the clustering
    itself
  • Construction based on bottom-up clustering
  • Compress perfect synopsis by merging clusters
  • Best merge determined by marginal gains
  • Heuristic to reduce number of candidate merges
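
The bottom-up compression described above can be sketched as a greedy loop. This is an illustration, not the authors' algorithm: it assumes the budget B is a number of synopsis nodes, scores a merge by the squared error of the resulting clustering rather than by the marginal-gain criterion mentioned on the slide, and tries every same-tag pair instead of a pruned candidate set. `DOC`, `STABLE`, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical document: element id -> (tag, list of child ids).
DOC = {
    "r": ("R", ["p1"]),
    "p1": ("P", ["s2", "s3"]),
    "s2": ("S", ["f4", "f5"]),
    "s3": ("S", ["f6", "f7"]),
    "f4": ("F", ["c8", "e9"]),
    "f5": ("F", ["c10", "e11"]),
    "f6": ("F", ["c12"]),
    "f7": ("F", ["c13"]),
    "c8": ("C", []), "e9": ("E", []), "c10": ("C", []), "e11": ("E", []),
    "c12": ("C", []), "c13": ("C", []),
}
# Hypothetical count-stable starting point: element id -> cluster id.
STABLE = {
    "r": "R", "p1": "P", "s2": "S1", "s3": "S2",
    "f4": "F1", "f5": "F1", "f6": "F2", "f7": "F2",
    "c8": "C", "c10": "C", "c12": "C", "c13": "C", "e9": "E", "e11": "E",
}


def squared_error(doc, cls):
    """Squared distance of every element's child-count vector to its centroid."""
    members = {}
    for elem in doc:
        members.setdefault(cls[elem], []).append(elem)
    error = 0.0
    for elems in members.values():
        vectors = [Counter(cls[c] for c in doc[e][1]) for e in elems]
        for v in set().union(*vectors):
            mean = sum(vec[v] for vec in vectors) / len(vectors)
            error += sum((vec[v] - mean) ** 2 for vec in vectors)
    return error


def compress(doc, cls, budget):
    """Greedily merge same-tag clusters until at most `budget` remain."""
    cls = dict(cls)
    while len(set(cls.values())) > budget:
        tag = {c: doc[e][0] for e, c in cls.items()}
        best = None
        for a, b in combinations(sorted(tag), 2):
            if tag[a] != tag[b]:
                continue  # only clusters with the same tag may merge
            trial = {e: (a if c == b else c) for e, c in cls.items()}
            err = squared_error(doc, trial)
            if best is None or err < best[0]:
                best = (err, trial)
        if best is None:
            break  # nothing left to merge
        cls = best[1]
    return cls


lossy = compress(DOC, STABLE, budget=6)
```

On this example the loop first merges the two F clusters (error 1.0, versus 4.0 for merging the S clusters) and then merges the S clusters at no extra cost, because their children now fall in the same cluster.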


[Figure labels: Space Budget, Perfect]
18
Element Simulation Distance
19
Error of Approximation
  • Error = distance between the true result R and its
    approximation R'
  • Popular metric: tree-edit distance
  • Min-cost sequence of operations that transforms R
    into R'
  • Measures syntactic differences between R and R'
  • Not intuitive for approximate answers!

[Figure: trees T1, T, and T2, labelled "Different counts, similar trait" and "Same counts, opposite trait"]
20
Element Simulation Distance
  • Capture approximate similarity between the true
    result R and the approximate result R'
  • u simulates v: u and v have identical structure
  • ESD(u,v): degree of simulation between u and v
  • How well the structure of u matches the structure
    of v
  • Modeled as the distance between multi-sets
  • Efficient computation using perfect summaries
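
The sketch below covers only the exact case that ESD relaxes: checking whether two elements have identical structure. It does not compute ESD itself. It assumes subtrees are given as nested (tag, children) pairs and that sibling order is ignored, which fits the multi-set view above; the function names are hypothetical.

```python
def canonical(tree):
    """Order-independent form of a subtree given as (tag, list of subtrees)."""
    tag, children = tree
    return (tag, tuple(sorted(canonical(child) for child in children)))


def same_structure(u, v):
    """True when u and v have identical structure, ignoring sibling order."""
    return canonical(u) == canonical(v)


# Two figures with a caption and an equation (in different order),
# and one figure with a caption only.
fig_a = ("F", [("C", []), ("E", [])])
fig_b = ("F", [("E", []), ("C", [])])
fig_c = ("F", [("C", [])])

identical = same_structure(fig_a, fig_b)
different = same_structure(fig_a, fig_c)
```

A graded distance would replace the equality test with a measure of how far apart the two child multi-sets are.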

21
Experimental Results
22
Methodology
  • Data sets: XMark, DBLP, IMDB, SwissProt
  • Workload: 1000 random twig queries
  • Evaluation metrics:
  • Average ESD for approximate answers
  • Mean absolute relative error for selectivity
    estimation
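
For reference, the selectivity metric can be written as a short function. This is the plain textbook form, assuming every true result size is positive; whether the study applied any further adjustment is not stated in the slides, and the function name and sample numbers are hypothetical.

```python
def mean_absolute_relative_error(estimates, truths):
    """Average of |estimate - truth| / truth over a query workload."""
    if len(estimates) != len(truths) or not truths:
        raise ValueError("need two non-empty sequences of equal length")
    if any(t <= 0 for t in truths):
        raise ValueError("true result sizes must be positive")
    total = sum(abs(e - t) / t for e, t in zip(estimates, truths))
    return total / len(truths)


# Two hypothetical queries: one underestimated by 10%, one overestimated by 10%.
error = mean_absolute_relative_error([90, 220], [100, 200])
```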

23
Approximate Answers - IMDB
IMDB (102K elements); avg. result size: 3,477 tuples
24
Selectivity Estimation - SwissProt
SwissProt (182K elements); avg. result size: 104,592 tuples
25
Selectivity Estimation - ALL
26
Conclusions
  • Approximate query answering for XML databases
  • TreeSketch Synopses
  • Structural summaries for tree-structured XML
  • Approximate answers for twig-queries
  • Model: graph synopsis with edge counts
  • Efficient processing and construction
  • Element Simulation Distance
  • Capture approximate similarity between XML trees
  • Experimental Results
  • High accuracy for low space budgets
  • Efficient construction

27
Questions?
28
TreeSketch Model (2/2)
[Figure: XML document and its TreeSketch, with nodes R, P(1), S(2), F(2), F(2), C(4), E(2) and their edge counts]
  • Average number of children = edge count

29
XML
[Figure: XML document tree]
Legend: p = paper, s = section, c = caption, t = title, f = figure, e = equation
30
TreeSketch Synopsis
[Figure: XML document and its TreeSketch, with nodes R(1), P(1), S(2), F(4), C(4), E(2); edge counts shown: 1, 2, 2, 1, 0.5]
  • Augment graph-synopsis with edge counts
  • count(u,v): mean number of children in v per element in u

31
Depth-Guided Merging
  • Key observation: two elements have similar
    structure if their children have similar
    structure
  • Bottom-up merging, based on depth
  • Depth: distance from the leaves of the tree
  • Build a pool of candidate merges by increasing
    depth
  • Replenish the pool when it falls below a given
    threshold
  • Reduced construction time, accurate synopses
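
A small sketch of the depth ordering, under two assumptions that the slides leave open: depth is taken as the distance to the farthest leaf, and candidates are pairs of same-tag elements at the same depth (the slides merge clusters; element pairs stand in for them here). `DOC` and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical document: element id -> (tag, list of child ids).
DOC = {
    "r": ("R", ["p1"]),
    "p1": ("P", ["s2", "s3"]),
    "s2": ("S", ["f4", "f5"]),
    "s3": ("S", ["f6", "f7"]),
    "f4": ("F", ["c8", "e9"]),
    "f5": ("F", ["c10", "e11"]),
    "f6": ("F", ["c12"]),
    "f7": ("F", ["c13"]),
    "c8": ("C", []), "e9": ("E", []), "c10": ("C", []), "e11": ("E", []),
    "c12": ("C", []), "c13": ("C", []),
}


def depths_from_leaves(doc, root):
    """Depth of each element, measured as distance to its farthest leaf."""
    depth = {}

    def visit(elem):
        children = doc[elem][1]
        depth[elem] = 1 + max(map(visit, children)) if children else 0
        return depth[elem]

    visit(root)
    return depth


def candidate_merges(doc, root):
    """Yield (depth, a, b) for same-tag pairs, shallowest depth first."""
    levels = defaultdict(list)
    for elem, d in depths_from_leaves(doc, root).items():
        levels[d].append(elem)
    for d in sorted(levels):
        for a, b in combinations(sorted(levels[d]), 2):
            if doc[a][0] == doc[b][0]:
                yield d, a, b


depth = depths_from_leaves(DOC, "r")
pool = list(candidate_merges(DOC, "r"))
```

Because the generator is lazy, a construction loop could draw a bounded number of candidates at a time and ask for more only when its pool runs low.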

32
Depth-Guided Merging
  • Observation: two elements have similar structure
    if their children have similar structure
  • Heuristic: if a merge of two clusters is good,
    then merges of the child clusters are likely to
    have been good as well
  • Bottom-up merging strategy
  • Savings in construction time, accurate synopses