Approximate XML Query Answers - PowerPoint PPT Presentation

1
Approximate XML Query Answers

Alkis Polyzotis (UC Santa Cruz)
Minos Garofalakis (Bell Labs)
Yannis Ioannidis (U. of Athens, Hellas)

2
Motivation

XML: the de facto standard for data exchange
Development of the XML Warehouse
Conflict between on-line and query execution cost
Increased query response times
Users might wait for uninteresting results

[Figure: query Q is issued to the Warehouse, which returns result R]

3
Approximate Query Answers

Evaluate query over a concise data synopsis and obtain an approximation R' of the true result R
Use approximate result as timely feedback
User can assess the value of the query
Goal: reduce the number of evaluated queries

[Figure: Q, Warehouse, true result R, approximate result R']

4
Contributions

TreeSketch Synopses
Structural summaries for XML data
Approximate answers for complex twig queries
Summarization model: structural clustering of elements
Efficient processing and construction
Element Simulation Distance
Novel distance metric for XML data
Captures approximate similarity between two XML trees
Experimental Results
Accurate approximate answers for low space budgets
Low-error selectivity estimates
Efficient construction algorithm

5
Outline

Preliminaries
TreeSketches
Synopsis model
Computing approximate answers
Summary construction
Element Simulation Distance
Experimental Study
Conclusions

6
Data and Query Model

[Figure: XML Document]

7
Problem Definition

[Figure: Approximate Nesting Tree and True Nesting Tree]

Process twig query over a synopsis
Compute approximation of nesting tree

8
TreeSketch Model

9
Graph Synopsis

[Figure: XML Document and its Graph Synopsis]

Synopsis node = set of elements with the same tag
Synopsis edge = document edge(s)

10
TreeSketch Synopsis

[Figure: XML Document and its TreeSketch]

Augment graph synopsis with edge counts
count(u,v): mean number of children in v per element in u
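
The edge-count definition above can be illustrated with a minimal Python sketch. It assumes the coarsest possible synopsis, one node per tag, and a small made-up document; the function name `tag_treesketch` is hypothetical and is not taken from the presentation.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

def tag_treesketch(root):
    """Group elements by tag and compute mean child counts per synopsis edge."""
    extent = defaultdict(int)       # synopsis node (tag) -> number of elements
    child_total = defaultdict(int)  # (parent tag, child tag) -> total children
    for elem in root.iter():
        extent[elem.tag] += 1
        for child in elem:
            child_total[(elem.tag, child.tag)] += 1
    # count(u, v) = mean number of children in v per element in u
    counts = {(u, v): total / extent[u] for (u, v), total in child_total.items()}
    return dict(extent), counts

# Illustrative document (an assumption): r = root, p = paper, s = section,
# f = figure, c = caption, e = equation.
doc = ET.fromstring(
    "<r><p>"
    "<s><f><c/></f><f><c/><e/></f></s>"
    "<s><f><c/></f><f><c/><e/></f></s>"
    "</p></r>"
)
extent, counts = tag_treesketch(doc)
print(extent)
print(counts)
```

In this example only half of the f elements have an e child, so the edge (f, e) gets the fractional count 0.5.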

11
TreeSketch Synopsis

[Figure: XML Document and its TreeSketch]

Is there a lossless synopsis?
What is the quality of a lossy synopsis?

12
Count Stability

[Figure: XML Document and its TreeSketch, with edge counts]

(u,v) is count-stable if all elements in u have the same child-count in v
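
As a rough illustration of the definition, the following Python sketch reports the synopsis edges that are not count-stable for a given assignment of elements to synopsis nodes. The helper names (`unstable_edges`, `by_tag`, `refined`) and the sample document are assumptions made for this example.

```python
from collections import Counter, defaultdict
import xml.etree.ElementTree as ET

def unstable_edges(root, cluster_of):
    """Return the synopsis edges (u, v) that are not count-stable."""
    per_node = defaultdict(list)  # u -> one Counter of child nodes per element in u
    for elem in root.iter():
        per_node[cluster_of(elem)].append(Counter(cluster_of(c) for c in elem))
    unstable = set()
    for u, counters in per_node.items():
        for v in set().union(*counters):
            # Stable only if every element in u has the same child-count in v.
            if len({c[v] for c in counters}) > 1:
                unstable.add((u, v))
    return unstable

doc = ET.fromstring(
    "<r><p>"
    "<s><f><c/></f><f><c/><e/></f></s>"
    "<s><f><c/></f><f><c/><e/></f></s>"
    "</p></r>"
)

def by_tag(elem):
    return elem.tag

def refined(elem):
    # Split the f elements by whether they have an e child.
    if elem.tag == "f" and elem.find("e") is not None:
        return "f+e"
    return elem.tag

print(unstable_edges(doc, by_tag))   # the edge (f, e) is unstable
print(unstable_edges(doc, refined))  # empty set: every edge is count-stable
```

Grouping by tag alone leaves the edge (f, e) unstable, because some f elements have one e child and others have none; splitting the f elements into two synopsis nodes makes every edge count-stable.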

13
Count-Stable TreeSketch

[Figure: XML Document and its count-stable TreeSketch, with nodes R(1), P(1), S(1), S(1), F(2), F(2), C(4), E(2) and edge counts]

A count-stable synopsis can recover the input tree
Efficient one-pass construction
Stable summary can be too large for practical use!

14
Lossy TreeSketch

[Figure: XML Document and its lossy TreeSketch]

15
TreeSketches and Clustering

TreeSketch = element clustering
All elements in a node are mapped to a centroid
Tight clusters → accurate synopsis
Synopsis quality = clustering error
Options: Manhattan distance, squared error, ...
Quality can be measured independently of a workload
Key for effective construction
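
To make the clustering view concrete, here is a minimal Python sketch of the two error options named above. It assumes that each element is represented by a vector of child counts (a dict from child node to count) and that the centroid of a synopsis node is the per-dimension mean, which corresponds to the edge counts. The function names are hypothetical.

```python
def centroid(vectors):
    """Per-dimension mean of the child-count vectors in one synopsis node."""
    keys = set().union(*vectors)
    return {k: sum(v.get(k, 0) for v in vectors) / len(vectors) for k in keys}

def squared_error(vectors):
    """Sum of squared distances between each element and the centroid."""
    c = centroid(vectors)
    return sum((v.get(k, 0) - c[k]) ** 2 for v in vectors for k in c)

def manhattan_error(vectors):
    """Sum of Manhattan (L1) distances between each element and the centroid."""
    c = centroid(vectors)
    return sum(abs(v.get(k, 0) - c[k]) for v in vectors for k in c)

# Four figure elements: all have one caption, two also have one equation.
loose = [{"c": 1}, {"c": 1, "e": 1}, {"c": 1}, {"c": 1, "e": 1}]
tight = [{"c": 1}, {"c": 1}]

print(squared_error(loose), manhattan_error(loose))  # 1.0 2.0
print(squared_error(tight), manhattan_error(tight))  # 0.0 0.0
```

Both measures depend only on the data and the clustering, not on any query workload, and a tight cluster has zero error.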

16
Computing Approximate Answers

[Figure: TreeSketch, twig Query (nodes q0, q1, q2, q3; steps //section, .//caption, .//equation), and the resulting Approximate Nesting Tree]

Compute TreeSketch of approximate answer
Accuracy depends on quality of clustering
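
A simplified Python sketch of how edge counts can drive an estimate is given below. It is an illustration under strong assumptions, not the evaluation algorithm of the presentation: it handles child steps only (not the descendant steps of the example query), uses one synopsis node per tag, and treats sibling branches as independent. The names `extent`, `counts`, `tuples_per_element` and `estimate_selectivity` are hypothetical.

```python
# A tag-level TreeSketch (assumed values): node extents and mean child counts.
extent = {"r": 1, "p": 1, "s": 2, "f": 4, "c": 4, "e": 2}
counts = {("r", "p"): 1.0, ("p", "s"): 2.0, ("s", "f"): 2.0,
          ("f", "c"): 1.0, ("f", "e"): 0.5}

def tuples_per_element(twig):
    """Expected number of binding tuples under one element of the twig root."""
    tag, branches = twig
    estimate = 1.0
    for branch in branches:
        # Mean number of matching children, times the bindings below each child.
        estimate *= counts.get((tag, branch[0]), 0.0) * tuples_per_element(branch)
    return estimate

def estimate_selectivity(twig):
    """Estimated number of binding tuples for a twig with child steps only."""
    return extent.get(twig[0], 0) * tuples_per_element(twig)

# Twig: sections with a figure child that has both a caption and an equation.
twig = ("s", [("f", [("c", []), ("e", [])])])
print(estimate_selectivity(twig))  # 2.0
```

The estimate multiplies mean counts along each branch, so its accuracy depends on how well each mean represents the individual elements of its synopsis node.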

17
TreeSketch Construction

Given an XML tree T, build a TreeSketch of size B
Difficult clustering problem
Space dimensionality depends on the clustering itself
Construction based on bottom-up clustering
Compress perfect synopsis by merging clusters
Best merge determined by marginal gains
Heuristic to reduce number of candidate merges
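
The bottom-up idea can be sketched in Python as a greedy loop that starts from many small clusters and repeatedly applies the cheapest merge. The sketch makes several simplifying assumptions that are not taken from the presentation: vectors have a fixed dimensionality, the budget is a number of clusters rather than a size in bytes, the merge cost is the increase in squared error, and every pair is considered as a candidate (no pruning heuristic).

```python
from itertools import combinations

def sq_error(vectors):
    """Squared error of a cluster of equal-length vectors around its mean."""
    dims = range(len(vectors[0]))
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in dims]
    return sum((v[d] - mean[d]) ** 2 for v in vectors for d in dims)

def greedy_merge(clusters, budget):
    """Merge clusters pairwise until at most `budget` clusters remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > budget:
        def cost(pair):
            i, j = pair
            merged = clusters[i] + clusters[j]
            # Increase in total squared error caused by this merge.
            return sq_error(merged) - sq_error(clusters[i]) - sq_error(clusters[j])
        i, j = min(combinations(range(len(clusters)), 2), key=cost)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Four singleton clusters of child-count vectors, compressed to two clusters.
start = [[(1, 0)], [(1, 1)], [(1, 0)], [(5, 5)]]
result = greedy_merge(start, budget=2)
print(result)
```

In this run the two identical vectors are merged first at zero cost, and the outlier (5, 5) stays in its own cluster.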

[Figure: labels "Space Budget" and "Perfect"]

18
Element Simulation Distance
19
Error of Approximation

Error = distance between R and R'
Popular metric: tree-edit distance
Min-cost sequence of operations that transform R to R'
Measures syntactic differences between R and R'
Not intuitive for approximate answers!

[Figure: trees T1, T, T2, labeled "Different counts, similar trait" and "Same counts, opposite trait"]

20
Element Simulation Distance

Capture approximate similarity between R and R'
u simulates v: u and v have identical structure
ESD(u,v): degree of simulation between u and v
How well the structure of u matches the structure of v
Modeled as the distance between multi-sets
Efficient computation using perfect summaries

21
Experimental Results
22
Methodology

Data Sets: XMark, DBLP, IMDB, SwissProt
Workload: 1000 random twig queries
Evaluation metrics:
Average ESD for approximate answers
Mean absolute relative error for selectivity estimation
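
For reference, the selectivity metric can be written as a short Python function. This sketch assumes every true result size is positive; the slides do not say how queries with empty results are handled, and the sample numbers are made up.

```python
def mean_abs_relative_error(true_counts, estimates):
    """Average of |true - estimate| / true over a query workload."""
    errors = [abs(t - e) / t for t, e in zip(true_counts, estimates)]
    return sum(errors) / len(errors)

# Made-up result sizes for three queries and their estimates.
true_counts = [100, 50, 200]
estimates = [90, 55, 200]
print(mean_abs_relative_error(true_counts, estimates))
```

The relative errors here are 0.1, 0.1 and 0, so the mean is about 0.067.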

23
Approximate Answers - IMDB

IMDB (102K Elements), Avg. Result Size: 3,477 tuples

24
Selectivity Estimation - SwissProt

SwissProt (182K Elements), Avg. Result Size: 104,592 tuples

25
Selectivity Estimation - ALL

26
Conclusions

Approximate query answering for XML databases
TreeSketch Synopses
Structural summaries for tree-structured XML
Approximate answers for twig-queries
Model: Graph Synopsis + Edge-counts
Efficient processing and construction
Element Simulation Distance
Capture approximate similarity between XML trees
Experimental Results
High accuracy for low space budgets
Efficient construction

27
Questions?
28
TreeSketch Model (2/2)

[Figure: XML Document (elements r, p1, s2, s3, f5, f7, f9, e11, c12, e13, c14, c17) and its TreeSketch (nodes R, P(1), S(2), F(2), F(2), C(4), E(2)) with edge counts]

Edge count = average number of children

29
XML

[Figure: XML Document with elements r, p1, s2, s3, f5, f7, f9, e11, c12, e13, c14, c17]
Legend: p = paper, s = section, c = caption, t = title, f = figure, e = equation

30
TreeSketch Synopsis

[Figure: XML Document and its TreeSketch, with nodes R(1), P(1), S(2), F(4), C(4), E(2) and edge counts 1, 2, 2, 1, 0.5]

Augment graph synopsis with edge counts
count(u,v): mean number of children in v per element in u

31
Depth-Guided Merging

Key observation: two elements have similar structure if their children have similar structure
Bottom-up merging, based on depth
Depth: distance from the leaves of the tree
Build a pool of candidate merges by increasing depth
Replenish the pool when it falls below a given threshold
Reduced construction time, accurate synopses
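
The depth ordering can be illustrated with a short Python sketch. It assumes "distance from the leaves" means the longest downward path to a leaf (the slide does not say whether the longest or the shortest path is meant), and it only pairs elements with the same tag. The replenishment threshold is omitted; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import xml.etree.ElementTree as ET

def assign_depths(elem, levels):
    """Record each element under its depth: 0 for leaves, else 1 + max over children."""
    depth = 0 if len(elem) == 0 else 1 + max(assign_depths(c, levels) for c in elem)
    levels[depth].append(elem)
    return depth

def candidate_merges(levels, depth):
    """Pairs of same-tag elements at the given depth (candidate merges)."""
    return [(a, b) for a, b in combinations(levels[depth], 2) if a.tag == b.tag]

doc = ET.fromstring(
    "<r><p>"
    "<s><f><c/></f><f><c/><e/></f></s>"
    "<s><f><c/></f><f><c/><e/></f></s>"
    "</p></r>"
)
levels = defaultdict(list)
assign_depths(doc, levels)
tags_by_depth = {d: sorted({e.tag for e in elems}) for d, elems in levels.items()}
print(tags_by_depth)
print(len(candidate_merges(levels, 0)), len(candidate_merges(levels, 1)))  # 7 6
```

Candidates at depth 0 (the leaves) are generated first, so parents are considered for merging only after their children.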

32
Depth-Guided Merging

Observation: two elements have similar structure if their children have similar structure
Heuristic: if a merge of two clusters is good, then merges of the child clusters are likely to have been good as well
Bottom-up merging strategy
Savings in construction time, accurate synopses
