Title: Estimating Costs of Path Expression Evaluation in Distributed Object Databases
1 Estimating Costs of Path Expression Evaluation in Distributed Object Databases
- Gabriela Ruberg, Fernanda Baião, Marta Mattoso
- (gruberg, baiao, marta _at_cos.ufrj.br)
2 Outline
- Motivation
- Path Expression Evaluation
- An Analytical Cost Model for Query Processing in Distributed Object Databases
- Selectivity Factor of Path Expressions
- Cost Model Analysis
- Related Work
- Conclusions and Ongoing Work
3 Motivation
- Modeling query processing performance is a hard task in object DBMSs
- partial participation of collections in relationships
- NxM relationships by the use of set attributes
- object sharing
- object clustering on disk
- pointer-based (navigation) algorithms
- binary and n-ary query operators ...
4 Strategies for Path Expression Evaluation
- Basic dimensions
- Evaluation direction
- Forward or reverse
- Query operator (algorithm)
- Binary (join) or n-ary (naïve)
- Performance may significantly vary
- Selectivity factors of selection predicates
- Participation of collections in PE relationships
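To make the two dimensions concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration, not taken from the slides: a hypothetical three-collection schema (part, composite, document), dictionaries standing in for object collections, and a selection predicate on the last collection. It contrasts forward n-ary (naive pointer chasing) evaluation with reverse evaluation built from binary joins.

```python
# Hypothetical toy schema: part --part_of--> composite --doc--> document.
parts = {1: {"part_of": 10}, 2: {"part_of": 10},
         3: {"part_of": 11}, 4: {"part_of": None}}
composites = {10: {"doc": 100}, 11: {"doc": 101}}
documents = {100: {"year": 1989}, 101: {"year": 1995}}


def forward_naive(pred):
    """N-ary (naive) forward evaluation: chase pointers from each start object."""
    result = []
    for pid, part in parts.items():
        cid = part["part_of"]
        if cid is None:  # partial participation: this part has no composite
            continue
        did = composites[cid]["doc"]
        if pred(documents[did]):
            result.append(pid)
    return result


def reverse_binary(pred):
    """Reverse evaluation: filter the last collection, then join backwards."""
    docs = {did for did, d in documents.items() if pred(d)}
    comps = {cid for cid, c in composites.items() if c["doc"] in docs}
    return [pid for pid, p in parts.items() if p["part_of"] in comps]


def before_1990(doc):
    return doc["year"] < 1990
```

Both strategies return the same parts, but they touch the collections in a different order and volume. That difference is why predicate selectivity and collection participation can change which strategy is cheaper.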
5 An Analytical Cost Model
- Four model pillars
- Object Data Model
- Data Storage Model
- Query Execution Model
- Distributed Database Design
6 An Analytical Cost Model
- Object Data Model
- Set attributes
- Object sharing
- Path expressions selectivity factors
- Data Storage Model
- Physical clustering of objects on disk
- Collection extension
- Relationship references
7 An Analytical Cost Model
- Query Execution Model
- Typical strategies and algorithms
- Small memory hypothesis
- IO reload overhead due to object sharing
- Database Distribution Design
- Several techniques for data fragmentation
- Object allocation
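The IO reload overhead caused by object sharing under a small memory can be illustrated with a short simulation. This is only a sketch under stated assumptions: the LRU replacement policy, the page reference sequence, and the function name `count_page_reads` are all hypothetical and are not taken from the cost model itself.

```python
from collections import OrderedDict


def count_page_reads(page_refs, buffer_pages):
    """Count disk reads for a sequence of page references with an LRU buffer."""
    buffer = OrderedDict()
    reads = 0
    for page in page_refs:
        if page in buffer:
            buffer.move_to_end(page)    # hit: mark as most recently used
            continue
        reads += 1                      # miss: the page must be (re)loaded
        buffer[page] = True
        if len(buffer) > buffer_pages:
            buffer.popitem(last=False)  # evict the least recently used page
    return reads


# Six source objects reference targets on pages 0..2; each page is shared
# by two sources, so every page is requested twice during navigation.
refs = [0, 1, 2, 0, 1, 2]
small = count_page_reads(refs, buffer_pages=2)  # shared pages are reloaded
large = count_page_reads(refs, buffer_pages=3)  # each page is read once
```

With a buffer smaller than the set of shared pages, every repeated reference becomes a new disk read. With enough memory, each shared page is read only once.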
8 Cost Model Overview
[Figure: cost model overview, with boxes labeled DB STATS, SELECTIVITY ESTIMATES, DATA-VOLUME ESTIMATES, and IO, CPU, and COMMUNICATION COSTS ESTIMATES]
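One common way to combine IO, CPU, and communication estimates into a single figure is a weighted sum. The sketch below assumes that form. The class `CostWeights`, its fields, and the function `total_cost` are hypothetical names, and the slides do not state that the model uses exactly this formula.

```python
from dataclasses import dataclass


@dataclass
class CostWeights:
    """Hypothetical unit costs, all expressed in the same time unit."""
    io_per_page: float
    cpu_per_object: float
    comm_per_byte: float


def total_cost(pages_read, objects_processed, bytes_shipped, w):
    """Weighted sum of estimated IO, CPU, and communication work."""
    io = pages_read * w.io_per_page
    cpu = objects_processed * w.cpu_per_object
    comm = bytes_shipped * w.comm_per_byte
    return io + cpu + comm


weights = CostWeights(io_per_page=10.0, cpu_per_object=0.5, comm_per_byte=0.25)
estimate = total_cost(pages_read=4, objects_processed=8, bytes_shipped=16, w=weights)
```

The page, object, and byte counts would come from the data-volume estimates, which in turn depend on the selectivity estimates and database statistics.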
9 Selectivity of Path Expressions
- Estimating selectivity factors is an essential task in cost-based optimization techniques
- Combination of
- Evaluation direction
- Participation of object collections in PE relationships
- Selectivity of nested predicates
- Traditional prediction methods present high computational costs and may lead to errors
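The sketch below shows how participation, fan-out, and a nested predicate might be combined for one forward step of a path expression. It is a generic uniform-independence estimate based on the Cardenas approximation, not the method proposed in this talk. All function and parameter names are hypothetical.

```python
def distinct_targets(n_refs, n_targets):
    """Cardenas approximation: expected number of distinct targets hit by
    n_refs references drawn uniformly and independently from n_targets."""
    if n_targets == 0:
        return 0.0
    return n_targets * (1.0 - (1.0 - 1.0 / n_targets) ** n_refs)


def forward_step(n_sources, participation, fan_out, n_targets,
                 pred_selectivity=1.0):
    """Estimate how many target objects survive one forward step.

    participation: fraction of source objects that hold any reference
    fan_out: average references per participating source (set attribute)
    pred_selectivity: fraction of targets satisfying the nested predicate
    """
    n_refs = n_sources * participation * fan_out
    reached = distinct_targets(n_refs, n_targets)
    return reached * pred_selectivity


# 1000 sources, half of them participating, 2 references each, 100 targets:
reached = forward_step(1000, 0.5, 2, 100)
selected = forward_step(1000, 0.5, 2, 100, pred_selectivity=0.1)
```

Because many references land on the same shared targets, the number of distinct objects reached saturates near the size of the target collection instead of growing with the number of references.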
10 Selectivity of Path Expressions
- An example
- Traditional estimation methods
- Our method
[Figure: example for PEa.to-t (t); values surviving from the slide: Measured 300000, 37]
11 Selectivity of Path Expressions
12 Fragmentation Effects
- Cost models for centralized query processing cannot be directly applied in a distributed context
- Some data fragments may be disregarded beforehand during query execution
- Different OO fragmentation techniques
- Horizontal Fragmentation
- Primary or Derived
- Vertical Fragmentation
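Primary and derived horizontal fragmentation, and the way a predicate lets a query disregard whole fragments, can be sketched as follows. The collections, fragment names, and helper functions are hypothetical and only illustrate the idea.

```python
# Hypothetical data: composites reference documents through "doc".
documents = {100: {"year": 1989}, 101: {"year": 1995}, 102: {"year": 1985}}
composites = {10: {"doc": 100}, 11: {"doc": 101}, 12: {"doc": 102}}


def primary_horizontal(collection, predicates):
    """Split a collection by predicates over its own attributes."""
    return {name: {oid: o for oid, o in collection.items() if pred(o)}
            for name, pred in predicates.items()}


def derived_horizontal(collection, ref_attr, owner_fragments):
    """Place each object in the fragment of the object it references."""
    return {name: {oid: o for oid, o in collection.items()
                   if o[ref_attr] in frag}
            for name, frag in owner_fragments.items()}


doc_frags = primary_horizontal(documents, {
    "old": lambda d: d["year"] < 1990,
    "new": lambda d: d["year"] >= 1990,
})
comp_frags = derived_horizontal(composites, "doc", doc_frags)

# A query restricted to year < 1990 only needs the "old" fragments;
# the "new" fragments can be disregarded before execution starts.
relevant = sorted(comp_frags["old"])
```

A centralized cost model would charge for scanning all composites, while a fragmentation-aware model only charges for the fragments that can contribute to the result.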
13 Fragmentation Effects
14 Cost Model Analysis
- Experimental validation
- OO7 benchmark
- GOA ODBMS prototype at COPPE/UFRJ
- Cost model estimates are very close to experimental results in all scenarios
- Simulated scenarios have been obtained by varying cost parameters
15 Experimental Validation
- IO analysis in OO7 centralized database
16 Experimental Validation
- Distributed database (horizontal fragmentation)
- Q1-Fa.part_of.document(date < 10/01/1990)
17 Simulated Scenario
- Object-sharing degree versus IO costs
18 Related Work
- Physical object clustering
- Gardarin, Gruser, and Tang, VLDB 1995
- Different PE evaluation strategies
- Gardarin, Gruser, and Tang, VLDB 1996
- Ozkan, Dogac, and Altinel, Journal of Database Management 1996
- Selectivity factors of path expressions
- Cho, Park, Whang, and Son, Information Systems 1996
- Bertino and Foscoli, IEEE TKDE 1997
- Cho, Han, Hong, and Whang, CIKM 2000
- Cost metrics to support distributed object database design
- Fung, Karlapalem, and Li, DASFAA 1997
- Bellatreche, Karlapalem, and Basak, DEXA 1998
- Bellatreche, Karlapalem, and Li, ER 1998
- Ezeife and Zheng, IDEAS 1999
19 Conclusions
- A realistic cost model is not obtained with a simple combination of relevant issues into a single model
- These issues are strongly related to each other, therefore they have to be remodeled.
20 Conclusions
- Traditional methods for the estimation of PE selectivity are not suited for frequent queries
- Low selectivity factors
- It is important to consider several alternative strategies (algorithms, evaluation direction) for path expression evaluation
- Otherwise the cost model can be biased toward bad choices
21 Contributions
- A realistic cost model that encompasses several issues in object query processing
- PE selectivity
- Participation of object collections in PE relationships
- NxM relationships
- Set attributes and object sharing
- Typical PE evaluation strategies
- IO overhead in small memory hypothesis
- Distributed database design
- Communication costs
yet fairly simple!
22 Ongoing Work
- Cost model extensions
- Semistructured and XML data
- Indexes
- Data replication
- New algorithms, etc.
- Path expression evaluation in distributed XML databases with NxM relationships
- XLink and linkbases
23 Estimating Costs of Path Expression Evaluation in Distributed Object Databases
- Gabriela Ruberg, Fernanda Baião, Marta Mattoso
- (gruberg, baiao, marta _at_cos.ufrj.br)
- Detailed version available at
- http://www.cos.ufrj.br/gruberg/ruberg2001_english.pdf